KubeDL Metrics
| Metric Name | Labels | Description |
|---|---|---|
| kubedl_jobs_created | kind | Counts the number of jobs created |
| kubedl_jobs_deleted | kind | Counts the number of jobs deleted |
| kubedl_jobs_successful | kind | Counts the number of jobs that finished successfully |
| kubedl_jobs_failed | kind | Counts the number of jobs that failed |
| kubedl_jobs_restarted | kind | Counts the number of jobs restarted |
| kubedl_jobs_running | kind | Number of jobs currently running |
| kubedl_jobs_pending | kind | Number of jobs currently pending |
| kubedl_jobs_first_pod_launch_delay_seconds | kind, name, namespace, uid | Histogram of the launch delay in seconds, from job creation until the first pod is running |
| kubedl_jobs_all_pods_launch_delay_seconds | kind, name, namespace, uid | Histogram of the launch delay in seconds, from job creation until all pods are running |
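The two `*_launch_delay_seconds` metrics are Prometheus histograms, which are exposed as cumulative `_bucket` series (with an `le` upper-bound label) plus `_sum` and `_count`. A single figure such as the median delay therefore has to be estimated from the buckets, which is what PromQL's `histogram_quantile()` does. The Python sketch below reimplements that linear-interpolation idea for illustration only; `histogram_quantile` here is a hypothetical local helper, not the PromQL function itself, and the bucket bounds and counts are made up because the bucket layout KubeDL uses is not stated here.

```python
import math

# Hypothetical cumulative buckets: (upper bound in seconds, observations <= bound),
# e.g. kubedl_jobs_first_pod_launch_delay_seconds_bucket already summed across jobs.
BUCKETS = [(1.0, 2), (5.0, 6), (10.0, 9), (math.inf, 10)]


def histogram_quantile(q, buckets):
    """Estimate the q-quantile of a classic Prometheus-style histogram.

    Interpolates linearly inside the bucket that contains rank q * total,
    assuming non-negative observations (the lowest bucket starts at 0).
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be within [0, 1]")
    buckets = sorted(buckets)
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("the last bucket must have the upper bound +Inf")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations recorded yet
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank lands in the open-ended bucket: report the highest finite bound.
                return prev_bound
            if count == prev_count:
                return bound  # empty first bucket (only possible when rank is 0)
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count


print(histogram_quantile(0.5, BUCKETS))   # 4.0
print(histogram_quantile(0.95, BUCKETS))  # 10.0
```

As in PromQL, the result is only as precise as the bucket boundaries: any quantile that falls in the `+Inf` bucket is reported as the highest finite bound.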
The Labels column lists the labels attached to each Prometheus metric:

- `kind`: the target job kind, e.g. `TFJob`, `PyTorchJob`, `MarsJob`, `XGBoostJob`
- `name`: the name of the job
- `namespace`: the namespace of the job
- `uid`: the UID of the job
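Because every series carries these labels, scraped samples can be filtered or aggregated by them. In a Prometheus setup this would normally be a PromQL query such as `sum by (kind) (kubedl_jobs_successful)`. The Python sketch below shows the same idea outside Prometheus, assuming the controller's metrics have been scraped in the Prometheus text exposition format under the names listed in the table. `parse_samples`, `select`, and `totals_by_kind` are hypothetical helpers (not part of KubeDL), the parser only handles simple sample lines, and the scrape output, including the job name and UID, is invented.

```python
import re
from collections import defaultdict

# One sample line: metric name, optional {label="value",...} block, then the value.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# One label pair; backslash escapes inside the value are kept verbatim.
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Yield (metric_name, labels_dict, value) for every sample line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        m = SAMPLE_RE.match(line)
        if m is None:
            continue  # ignore lines this simplified parser does not understand
        name, raw_labels, value = m.groups()
        yield name, dict(LABEL_RE.findall(raw_labels or "")), float(value)


def select(text, metric, **match):
    """Return (labels, value) for samples of `metric` whose labels equal `match`."""
    return [
        (labels, value)
        for name, labels, value in parse_samples(text)
        if name == metric and all(labels.get(k) == v for k, v in match.items())
    ]


def totals_by_kind(text, metric):
    """Sum the values of `metric` per `kind` label, e.g. successful jobs per kind."""
    totals = defaultdict(float)
    for labels, value in select(text, metric):
        totals[labels.get("kind", "")] += value
    return dict(totals)


# Hypothetical scrape output; metric names follow the table, values are made up.
SAMPLE = """\
kubedl_jobs_successful{kind="TFJob"} 7
kubedl_jobs_successful{kind="PyTorchJob"} 2
kubedl_jobs_failed{kind="TFJob"} 1
kubedl_jobs_first_pod_launch_delay_seconds_count{kind="TFJob",name="mnist",namespace="default",uid="1234"} 1
"""

print(totals_by_kind(SAMPLE, "kubedl_jobs_successful"))  # {'TFJob': 7.0, 'PyTorchJob': 2.0}
print(select(SAMPLE, "kubedl_jobs_failed", kind="TFJob"))  # [({'kind': 'TFJob'}, 1.0)]
```

Since the launch-delay histograms also carry `name`, `namespace`, and `uid`, the same `select` call can narrow them down to a single job, for example with `namespace="default", name="mnist"`.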