Run in Host Network
Network bandwidth is a bottleneck resource for communication-intensive jobs, and host-mode networking can help optimize performance. In addition, other scenarios (e.g. NVLink communication between containerized GPU processes) may depend on the host network as well.
How To Use
KubeDL provides a feature gate to enable hostnetwork mode for jobs. Users only need to add the annotation kubedl.io/network-mode: host to the job specification, for example:
+ kubedl.io/network-mode: 'host'
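The same annotation can also be set programmatically before a job is submitted. The snippet below is a minimal sketch that treats the job manifest as a plain Python dictionary; the helper name `enable_host_network` and the sample manifest (its `kind` and `name`) are assumptions made for illustration, not part of KubeDL.

```python
import copy

# Annotation key and value as given in the text above.
ANNOTATION_KEY = "kubedl.io/network-mode"
ANNOTATION_VALUE = "host"


def enable_host_network(job):
    """Return a copy of a job manifest with the host-network annotation set.

    Hypothetical helper: it only edits the dictionary and does not talk to
    a cluster.
    """
    patched = copy.deepcopy(job)
    metadata = patched.setdefault("metadata", {})
    annotations = metadata.setdefault("annotations", {})
    annotations[ANNOTATION_KEY] = ANNOTATION_VALUE
    return patched


# Illustrative, incomplete manifest; a real job needs a full spec.
job = {"kind": "TFJob", "metadata": {"name": "example-job"}}
patched = enable_host_network(job)
```

The input manifest is left unmodified, so the same base definition can be submitted with or without host networking.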
The essence of hostnetwork mode is to randomize container ports to avoid port collisions and to enable service discovery across workers. KubeDL achieves this with the following steps:
- Enable host networking in the PodSpec and set the DNS policy to ClusterFirstWithHostNet.
- Choose a random port as the container port.
- Set the TargetPort of the corresponding worker's Service to the previously randomized port, and set ClusterIP to an empty string (instead of None), so that kube-proxy will be able to forward traffic from the Service port to the TargetPort.
- Change the job cluster spec (e.g. the TF_CONFIG in TensorFlow) accordingly.
- Handle worker fail-over and use the latest available port as the TargetPort in the new worker.
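The pod, Service, and fail-over steps above can be sketched with plain dictionaries that mirror the Kubernetes field names (`hostNetwork`, `dnsPolicy`, `containerPort`, `clusterIP`, `port`, `targetPort`). This is an illustration under stated assumptions, not KubeDL's implementation: the port range, the fixed Service port, the label used as selector, and all function names are hypothetical, and `ClusterFirstWithHostNet` is assumed as the DNS policy because it is the standard Kubernetes policy for host-network pods. The framework-specific cluster spec update is left out.

```python
import random

# Assumed values for illustration only; not taken from KubeDL.
PORT_RANGE = (20000, 30000)
SERVICE_PORT = 2222  # fixed Service port that peers would connect to


def random_port(rng):
    """Choose a random container port inside the assumed range."""
    return rng.randint(PORT_RANGE[0], PORT_RANGE[1])


def host_network_pod_spec(port):
    """Pod spec fragment: host networking, matching DNS policy, random port."""
    return {
        "hostNetwork": True,
        "dnsPolicy": "ClusterFirstWithHostNet",
        "containers": [
            {"name": "worker", "ports": [{"containerPort": port}]}
        ],
    }


def worker_service_spec(worker_name, target_port):
    """Service spec fragment that maps the Service port to the random port.

    An empty clusterIP lets the API server allocate a virtual IP, whereas
    "None" would create a headless Service that kube-proxy does not handle.
    """
    return {
        "clusterIP": "",
        "selector": {"worker-name": worker_name},  # hypothetical label
        "ports": [{"port": SERVICE_PORT, "targetPort": target_port}],
    }


def replace_failed_worker(service, rng):
    """Fail-over: give the new worker a fresh port and repoint the Service."""
    new_port = random_port(rng)
    service["ports"][0]["targetPort"] = new_port
    return host_network_pod_spec(new_port)


rng = random.Random(0)  # seeded so the sketch is reproducible
port = random_port(rng)
pod = host_network_pod_spec(port)
service = worker_service_spec("worker-0", port)
new_pod = replace_failed_worker(service, rng)
```

Because peers address the worker through its Service, only the `targetPort` has to change when a replacement worker comes up on a different port.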
Here is a TensorFlow job example: