Create a ModelVersion from a training job
KubeDL training already supports generating a ModelVersion when the job completes, so a model image is built automatically after the job succeeds.
To enable this feature, set the modelVersion field in the TensorFlow job spec, as in the example below:
    apiVersion: "training.kubedl.io/v1alpha1"
    kind: "TFJob"
    metadata:
      name: "distributed-tfjob"
    spec:
      cleanPodPolicy: None
      # modelVersion defines the location where the model is stored.
      modelVersion:
        # The model name for the model version
        modelName: mymodel
        # The dockerhub repo to push the generated image
        imageRepo: jianhe6/mymodel
        storage:
          # Use hostpath, NFS is also supported.
          localStorage:
            # The host path for storing the generated model.
            # Each job should have its own unique parent folder, in this case, 'mymodel', so that multiple ModelVersions are not collided into the same folder.
            path: /models/mymodel
            # The mounted path inside the container.
            # The training code is expected to export the model under this path, such as storing the tensorflow saved_model.
            mountPath: /kubedl-model
            # The node where the chief worker run to store the model
            nodeName: kind-control-plane
      tfReplicaSpecs:
        Worker:
          replicas: 3
          restartPolicy: Never
          template:
            spec:
              containers:
                - name: tensorflow
                  image: kubedl/tf-mnist-estimator-api:v0.1
                  imagePullPolicy: Always
                  command:
                    - "python"
                    - "/keras_model_to_estimator.py"
                    - "/tmp/tfkeras_example/" # model checkpoint dir
                    - "/kubedl-model" # export dir for the saved_model format
This example uses a hostPath volume. The training code first generates the model artifacts under /kubedl-model inside the training container. Correspondingly, the model will be present on the local host path set by path under localStorage (/models/mymodel in this example).
KubeDL then creates a ModelVersion CR, which triggers a Kaniko container to build an image containing the model artifacts at /kubedl-model and push it to Docker Hub at the repository set by imageRepo (jianhe6/mymodel in this example).
Note that the training code needs to specify /kubedl-model as the model export path (the same as the mountPath in storage).
KubeDL also creates a Model object, with the ModelVersion’s ownerReference pointing to the Model.
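To illustrate the export-path requirement, here is a minimal sketch of how a training script might read the export directory from its command line, mirroring the command in the TFJob above. The names parse_dirs and export_model and the placeholder file model.bin are hypothetical. A real script would call its framework's own export routine (for example, writing a TensorFlow saved_model) rather than writing placeholder bytes.

```python
import os

# Assumption: this default matches mountPath in the TFJob's storage section.
DEFAULT_EXPORT_DIR = "/kubedl-model"


def parse_dirs(argv):
    """Return (checkpoint_dir, export_dir) from positional arguments.

    Mirrors the example command: script name, checkpoint dir, export dir.
    """
    checkpoint_dir = argv[1] if len(argv) > 1 else "/tmp/tfkeras_example/"
    export_dir = argv[2] if len(argv) > 2 else DEFAULT_EXPORT_DIR
    return checkpoint_dir, export_dir


def export_model(export_dir, payload=b"placeholder model bytes"):
    """Write a stand-in artifact under export_dir and return its path.

    Hypothetical: a real job would save its trained model here instead.
    """
    os.makedirs(export_dir, exist_ok=True)
    artifact = os.path.join(export_dir, "model.bin")
    with open(artifact, "wb") as f:
        f.write(payload)
    return artifact
```

Inside the container, the script would call export_model(parse_dirs(sys.argv)[1]) after importing sys, so that whatever is written ends up under the mounted path and, through the volume, on the host.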
Create a ModelVersion Manually
A ModelVersion can also be created manually, pointing to existing external storage.
The ModelVersion CR below generates a model image at modelhub/model1 on Docker Hub, including the model artifacts from node1’s local path set by path under localStorage (/foo in this example):
    apiVersion: model.kubedl.io/v1alpha1
    kind: ModelVersion
    metadata:
      name: mv-4
      namespace: default
    spec:
      modelName: model1
      createdBy: user1
      imageRepo: modelhub/model1
      storage:
        localStorage:
          path: /foo
          nodeName: node1
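When many ModelVersions need to be created, the manifest can be generated rather than written by hand. The following is a minimal sketch that builds the same object as a Python dict and prints it as JSON, which kubectl accepts in place of YAML. The helper name build_model_version is hypothetical, and the field names are copied from the manifest above rather than from the CRD schema.

```python
import json


def build_model_version(name, model_name, image_repo, host_path, node_name,
                        created_by=None, namespace="default"):
    """Return a ModelVersion manifest as a plain dict (hypothetical helper)."""
    spec = {
        "modelName": model_name,
        "imageRepo": image_repo,
        "storage": {
            "localStorage": {"path": host_path, "nodeName": node_name},
        },
    }
    # createdBy is only emitted when a value is supplied.
    if created_by is not None:
        spec["createdBy"] = created_by
    return {
        "apiVersion": "model.kubedl.io/v1alpha1",
        "kind": "ModelVersion",
        "metadata": {"name": name, "namespace": namespace},
        "spec": spec,
    }


manifest = build_model_version(
    "mv-4", "model1", "modelhub/model1", "/foo", "node1", created_by="user1"
)
print(json.dumps(manifest, indent=2))
```

If the script is saved under a name such as build_mv.py (an assumed file name), its output can be piped to kubectl apply -f - to create the object.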
List the ModelVersions with kubectl get mv (mv is short for modelversion):

    kubectl get mv
    NAME   MODEL    IMAGE                    CREATED-BY   FINISH-TIME
    mv-4   model1   modelhub/model1:v1c072   user1        2021-04-19T21:45:29Z

    kubectl describe mv mv-4
    ...
    Spec:
      Created By:  user1
      Image Repo:  modelhub/model1
      Model Name:  model1
      Storage:
        Local Storage:
          Node Name:  node1
          Path:       /foo
    Status:
      Finish Time:        2021-04-19T21:45:29Z
      Image:              modelhub/model1:v1c072
      Image Build Phase:  ImageBuildSucceeded
      Message:            Image build succeeded.
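The status shown above can also be checked from a script. The following is a minimal sketch that inspects a ModelVersion as parsed JSON, such as the output of kubectl get mv mv-4 -o json. The camelCase field names (imageBuildPhase, finishTime, image, message) are assumptions inferred from the labels in the describe output, not taken from the CRD schema, and the helper name image_build_succeeded is hypothetical.

```python
import json


def image_build_succeeded(mv):
    """Return True if the ModelVersion dict reports a successful image build."""
    status = mv.get("status") or {}
    # Assumed field name, inferred from "Image Build Phase" in describe output.
    return status.get("imageBuildPhase") == "ImageBuildSucceeded"


# Sample shaped like the describe output above; field names are assumed.
sample = json.loads("""
{
  "metadata": {"name": "mv-4"},
  "status": {
    "finishTime": "2021-04-19T21:45:29Z",
    "image": "modelhub/model1:v1c072",
    "imageBuildPhase": "ImageBuildSucceeded",
    "message": "Image build succeeded."
  }
}
""")

if image_build_succeeded(sample):
    print("image ready:", sample["status"]["image"])
```

A ModelVersion with no status yet is treated as not ready, so the check is safe to call while the build is still in progress.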