KubeDL automatically tunes container-level configurations for ML models before they are deployed as inference services. This auto-configuration workflow has been developed as an independent project, Morphling. Github →
Morphling paper accepted at ACM SoCC 2021: Morphling: Fast, Near-Optimal Auto-Configuration for Cloud-Native Model Serving
Morphling tunes the optimal configurations for your ML/DL model serving deployments. It searches for the best container-level configurations (e.g., resource allocations and runtime parameters) through empirical trials, in which a few sampled configurations are evaluated for performance.
Key benefits include:
- Automated tuning workflows hidden behind simple APIs.
- Out-of-the-box stress-test clients for ML model serving.
- Cloud-agnostic; tested on AWS, Alibaba Cloud, etc.
- ML-framework-agnostic; supports popular frameworks including TensorFlow and PyTorch.
- Equipped with a variety of customizable hyper-parameter tuning algorithms.
Morphling requires users to specify a ProfilingExperiment interface for configuration tuning, including:
- the ML model container (e.g., the Pod template),
- the performance objective function,
- the tunable configuration parameters, with their types and search ranges.
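These pieces come together in a single ProfilingExperiment resource. The sketch below only illustrates the overall shape: the API group/version, field names, and metric names are illustrative assumptions, not the authoritative schema, so consult the Morphling repository for the exact CRD definition.

```yaml
# Illustrative sketch of a ProfilingExperiment; field names are assumptions,
# not the authoritative schema -- see the Morphling repo for the real CRD.
apiVersion: morphling.kubedl.io/v1alpha1
kind: ProfilingExperiment
metadata:
  name: mobilenet-experiment
spec:
  # Performance objective function: maximize serving throughput (qps assumed).
  objective:
    type: Maximize
    objectiveMetricName: qps
  # Hyper-parameter tuning algorithm used to sample configurations.
  algorithm:
    algorithmName: grid
  # Budget: how many configurations to sample for performance evaluation.
  maxNumTrials: 9
  # Tunable container-level configurations with types and search ranges.
  tunableParameters:
    - category: resource
      parameters:
        - name: cpu
          parameterType: discrete
          feasibleSpace:
            list: ["1", "2", "4"]
        - name: memory
          parameterType: discrete
          feasibleSpace:
            list: ["1Gi", "2Gi", "4Gi"]
  # ML model container to be profiled (the Pod template), elided here.
  # servicePodTemplate: ...
  # Stress-test client that drives load against the model, elided here.
  # clientTemplate: ...
```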
Install using Helm
Install using YAML files
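Either installation path boils down to a couple of commands. The chart path, release name, namespace, and manifest path below are illustrative assumptions; check the Morphling repository for the exact locations.

```shell
# Option 1: install with Helm (chart path and release name are assumptions).
helm install morphling ./helm/morphling -n morphling-system --create-namespace

# Option 2: install directly from YAML manifests (path is an assumption).
kubectl apply -f ./manifests/
```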
Run model serving configuration examples. Go →
See Morphling Workflow to check how Morphling tunes ML serving configurations automatically in a Kubernetes-native way.
If you have any questions or want to contribute, GitHub issues or pull requests are warmly welcome.