Model Deployment
Sparkflows lets you deploy models in the development environment and then push them to the production environment with one click from the Models page. Deploying models to production has never been so seamless.
Administer production environments alongside code environments and infrastructure dependencies for both batch and real-time scoring. Deploy bundles and API services across development, testing, and production environments to ensure a resilient approach to updates in your machine learning pipelines.
By facilitating quicker experimentation, deployments, and comprehensive lineage tracking, Sparkflows enhances the overall efficiency of ML workflows.
Model Registry
Sparkflows enables the feature engineering pipeline and the ML model to be registered in a model registry. Sparkflows itself can serve as the model registry, or MLflow can be used instead.
The Model Registry is a centralized repository for storing, organizing, and managing machine learning models. It plays a crucial role in the lifecycle of machine learning projects, ensuring that models can be easily tracked, versioned, validated, and deployed.
After evaluating models against each other, one logs the top-performing model(s) in the Model Registry in preparation for deployment. Leverage multiple iterations of models of the same type through Model Versioning.
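The idea of logging the top-performing model and keeping multiple versions under one name can be sketched as follows. This is only an in-memory illustration: the class names, fields, and artifact paths are hypothetical and are not the Sparkflows or MLflow API.

```python
from dataclasses import dataclass, field


@dataclass
class ModelVersion:
    name: str
    version: int
    metrics: dict
    artifact_path: str


@dataclass
class InMemoryRegistry:
    """Hypothetical stand-in for a model registry (not a Sparkflows or MLflow API)."""
    _versions: dict = field(default_factory=dict)

    def register(self, name, metrics, artifact_path):
        # Each registration under the same name receives the next version number.
        versions = self._versions.setdefault(name, [])
        entry = ModelVersion(name, len(versions) + 1, dict(metrics), artifact_path)
        versions.append(entry)
        return entry

    def best(self, name, metric):
        # Return the registered version with the highest value of `metric`.
        return max(self._versions[name], key=lambda v: v.metrics[metric])


registry = InMemoryRegistry()
registry.register("churn", {"auc": 0.81}, "models/churn/v1")  # illustrative values
registry.register("churn", {"auc": 0.86}, "models/churn/v2")
top = registry.best("churn", "auc")
print(top.version, top.artifact_path)
```

A real registry would also persist the artifacts and record lineage; the sketch only shows how versions accumulate and how a best candidate might be selected for deployment.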
Model Deployment for Batch Scoring
Sparkflows workflows and nodes execute tasks essential for routine production activities like data updates, pipeline refreshes, and MLOps monitoring, which includes overseeing ML models or scheduling model retraining.
With robust deployment capabilities, data scientists and ML engineers can extend the deployment of services created within Sparkflows to platforms such as AWS SageMaker, Azure Machine Learning, Google Vertex, and Kubeflow. This widens the range and flexibility of API deployment and facilitates integration with external platforms.
Users can therefore deploy models on the cloud of their choice and use them for batch scoring on bulk data.
Model Deployment for Real Time Scoring
With Sparkflows, one can establish an MLOps environment that operates in real time. Sparkflows provides prompt responses by offering highly available infrastructure and dynamically scaling cloud resources on AWS SageMaker, Azure Machine Learning, Google Vertex, and Kubeflow, among others.
On deployment, you get a REST API endpoint for real-time model inference. This endpoint can be invoked from the Sparkflows monitoring endpoint, with curl, from a third-party API, with Python requests, or from any other programming language. This capability supports the development of applications and use cases that leverage these deployed models.
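As a rough sketch of invoking such an endpoint from Python, the snippet below builds a JSON POST request using only the standard library. The URL, the "instances" payload shape, and the bearer-token header are assumptions made for illustration; the actual endpoint address and request schema come from the deployment itself.

```python
import json
import urllib.request


def build_scoring_request(endpoint_url, features, token=None):
    """Build a JSON POST request for a deployed model's REST endpoint."""
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumed auth scheme; use whatever the deployment actually requires.
        headers["Authorization"] = f"Bearer {token}"
    # Assumed payload shape: a list of feature records under "instances".
    body = json.dumps({"instances": [features]}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url, data=body, headers=headers, method="POST"
    )


def score(request, timeout=10):
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder URL: replace with the endpoint shown after deployment.
request = build_scoring_request(
    "https://example.invalid/models/churn/predict",
    {"tenure": 12, "plan": "basic"},
)
# prediction = score(request)  # uncomment once a real endpoint is available
```

The same request can be expressed with the third-party requests library or with curl; the standard library is used here only to keep the sketch dependency-free.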
Model Monitoring
Sparkflows provides a single-pane view for monitoring the metrics of deployed models. This makes it possible to keep track of model performance over time by computing and monitoring metrics such as drift and latency. Each deployed model is associated with thresholds and alerts on these metrics. When a metric crosses its threshold, the MLOps system captures an alert and relays it to users by email, or triggers downstream APIs if they have been set up.
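The threshold-and-alert behaviour described above can be illustrated with a small check. The metric names, threshold values, and message format are hypothetical, and the sketch assumes that higher values are worse for every metric.

```python
def check_thresholds(metrics, thresholds):
    """Return one alert message per metric that exceeds its threshold."""
    alerts = []
    for name, limit in thresholds.items():
        value = metrics.get(name)
        # Metrics without a reported value are skipped rather than alerted on.
        if value is not None and value > limit:
            alerts.append(f"{name}={value} exceeded threshold {limit}")
    return alerts


# Illustrative values only.
latest = {"drift_psi": 0.31, "latency_ms": 120}
limits = {"drift_psi": 0.2, "latency_ms": 250}
alerts = check_thresholds(latest, limits)
for message in alerts:
    print(message)
```

In a real setup the returned messages would be handed to an email sender or a downstream API call instead of being printed.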
In Sparkflows, model assessment repositories record and illustrate performance metrics, guaranteeing the sustained delivery of top-notch results from live models. Should a model's performance decline, integrated drift analysis aids operators in identifying and examining potential instances of data, performance, or prediction drift, allowing for informed decision-making moving forward.
Model Retraining and CI/CD Pipelines
When data changes between the training and production stages, model effectiveness can diminish, a phenomenon known as "drift". Sparkflows monitors drift by comparing the data distribution of each model attribute between training data and production data. An AI-driven drift detection application, aimed specifically at data scientists, is also available.
Using the policy attached to each deployed model, models can be retrained by triggering the retraining pipeline of workflows. Retraining is triggered automatically when the conditions of the policy attached to the model are satisfied. In many cases, for example when new data arrives quarterly and a model needs to be refitted, time-based retraining of the pipeline can also be set up via the policy.
Retraining can be set up as part of CI/CD pipelines. Sparkflows supports DevOps tools such as Jenkins, GitLab CI, Travis CI, and Azure Pipelines.
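One common way to compare the training and production distribution of a single numeric attribute is the Population Stability Index (PSI). The source does not say which statistic Sparkflows computes, so the sketch below is an assumption chosen for illustration; the bin count and the small floor `eps` are likewise arbitrary choices.

```python
import math


def psi(train, prod, bins=10, eps=1e-6):
    """Population Stability Index of `prod` relative to `train` for one attribute."""
    lo, hi = min(train), max(train)
    # Bin edges are derived from the training data; fall back to width 1.0
    # when all training values are identical.
    width = (hi - lo) / bins or 1.0

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the training range fall into the first or last bin.
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
        total = len(values)
        # Floor each proportion at eps so the logarithm is always defined.
        return [max(c / total, eps) for c in counts]

    p_train = proportions(train)
    p_prod = proportions(prod)
    return sum((q - p) * math.log(q / p) for p, q in zip(p_train, p_prod))


# Illustrative data: production values shifted upward relative to training.
train_values = [i / 100 for i in range(100)]
prod_values = [x + 0.5 for x in train_values]
print(psi(train_values, train_values), psi(train_values, prod_values))
```

To cover every model attribute, the same function would be applied column by column, and each resulting score compared against a per-attribute threshold.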
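A policy that combines a condition-based trigger with a time-based trigger could look roughly like the sketch below. The `RetrainPolicy` fields and the `should_retrain` helper are hypothetical names introduced for illustration, not Sparkflows configuration keys.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class RetrainPolicy:
    """Hypothetical policy: either condition may be left unset (None)."""
    drift_threshold: Optional[float] = None
    max_model_age: Optional[timedelta] = None


def should_retrain(policy, drift_score, trained_at, now):
    """Return True if the drift condition or the age condition is met."""
    if policy.drift_threshold is not None and drift_score > policy.drift_threshold:
        return True
    if policy.max_model_age is not None and now - trained_at >= policy.max_model_age:
        return True
    return False


# Illustrative policy: retrain on drift above 0.2 or roughly once a quarter.
policy = RetrainPolicy(drift_threshold=0.2, max_model_age=timedelta(days=90))
trained_at = datetime(2024, 1, 1)
print(should_retrain(policy, 0.05, trained_at, datetime(2024, 2, 1)))
print(should_retrain(policy, 0.05, trained_at, datetime(2024, 4, 15)))
```

A scheduler or CI/CD job could evaluate such a check periodically and start the retraining workflow whenever it returns True.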
Model Governance, Compliance and Explainability
Sparkflows MLOps ensures data integrity and traceability throughout the machine learning journey, allowing tracking from data acquisition through experimentation and model registration to deployment.
The IP addresses used to invoke the deployed models are also tracked. The system also captures details such as who deployed a model and what changes were made to the deployment over time. Each activity in the MLOps system is tracked and can be seen in audit logs and dashboards.
Model Formats
Sparkflows supports storing, registering, and deploying models in most open model formats, including PMML, H2O MOJO, Pickle files, and MLeap, among others.
Each of these formats has its tradeoffs: some are platform independent, some offer very low scoring latency, and some are more widely adopted. Sparkflows lets you choose the format in which to save the model, provided the underlying algorithm supports that format.