AWS Machine Learning Operations Specialist
Calabrio
Are you driven by innovation and looking to thrive in a fast-paced, growing environment? Join us at Calabrio and be part of our dynamic team! Help us reshape the landscape of customer experience – where every interaction becomes an opportunity, and every insight drives meaningful change.
Introducing Calabrio – The trailblazers in customer experience intelligence! Revolutionizing the way organizations connect with their customers, we empower businesses to elevate every interaction to new heights. Our cutting-edge cloud platform, coupled with AI-driven analytics tools, unlocks the true essence of customer sentiment, turning data into actionable insights with lightning speed.
We are seeking a highly skilled and motivated AWS MLOps Specialist to join our team. This role bridges machine learning operations, ETL operations, and traditional DevOps, focusing on deploying, managing, and optimizing ML and ETL workflows and infrastructure. The ideal candidate will have hands-on experience in both MLOps and DevOps practices and expertise in managing AWS cloud infrastructure. An AWS certification or significant experience in AWS environments is highly desirable.
What you'll be doing (Key Responsibilities):
MLOps Responsibilities:
- Design, deploy, and maintain scalable ML and ETL pipelines in production environments.
- Implement CI/CD workflows for ML models and ETL jobs, ensuring reliable and automated deployment processes.
- Monitor and optimize ML/ETL performance, ensuring efficient resource utilization.
- Collaborate with ML Engineers to integrate ML/ETL workflows into scalable production systems.
DevOps Responsibilities:
- Design, implement, and manage CI/CD pipelines for software applications.
- Automate infrastructure provisioning, configuration, and scaling using AWS CDK and tools like Terraform or CloudFormation.
- Monitor and troubleshoot production systems to ensure high availability and reliability.
- Develop robust logging, monitoring, and alerting solutions using tools like Datadog, CloudWatch, or Prometheus.
AWS Cloud Responsibilities:
- Architect and manage AWS-based infrastructure for both machine learning and software systems.
- Optimize AWS services (e.g., EC2, ECS, S3, Lambda, SageMaker) to meet performance and cost requirements.
- Ensure security and compliance in AWS environments through best practices and tools.
- Deploy containerized applications on AWS using ECS, EKS, or Fargate.
We're looking for:
- Strong problem-solving and analytical skills.
- Excellent communication and collaboration skills to work with cross-functional teams.
- A proactive, self-driven approach with a focus on continuous learning and improvement.
- Strong expertise in containerization (Docker) and orchestration tools (Kubernetes).
- Hands-on experience with AWS infrastructure and services, with preference for AWS-certified professionals.
- Familiarity with infrastructure-as-code tools (e.g., Terraform, CloudFormation).
- Knowledge of ML frameworks (e.g., TensorFlow, PyTorch, Scikit-learn) and model deployment practices.
- Experience using GPU VMs to train or serve ML models.
- Experience with SQL and NoSQL databases such as PostgreSQL, Snowflake, DynamoDB, or MongoDB.