We are seeking an exceptional Principal AI/ML Engineer to lead the development of advanced machine learning systems and scalable AI infrastructure. The role calls for a technologist who can shape AI/ML architectures, optimize distributed learning frameworks, and mentor engineering teams while remaining hands-on in technical execution.
Role Impact
- Lead the architectural design of scalable AI/ML systems for cloud and on-premises environments
- Define AI/ML strategy across multiple products, ensuring technical innovation and scalability
- Provide technical mentorship to engineering and research teams
- Drive best practices in AI/ML model development, deployment, and optimization
- Translate business requirements into concrete AI/ML technical implementations
Core Responsibilities
- Architect and optimize large-scale distributed AI/ML systems
- Design and implement ML pipelines, including data ingestion, model training, and deployment
- Lead technical decision-making for AI/ML initiatives across product teams
- Collaborate with product managers and domain experts to align AI/ML solutions with business goals
- Establish best practices in model reproducibility, deployment, monitoring, and performance evaluation
- Review and approve major architectural decisions related to AI/ML infrastructure
- Mentor senior engineers and data scientists on AI/ML advancements and scalability
- Evaluate and integrate emerging AI/ML technologies to keep the technology stack current
Required Technical Expertise
- Deep expertise in AI/ML algorithms, distributed learning, and model optimization
- Proficiency in multiple AI/ML programming languages, including:
  - Python (TensorFlow, PyTorch)
  - Scala (Spark ML)
  - C++ (high-performance computing)
- Strong understanding of data pipelines, feature engineering, and ML lifecycle management
- Extensive experience with scalable AI/ML architectures in cloud and on-premises environments
- Proven track record in designing and deploying AI-driven systems at scale
- Deep knowledge of system performance, parallel computing, and accelerated hardware (GPUs, TPUs)
Preferred Qualifications
- Experience with generative AI, LLMs, or deep learning architectures
- Background in reinforcement learning, graph neural networks, or self-supervised learning
- Contributions to open-source AI/ML projects
- Expertise in MLOps, infrastructure automation, and model monitoring
- Knowledge of container orchestration for AI workloads (Kubernetes, Ray)
- Understanding of real-time AI inference systems and edge AI deployment
- Experience optimizing large-scale data processing pipelines
Leadership Competencies
- Exceptional communication skills across technical and non-technical audiences
- Strong ability to influence AI/ML strategies across an organization
- Technical vision and ability to translate AI concepts into actionable solutions
- Track record of mentoring and growing senior AI/ML engineers
- Ability to balance AI model performance with business impact
- Self-directed, with excellent project and time management skills
Technical Environment
- AI/ML Frameworks: TensorFlow, PyTorch, scikit-learn, Spark ML
- Languages: Python, C++, Scala
- Infrastructure: Cloud AI (AWS SageMaker, GCP Vertex AI, Azure ML) and on-premises AI clusters
- Data Processing: Apache Spark, Kafka, Airflow
- Model Deployment: Kubernetes, TensorFlow Serving, Triton Inference Server
- Development Practices: MLOps, CI/CD for AI models, model evaluation strategies
Impact Opportunities
- Shape the technical direction of next-generation AI/ML infrastructure
- Architect scalable and efficient AI systems for production environments
- Influence the development of cutting-edge AI models deployed at scale
- Drive advancements in automated ML pipelines and real-time inference solutions
- Mentor and cultivate AI talent across the organization
The ideal candidate combines deep AI/ML expertise with strategic thinking and technical leadership, is passionate about building scalable, high-performance AI systems, and can bridge research with production-level implementation.