We are seeking an exceptional Principal AI/ML Engineer to lead the development of advanced machine learning systems and scalable AI infrastructure. The role calls for a technologist who can shape AI/ML architectures, optimize distributed learning frameworks, and mentor engineering teams while remaining hands-on in technical execution.
Role Impact
- Lead the architectural design of scalable AI/ML systems for cloud and on-premises environments
- Define AI/ML strategy across multiple products, ensuring technical innovation and scalability
- Provide technical mentorship to engineering and research teams
- Drive best practices in AI/ML model development, deployment, and optimization
- Translate business requirements into concrete AI/ML technical implementations
Core Responsibilities
- Architect and optimize large-scale distributed AI/ML systems
- Design and implement ML pipelines, including data ingestion, model training, and deployment
- Lead technical decision-making for AI/ML initiatives across product teams
- Collaborate with product managers and domain experts to align AI/ML solutions with business goals
- Establish best practices in model reproducibility, deployment, monitoring, and performance evaluation
- Review and approve major architectural decisions related to AI/ML infrastructure
- Mentor senior engineers and data scientists on AI/ML advancements and scalability
- Evaluate and integrate emerging AI/ML technologies to keep the technology stack current
Required Technical Expertise
- Deep expertise in AI/ML algorithms, distributed learning, and model optimization
- Proficiency in multiple AI/ML programming languages, including:
  - Python (TensorFlow, PyTorch)
  - Scala (Spark ML)
  - C++ (high-performance computing)
- Strong understanding of data pipelines, feature engineering, and ML lifecycle management
- Extensive experience with scalable AI/ML architectures in cloud and on-premises environments
- Proven track record in designing and deploying AI-driven systems at scale
- Deep knowledge of system performance, parallel computing, and accelerated hardware (GPUs, TPUs)
Preferred Qualifications
- Experience with generative AI, LLMs, or deep learning architectures
- Background in reinforcement learning, graph neural networks, or self-supervised learning
- Contributions to open-source AI/ML projects
- Expertise in MLOps, infrastructure automation, and model monitoring
- Knowledge of container orchestration for AI workloads (Kubernetes, Ray)
- Understanding of real-time AI inference systems and edge AI deployment
- Experience optimizing large-scale data processing pipelines
Leadership Competencies
- Exceptional communication skills across technical and non-technical audiences
- Strong ability to influence AI/ML strategies across an organization
- Technical vision and ability to translate AI concepts into actionable solutions
- Track record of mentoring and growing senior AI/ML engineers
- Ability to balance AI model performance with business impact
- Self-directed, with excellent project and time management skills
Technical Environment
- AI/ML Frameworks: TensorFlow, PyTorch, scikit-learn, Spark ML
- Languages: Python, C++, Scala
- Infrastructure: Cloud AI (AWS SageMaker, GCP Vertex AI, Azure ML) and on-premises AI clusters
- Data Processing: Apache Spark, Kafka, Airflow
- Model Deployment: Kubernetes, TensorFlow Serving, Triton Inference Server
- Development Practices: MLOps, CI/CD for AI models, model evaluation strategies
Impact Opportunities
- Shape the technical direction of next-generation AI/ML infrastructure
- Architect scalable and efficient AI systems for production environments
- Influence the development of cutting-edge AI models deployed at scale
- Drive advancements in automated ML pipelines and real-time inference solutions
- Mentor and cultivate AI talent across the organization
The ideal candidate combines deep AI/ML expertise with strategic thinking and technical leadership, is passionate about building scalable, high-performance AI systems, and can bridge research with production-level implementation.