Engineer, AI/ML & Analytics Platform Engineering

  • Plainsboro Township, NJ, USA
  • Full-Time
  • On-Site

Job Description:

We are seeking an experienced AI/ML & Analytics platform engineer who is passionate about data, digital, and AI for our client in the pharmaceutical/biotech field. This role is pivotal in ensuring the efficient and compliant operation of our developer platform and the successful delivery of value to the business.

Location: Hybrid, Plainsboro Township, NJ, US

This role is based out of their Princeton or Copenhagen office and requires you to be on site 60% of the time.

Position Overview:

You are a highly technical, hands-on individual who will build out the core features and capabilities of our AI/ML & Analytics developer platform. You will solve technical and architectural challenges to deliver a scalable, secure, and feature-rich platform. You will work closely with our cross-functional spoke teams to understand current and evolving AI/ML & Analytics needs and align the platform and feature build-out with them. You will champion self-service usage patterns for end users and accelerate our use of IaC and GitOps to build and scale these solutions. You will drive continuous improvements to the platform's ease of use and efficiency for end users.

You embody the idea that good platform engineering is rarely seen, only felt. You have an automation-first mindset, are security-conscious, and are keen on improving the in-house developer experience.

Responsibilities

  • Contribute to the build-out of the AI/ML & Analytics platform, services, and tools (across dev, test, and prod) that accelerate model training, inference, and deployment within our spoke teams
  • Build platform capabilities to support both batch and real-time workflows at scale with flexible deployment strategies to accommodate varying use cases (e.g. low-latency predictions, offline model inference)
  • Improve platform performance, reduce manual intervention, scale compute, and increase deployment efficiency.
  • Work with the foundational cloud teams to ensure platform operational effectiveness, reliability, security and efficiency.
  • Work with team members to provide technical guidance on, and implement, monitoring systems (e.g. registry, alerting, etc.) and governance frameworks (e.g. regulatory compliance).
  • Collaborate with our spoke teams on AI/ML & Analytics system architecture design, deployment pipelines, and solution scaling.

Qualifications

  • Bachelor's or Master's degree in a quantitative subject (e.g. Computer Science, Engineering, Data Science, Mathematics, Statistics, Operations Research) or a related field, with 5+ years of experience.
  • Experience in building AI/ML & Analytics or related platforms for ML Researchers, ML Engineers, Data Scientists, and Data Analysts
  • Experience building scalable self-service systems or platforms using microservices and/or event-based services
  • Strong knowledge of commonly used AI/ML & Analytics programming languages and tools such as Python, Spark, SQL or similar, with experience in machine learning frameworks like PyTorch or TensorFlow.
  • Experience with the AWS cloud-service ecosystem including AI/ML & Analytics related services (e.g. SageMaker, etc.)
  • Experience implementing IaC (Terraform, OpenTofu, CDK, Pulumi, etc.) and CI/CD for deploying cloud-based platform infrastructure at scale
  • Knowledge of basic software development tools including VCS (GitHub, GitLab, etc.), CI/CD (GitHub Actions, GitLab CI/CD, Jenkins, etc.), and Jira
  • Knowledge of containerization (e.g. Docker, Podman, etc.) and orchestration tools (e.g. Kubernetes, Rancher, etc.)
  • Experience with large-scale CPU, GPU and/or multi-GPU infrastructure (bonus for CUDA fundamentals)
  • Knowledge of fundamental ops capabilities such as registries, tracking, observability, and monitoring.
  • Experience analyzing and improving system performance and reducing costs.
  • Strong communication skills and ability to engage with stakeholders effectively.

Bonus Qualifications:

  • Prior experience working within the pharma/biotech domain
  • Proficiency in at least one strongly typed programming language such as C/C++, Java, Go, Rust, or similar, with associated OO or functional design principles.
  • Experience with large-scale distributed systems (e.g. Ray, Dask, Spark, etc.) and high-performance computing environments (e.g. Slurm, etc.)
  • In-depth knowledge of data platforms (e.g. Databricks, Snowflake, or Lake Formation) and tools (e.g. dbt) and their underlying technologies (e.g. Delta, Iceberg, Hudi, Spark).
  • Prior work building and using real-time/streaming infrastructure (e.g. Kafka, Spark Streaming).
  • Experience with GitOps-style tools for building and enabling developer platforms (e.g. ArgoCD, Crossplane, etc.)
  • Experience with multi-cloud platform development (e.g. some combination of AWS, GCP, Azure)
  • Knowledge of high-performance frameworks for inference and training/fine-tuning (e.g. ONNX Runtime, TensorRT, Triton, etc.) or resource-intensive GenAI workloads.