Software Engineer-Data Engineering, Machine Learning (ML)
Company: American Association of Motor Vehicle Administrators (AAMVA)
Location: Arlington
Posted on: February 21, 2026
Job Description:
Position Summary: The IT
Division is responsible for the development and operations of
information systems for the State and Federal agencies doing
business related to or using information from the administration of
motor vehicles and driver licenses. The Machine Learning (ML) Data
Engineer position has core responsibilities for the design,
development, deployment, and operational support of machine
learning solutions on cloud infrastructure. This includes the full
model lifecycle — from data acquisition and dataset preparation
through feature engineering, experimentation, model training,
validation, production deployment, and ongoing monitoring. Current
applications include anomaly detection across high-volume messaging
networks, but the scope encompasses any ML capability that
strengthens system reliability, operational intelligence, and
data-driven decision-making across AAMVA systems.

Essential Duties and Responsibilities: We are seeking a talented Data Engineer with
machine learning experience to join our team. You will design,
build, and operationalize ML solutions running on cloud
infrastructure (Azure or AWS). You will work across the full model
lifecycle: preparing datasets, engineering features, running
experiments, deploying models to production, and operating them on
cloud infrastructure. As a detail-oriented professional, you have a
strong track record of independently managing projects and driving
them to successful completion. Your statistical foundation and
engineering discipline enable you to move from exploratory analysis
through to production-grade, monitored solutions. You communicate
clearly with both technical and non-technical stakeholders —
translating model behavior, data constraints, and engineering
trade-offs into terms that drive decisions. You operate effectively
across the broader IT organization, with sufficient general IT
fluency to understand how ML systems interact with infrastructure,
security, operations, and business workflows, and you proactively
build those connections rather than working in a data silo.

Key responsibilities include:
- Designing and building dataset preparation pipelines: acquiring, cleaning, transforming, and versioning data for ML training and evaluation
- Engineering features that extract meaningful signals from structured and semi-structured data sources (time-series patterns, statistical profiles, categorical encodings)
- Running structured experimentation: testing multiple algorithms against defined scenarios, measuring performance, and documenting findings
- Training, evaluating, and tuning ML models, including regression, classification, clustering, anomaly detection, and ensemble methods
- Deploying models to production on cloud infrastructure and building the pipelines that keep them running (retraining, scoring, threshold management)
- Monitoring model performance in production: tracking drift, false positive rates, and detection efficacy over time
- Building and maintaining batch and streaming data pipelines using Synapse, Fabric, Spark, and Event Hubs that feed ML systems
- Writing and optimizing analytical queries (SQL, KQL, PySpark) for data exploration, statistical profiling, and real-time analysis
- Creating validation frameworks: synthetic test data generation, backtesting against historical logs, and shadow-mode evaluation
- Building dashboards and visualizations that communicate model outputs to technical and non-technical stakeholders
- Collaborating with cross-functional teams to identify ML opportunities and translate operational problems into data solutions; communicating findings, trade-offs, and model behavior clearly to technical and non-technical audiences across IT, operations, and leadership

Direct Reports: None

QUALIFICATIONS

Formal Education: Bachelor's degree in computer science, data science, statistics, mathematics, or a related quantitative field. Equivalent work experience may be substituted.

Knowledge, Skills, and Abilities:

Basic Qualifications
- 3–5 years of hands-on experience in data engineering, ML engineering, or applied analytics
- Hands-on cloud platform experience (Azure or AWS) building and deploying data or ML solutions on managed cloud services; the specific platform is less important than depth of experience
- Working knowledge of statistical foundations (distributions, variance, standard deviation, trend vs. seasonality, hypothesis testing) and how to apply them to real operational data
- Experience with the ML experiment-to-production cycle: dataset preparation, feature engineering, model training, evaluation, and deployment
- Proficiency in Python for data processing, statistical analysis, and ML model development
- Strong SQL skills with an understanding of relational database fundamentals: data modeling, query optimization, indexing strategies, and how SQL Server infrastructure supports production workloads (T-SQL, stored procedures, Availability Groups)
- Experience building data pipelines that handle batch and streaming workloads
- Experience with version control systems (Git) and CI/CD practices
- Strong problem-solving skills, attention to detail, and ability to work independently on ambiguous problems
- Strong written and verbal communication skills: able to explain technical findings to non-technical stakeholders and engage productively across IT, operations, and leadership; comfortable operating outside the ML silo and contributing to broader technology discussions

Preferred Qualifications
- Experience with time-series analysis, anomaly detection, or statistical process control on operational data
- Familiarity with unsupervised and semi-supervised techniques (isolation forest, clustering, ensemble methods)
- Experience building and managing the ML model lifecycle on Azure (MLflow, Fabric ML, Azure ML) or AWS (SageMaker, Glue, Step Functions)
- Familiarity with KQL (Kusto Query Language) for time-series decomposition, log analytics, or real-time data exploration
- Knowledge of data modeling and dimensional modeling concepts
- Experience with synthetic test data generation and model validation frameworks
- Familiarity with operations and monitoring of mission-critical data platforms

Technical Stack
- Core Technologies: Microsoft Fabric, Azure Synapse Analytics, Apache Spark, Delta Lake, Azure Event Hubs
- ML & Analytics: scikit-learn, PySpark ML, statistical modeling, time-series analysis, feature engineering, model validation
- Languages: Python, SQL, PySpark, KQL, C#
- Data Infrastructure: T-SQL, Stored Procedures, SQL Server Availability Groups
- Azure Services: Azure Functions, Azure Data Factory, Azure Key Vault
- Optional: Databricks, Snowflake, Lakehouse Architecture, Azure OpenAI. For AWS candidates, equivalent services (SageMaker, Glue, Kinesis, Redshift) are acceptable in place of Azure-specific stack items
- Visualization: Power BI
- Development: Azure DevOps, CI/CD

Disclaimer Statement: The preceding job description has been written to reflect management’s assignment of essential functions. It does not prescribe or restrict the tasks that may be assigned.

AAMVA is an Equal Opportunity Employer/Veterans/Disabled.