Jeff Zemerick

Search, NLP & Cloud Consulting

25+ years designing and building high-stakes distributed systems, from NASA flight software to Fortune 500 enterprises and high-growth startups.

25+ Years Experience

17x AWS Certified

8x GCP Certified

19+ Conf. Talks

LinkedIn GitHub

About Me

I’m a senior engineer specializing in cloud architecture, NLP, and search, with over 25 years of experience designing high-stakes distributed systems. My career runs from mission-critical government work, including software engineering and verification for NASA’s Mars “Curiosity” rover and the FBI’s N-DEx project, through Fortune 500 cloud transformations, data engineering, search relevance, and open-source leadership.

My work centers on multi-cloud architectures on AWS and GCP, scalable data pipelines, natural language processing, and search engineering. I hold 17 AWS certifications and 8 Google Cloud certifications; as an AWS Gold Jacket recipient, I’ve earned every AWS certification. I’ve also been recognized as an AWS Community Builder. As PMC Chair of Apache OpenNLP and an Apache Software Foundation Member, I actively shape one of the most widely used NLP frameworks in the Java ecosystem.

Across every role I write production code, pair with engineering teams, and leave behind systems that are documented, tested, and fully owned by the people who inherit them.

Outside of tech, I’m a wilderness guide at Deliberate Pace Guiding, leading backcountry trips.

How I Can Help

I partner with teams to design, build, and de-risk their most important cloud, search, and NLP systems.

Cloud Architecture & Migration

I design secure, highly available AWS and GCP architectures and lead migrations off legacy infrastructure. The results are HIPAA-compliant, built with DevSecOps practices, and delivered as Infrastructure as Code that your team can own and extend.

Data Engineering

I build efficient, maintainable data pipelines on open-source streaming technologies like Apache NiFi, Kafka, Flink, and Spark, so your data moves reliably from source to analytics and ML workloads.

Natural Language Processing

I build and fine-tune NLP systems for named-entity recognition, document classification, and text processing with Apache OpenNLP and ONNX, plus the pipelines to keep your models fresh.

Search Engineering

I turn “our search is bad” into measurable relevance gains by defining KPIs and building learning-to-rank, vector, hybrid, and multi-language search on OpenSearch, Elasticsearch, and Apache Solr.

Data Governance & Compliance

I design data governance and compliance controls: data classification, retention policies, and automated discovery of sensitive data for HIPAA, GDPR, and CCPA, so that data is handled correctly at scale.

Open-Source Depth

You work with a hands-on contributor to the tools you rely on: PMC Chair of Apache OpenNLP, an OpenSearch maintainer, and a regular speaker at NLP, search, and cloud conferences.

Have a project in mind? I take on engagements ranging from short architecture reviews to multi-month builds. Currently booking Q4 2026.

Certifications

Fully AWS certified (17x) and Google Cloud certified (8x). Browse the AWS and Google Cloud transcripts on Credly.

Amazon Web Services (AWS) · 17x certified

Google Cloud (GCP) · 8x certified

Additional Certifications

Certified Kubernetes Application Developer (CKAD)
HashiCorp Certified: Terraform Associate
CompTIA Security+
GitLab Certified Associate

Previous Clients

A selection of organizations across healthcare, finance, government, e-commerce, logistics, natural resources, and data & AI.

Selected Work

A few representative engagements: the problem, the approach, and what the client walked away with.

Open-Source Data Pipelines

Designed and built efficient, maintainable data pipelines on open-source technologies like Apache NiFi, Kafka, Flink, and Spark for ingest, transformation, and delivery into analytics and ML workloads. Emphasis on Infrastructure as Code and documentation so the client’s own team could own and extend the systems after handoff.

Enterprise Search Relevance

Improved the performance and relevancy of search systems for enterprise clients across multiple verticals. Defined search KPIs, stood up offline search labs, and developed learning-to-rank models, turning subjective “bad search” complaints into measurable, repeatable relevance gains.

Healthcare Cloud at Scale

Served as Agile DevOps lead for an AWS infrastructure team of 15 engineers building commercial healthcare applications. Drove the transition to a DevSecOps model and designed HIPAA-compliant architectures spanning microservices, API gateways, and streaming data on Kafka and Flink.

Technical Skills

Programming Languages

Java · Spring Boot · Python · Go · Scala · C# / .NET · Bash

Cloud & Infrastructure

AWS · Google Cloud · Terraform · Kubernetes · Docker · CloudFormation

Data & Streaming

Apache Kafka · Apache NiFi · Apache Flink · Apache Spark · HDFS · Apache Hive

Search & NLP

OpenSearch · Elasticsearch · Apache Solr · Apache OpenNLP · ONNX · Learning to Rank

Community & Open Source

Apache OpenNLP | PMC Chair

I serve as PMC Chair of Apache OpenNLP and as an Apache Software Foundation Member. I’ve contributed to the project for over 15 years, shaping release cadence, mentoring committers, and stewarding one of the most widely used NLP frameworks in the Java ecosystem.

OpenSearch Maintainer

I’m an OpenSearch maintainer on two projects. UBI (User Behavior Insights) lets search relevance teams capture and analyze implicit user feedback for learning-to-rank and A/B testing, and opensearch-migrations helps teams move between search engines and across OpenSearch versions. Beyond those projects, I’m an OpenSearch member and active contributor.

Open Source Software

I founded Phileas, an open-source text processing library available in Java, Python, and Go, adopted in commercial products including Graylog. I contribute to a broad range of NLP, search, and data infrastructure projects.

AWS Community Builder

Recognized as an AWS Community Builder. I’ve authored questions for AWS certification exams and published commercial products to the AWS Marketplace, staying close to both the practitioner community and AWS engineering standards.

Conference Presentations

A selection of talks at international conferences on NLP, search, cloud, and AI.

September 2026 · San Jose, CA
Leveraging LDP for High-Trust OpenSearch UBI
OpenSearchCon
October 2025 · Minneapolis, MN
Apache OpenNLP and the Model Context Protocol
Apache Community Over Code
October 2025 · Minneapolis, MN
Using Modern NLP Models from Apache OpenNLP with Solr
Apache Community Over Code
June 2025 · San Jose, CA
Demonstrating ONNX in Apache Solr search applications via Apache OpenNLP
ONNX Community Meetup
May 2025 · New York, NY
Catching the Quiet Thief: Detecting Low-and-Slow API Data Exfiltration
API Days
October 2024 · Denver, CO
Building an NLP Model Training Pipeline with Apache OpenNLP and Apache NiFi
Apache Community Over Code
October 2023 · Halifax, Nova Scotia
Apache OpenNLP and LLMs – Where Does OpenNLP Fit In?
Apache Community Over Code
May 2023 · Vancouver, Canada
Using Apache OpenNLP with OpenSearch k-NN Vector Search
Linux Foundation Open Source Summit
October 2022 · New Orleans, LA
What’s New and Coming in Apache OpenNLP 2.0
ApacheCon
September 2022 · Seattle, WA
Getting the Most from Your OpenSearch Contributions
Amazon Web Services OpenSearchCon · Recording
June 2022 · Austin, TX
Searching for the Right Words: Bringing NLP Transformers to Apache Solr via Apache OpenNLP
Linux Foundation Open Source Summit · Recording
June 2021 · Virtual
Applied MLOps to Maintain Model Freshness on Kubernetes
Berlin Buzzwords · Recording
February 2021 · Virtual
From Training to Serving: Machine Learning Models with Terraform
HashiTalks
September 2019 · New York, NY
Protecting the Healthcare Enterprise from PHI Breaches using Streaming and NLP
Strata Data
September 2019 · Washington, DC
Leveraging Neural Networks and Learning-to-Rank in Document Workflows
Activate Search and AI Conference · Recording
May 2019 · Washington, DC
Improving Organizational Knowledge with NLP-Enriched Data Pipelines
DataWorks Summit
November 2018 · McLean, VA
Using Sockeye Neural Machine Translation in a Streaming Pipeline
PyData Washington DC · Recording
October 2018 · Montreal, Canada
Embracing Diversity: Searching Over Multiple Languages
Activate Search and AI Conference · Recording
April 2018 · Charlottesville, VA
Embracing Diversity: Implementing Multi-language Search
Haystack Search Relevance Conference

From the Blog

Recent writing on NLP, search, cloud, and open source.

Leveraging LDP for High-Trust OpenSearch UBI

July 20, 2026

A teaser for my OpenSearchCon 2026 talk on using local differential privacy (LDP) to capture OpenSearch User Behavior Insights (UBI) data without collecting anyone’s raw individual behavior, so you get better relevance and stronger privacy at the same time.

Migrating Apache Solr to OpenSearch with the Migration Assistant

July 19, 2026

Why teams move from Apache Solr to OpenSearch, and how the OpenSearch Migration Assistant makes the move a repeatable, low-risk process rather than a hand-rolled reindex. Elasticsearch is a supported source too.

What is Learning to Rank?

June 20, 2026

A plain-language introduction to learning to rank (LTR): using machine learning to order search results, the judgments and features that train a model, and how the OpenSearch Learning to Rank plugin puts it to work.
