I am passionate about open source, NLP, protecting sensitive information, building efficient data pipelines, and improving search.
TL;DR: I love building stuff!
I am Jeff, an independent consultant. I am a certified cloud architect with a background in software engineering, DevOps, big data, and natural language processing. I have earned many AWS and Google Cloud certifications. I am a software engineer at heart, but I enjoy everything from cloud architecture to data and search and, of course, AI/ML.
I am available for hire as a consultant. I can help with your cloud (AWS/Google Cloud) architecture, software development, data pipelines, search, and NLP projects.
I am proud to hold many AWS and Google Cloud certifications. I earned my first AWS certifications in 2014, back when the certifications had issue numbers!
I served as an AWS Subject Matter Expert (SME) starting in February 2020, and also as an AWS Lead SME, which means I helped develop the AWS certification exams. I found there is no better way to learn, because writing wrong answers requires in-depth knowledge of AWS! I "retired" as an AWS SME in 2023 to let others share in this rewarding opportunity.
I was an AWS Community Builder for Machine Learning in 2021, 2022, and 2023.
I have worked in many industries, including government, security, healthcare, logistics, and education. I started working with the cloud in the early days of AWS, around 2008.
My interest in natural language processing was sparked in 2012, when I was working with unstructured data and learning about the challenges of making it useful. That led me to the Apache OpenNLP project, where I started contributing. Today, I am the chair of the Apache OpenNLP PMC (which just means I submit the project reports ;). I owe the Apache OpenNLP team a great deal of gratitude for their help and guidance over the years. I added ONNX Runtime support to Apache OpenNLP to make it easier to use large language models from Java. Check out the blog post I wrote on the Microsoft website!
As a programmer, I started with QBasic, QuickBasic, and Visual Basic for DOS. I picked up C++ through Visual Studio 6 before moving to .NET. I moved mostly to Java around 2010 and have stayed there since. I also develop in Python and Go, and have experience with Scala. I write a lot of Bash, CloudFormation, and Terraform, and I use Linux almost exclusively. Ubuntu has been my first pick since its version numbers were single digits and it was mailed to me on CD-ROM!
Check out my conference presentations below for a general idea of my favorite tools and areas of interest.
Today, I do consulting primarily in the areas of cloud, data, search, and NLP through my company Mountain Fog. I enjoy tackling problems around data ingestion, search, and AI/ML because those problems often fall right at the intersection of my interests.
I believe search is often overlooked yet very important: without the ability to locate data efficiently, the data is worthless. Vector search is now adding a new dimension to search (pun totally intended).
I am a maintainer of the OpenSearch User Behavior Insights (UBI) plugin, and I try to be active in the OpenSearch community.
I created and maintain the Phileas project, along with the Philter software, under Philterd, LLC. Redacting personally identifiable information (PII) and protected health information (PHI) from text is very challenging, but it is important and necessary.
My Work
Through Mountain Fog, I offer consulting services in the areas of cloud (AWS/GCP), data pipelines, search, and natural language processing (NLP).
I started Philterd to provide AI-powered software to redact PII. Learn more at www.philterd.ai.
Open Source
I am passionate about open source and try to contribute whenever possible. Here is where I'm most active:
Apache OpenNLP - I am a committer and PMC chair of the Java NLP library.
OpenSearch User Behavior Insights Plugin - I am a maintainer and developer of the OpenSearch plugin for search relevance.
Phileas - I am the creator and owner of the Java library for finding and redacting PII.
I have also created many other PII-focused open source projects on GitHub.
I believe industry certifications encourage engineers to keep learning and give them a valuable way to validate and showcase their experience and skills.
My AWS Certifications
View my AWS Certifications transcript.
My Google Cloud Certifications
Visit my Google Cloud transcript and my Google Developers profile.
Conference Presentations
Apache Community Over Code
October 2024 – Denver, Colorado, USA
Title: Building an NLP Model Training Pipeline with Apache OpenNLP and Apache NiFi
Technologies: Apache OpenNLP, Apache NiFi, NLP pipelines
Apache Community Over Code
October 2023 - Halifax, Nova Scotia, Canada
Title: Apache OpenNLP and LLMs – Where does OpenNLP fit in?
Technologies: Apache OpenNLP, Large language models, NLP model training and evaluation
Resources: Presentation slides
Linux Foundation Open Source Summit
May 2023 – Vancouver, Canada
Title: Using Apache OpenNLP with OpenSearch k-NN Vector Search
Technologies: Apache OpenNLP, ONNX Runtime, OpenSearch
Resources: Code Repository
ApacheCon
October 2022 – New Orleans, LA
Title: What’s New and Coming in Apache OpenNLP 2.0
Technologies: Apache OpenNLP, ONNX Runtime
Amazon Web Services OpenSearchCon
September 2022 – Seattle, WA
Title: Getting the most from your OpenSearch Contributions
Technologies: Open Source
Recording: https://www.youtube.com/watch?v=3j3IA546JQ8
Linux Foundation Open Source Summit
June 2022 – Austin, TX
Title: Searching for the right words: Bringing NLP Transformers to Apache Solr via Apache OpenNLP
Technologies: Apache OpenNLP, Apache Lucene, Apache Solr
Resources: Code Repository
Recording: https://www.youtube.com/watch?v=x5za13Jc5OY
Berlin Buzzwords
June 2021 – Virtual
Title: Applied MLOps to Maintain Model Freshness on Kubernetes
Technologies: NLP, Text classification, Kubernetes, MLOps
Resources: Code Repository
Recording: https://www.youtube.com/watch?v=-tzCH9YuM6s
HashiTalks
March 2021 – Virtual
Title: From Training to Serving: Machine Learning Models with Terraform
Technologies: AWS, Terraform, MLOps, NLP
Resources: Code Repository
Strata Data
September 2019 – New York, NY, USA
Title: Protecting the Healthcare Enterprise from PHI Breaches using Streaming and NLP
Technologies: Apache Kafka, Apache Flink, NLP
Activate Search and AI Conference
September 2019 – Washington, DC, USA
Title: Leveraging Neural Networks and Learning-to-Rank in Document Workflows
Technologies: NLP (document classification), Learning-to-Rank
Recording: https://www.youtube.com/watch?v=vja4P55OSag
DataWorks Summit Washington DC
May 2019 – Washington, DC, USA
Title: Improving Organizational Knowledge with Natural Language Processing Enriched Data Pipelines
Technologies: Apache NiFi, Apache Kafka, Apache OpenNLP, Apache Superset
Resources: Code Repository
PyData Washington DC
November 2018 – McLean, Virginia, USA
Title: Using Sockeye Neural Machine Translation in a Streaming Pipeline
Technologies: Apache Flink, Sockeye
Recording: https://www.youtube.com/watch?v=Pzt4g5Z-FBI
Activate Search and AI Conference
October 2018 – Montreal, Quebec, Canada
Title: Embracing Diversity: Searching Over Multiple Languages
Technologies: Apache NiFi, Apache OpenNLP, Apache Solr, Sockeye
Recording: https://www.youtube.com/watch?v=ek-crQwMfnQ
Haystack Search Relevance Conference
April 2018 – Charlottesville, Virginia, USA
Title: Embracing Diversity: Implementing Multi-language Search
Technologies: Apache NiFi, Apache Joshua, Elasticsearch
Resources: Code Repository, Presentation
© 2023 Jeff Zemerick. All Rights Reserved.