I'm Jeff, a certified cloud architect and engineer.

I am passionate about open source, NLP, protecting sensitive information, building efficient data pipelines, and improving search.

Fully AWS certified | 7x GCP certified

TL;DR: I love building stuff!

I am Jeff, an independent consultant. I am a certified cloud architect with a background in software engineering, DevOps, big data, and natural language processing. I have earned many AWS and Google Cloud certifications. I am a software engineer at heart, but I enjoy everything from cloud architecture to data and search and, of course, AI/ML.

I am available for hire as a consultant. I can help with your cloud (AWS/Google Cloud) architecture, software development, data pipelines, search, and NLP projects.

My AWS and Google Cloud Experience

I am proud to hold many AWS and Google Cloud certifications. Here are my certifications. I earned my first AWS certifications in 2014. Back then the certifications had issue numbers! I am now fully AWS certified, having achieved all certifications.

I have been an AWS SME (starting in February 2020) and an AWS Lead SME, which means I helped contribute to the development of the AWS certification exams. I have found there is no better way to learn, because writing wrong answers requires an in-depth knowledge of AWS! I "retired" as an AWS SME in 2023 to let others share in the rewarding opportunity.


I was an AWS Community Builder for Machine Learning in 2021, 2022, and 2023.

About Search...

It's one thing to have good data, but it's only as good as your ability to efficiently retrieve it. I help folks make "search better" whether it's Amazon OpenSearch, Elasticsearch, or something else. Check out my work on User Behavior Insights and in OpenSearch.

My Background

I have worked in many industries, including government, security, healthcare, logistics, and education. I started with cloud when AWS first launched around 2008.

My interest in natural language processing was sparked in 2012 when I was working with unstructured data and learning about the challenges in making it useful. That led me to the Apache OpenNLP project, where I started contributing. Today, I am the chair of the Apache OpenNLP PMC (which just means I submit the project reports ;). I owe the Apache OpenNLP team a lot of gratitude for their help and guidance over the years. I added ONNX Runtime support to Apache OpenNLP to facilitate the use of large language models from Java. Check out the blog post I wrote on the Microsoft website!

As a programmer, I started with QBasic, QuickBasic, and Visual Basic for DOS. I picked up C++ through Visual Studio 6 before moving to .NET. I moved to mostly Java around 2010 and have been there since. I also develop in Python and Go, and have experience with Scala. I do a lot of Bash, CloudFormation, and Terraform scripting and use Linux almost exclusively. Ubuntu has been my first pick since it was versioned using single digits and mailed to me on CD-ROMs!

Check out my conference presentations below for a general idea of my favorite tools and areas.

My Work

Today, I do consulting primarily in the areas of cloud, data, search, and NLP through my company Mountain Fog. I enjoy tackling problems around data ingestion, search, and AI/ML because those problems often fall directly in the intersections of my interests.

I believe search is often overlooked yet very important because without the ability to efficiently locate data, the data is worthless. Vector search is now adding a new dimension to search - pun totally intended.

I am the maintainer of the Amazon OpenSearch User Behavior Insights plugin, and I try to be active in the OpenSearch community.

I created and maintain the Phileas project, along with the Philter software, under Philterd, LLC. Redacting PII and PHI from text is very challenging but important and necessary.

My Work

Through Mountain Fog, I offer consulting services in the areas of cloud (AWS/GCP), data pipelines, search, and natural language processing (NLP).

I started Philterd to provide AI-powered software to redact PII. Learn more at www.philterd.ai.

Open Source

I am passionate about open source and try to contribute whenever possible. Where I'm most active:

Apache OpenNLP - I am a committer and PMC chair of the Java NLP library.

OpenSearch User Behavior Insights Plugin - I am a maintainer and developer of the OpenSearch plugin for search relevance.

Phileas - I am the creator and owner of the Java library for finding and redacting PII.

I have also published a number of other PII-focused open source projects on GitHub.

My Certifications

I believe industry certifications are a good source of motivation and a valuable way for engineers to validate and showcase their experience and skills.

My AWS Certifications

I am fully AWS certified!

My AWS Certifications transcript

  • AWS Certified Solutions Architect – Professional
  • AWS Certified DevOps Engineer – Professional
  • AWS Certified Advanced Networking – Specialty
  • AWS Certified Data Analytics – Specialty
  • AWS Certified Databases – Specialty
  • AWS Certified Machine Learning – Specialty
  • AWS Certified Security – Specialty
  • AWS Certified Machine Learning Engineer – Associate Early Adopter
  • AWS Certified Developer – Associate
  • AWS Certified Solutions Architect – Associate
  • AWS Certified SysOps Administrator – Associate
  • AWS Certified AI Practitioner
  • AWS Certified Cloud Practitioner
  • AWS Certified Big Data – Specialty (Retired exam)
  • AWS Certified Alexa Skill Builder – Specialty (Retired exam)

My Conference Presentations

Apache Community Over Code

October 2024 – Denver, Colorado

Title: Building an NLP Model Training Pipeline with Apache OpenNLP and Apache NiFi

Technologies: Apache OpenNLP, Apache NiFi, NLP pipelines

Apache Community Over Code

October 2023 – Halifax, Nova Scotia, Canada

Title: Apache OpenNLP and LLMs – Where does OpenNLP fit in?

Technologies: Apache OpenNLP, large language models, NLP model training and evaluation

Resources: Presentation slides

Linux Foundation Open Source Summit
May 2023 – Vancouver, Canada
Title: Using Apache OpenNLP with OpenSearch k-NN Vector Search
Technologies: Apache OpenNLP, ONNX Runtime, OpenSearch
Resources: Code Repository

ApacheCon
October 2022 – New Orleans, LA
Title: What’s New and Coming in Apache OpenNLP 2.0
Technologies: Apache OpenNLP, ONNX Runtime

Amazon Web Services OpenSearchCon
September 2022 – Seattle, WA
Title: Getting the most from your OpenSearch Contributions
Technologies: Open Source
Recording: https://www.youtube.com/watch?v=3j3IA546JQ8

Linux Foundation Open Source Summit
June 2022 – Austin, TX
Title: Searching for the right words: Bringing NLP Transformers to Apache Solr via Apache OpenNLP
Technologies: Apache OpenNLP, Apache Lucene, Apache Solr
Resources: Code Repository
Recording: https://www.youtube.com/watch?v=x5za13Jc5OY

Berlin Buzzwords
June 2021 – Virtual
Title: Applied MLOps to Maintain Model Freshness on Kubernetes
Technologies: NLP, Text classification, Kubernetes, MLOps
Resources: Code Repository
Recording: https://www.youtube.com/watch?v=-tzCH9YuM6s

HashiTalks
March 2021 – Virtual
Title: From Training to Serving: Machine Learning Models with Terraform
Technologies: AWS, Terraform, MLOps, NLP
Resources: Code Repository

Strata Data
September 2019 – New York, NY USA
Title: Protecting the Healthcare Enterprise from PHI Breaches using Streaming and NLP
Technologies: Apache Kafka, Apache Flink, NLP

Activate Search and AI Conference
September 2019 – Washington, DC, USA
Title: Leveraging Neural Networks and Learning-to-Rank in Document Workflows
Technologies: NLP (document classification), Learning-to-Rank
Recording: https://www.youtube.com/watch?v=vja4P55OSag

DataWorks Summit Washington DC
May 2019 – Washington, DC, USA
Title: Improving Organizational Knowledge with Natural Language Processing Enriched Data Pipelines
Technologies: Apache NiFi, Apache Kafka, Apache OpenNLP, Apache Superset
Resources: Code Repository

PyData Washington DC
November 2018 – McLean, Virginia, USA
Title: Using Sockeye Neural Machine Translation in a Streaming Pipeline
Technologies: Apache Flink, Sockeye
Recording: https://www.youtube.com/watch?v=Pzt4g5Z-FBI

Activate Search and AI Conference
October 2018 – Montreal, Quebec, Canada
Title: Embracing Diversity: Searching Over Multiple Languages
Technologies: Apache NiFi, Apache OpenNLP, Apache Solr, Sockeye
Recording: https://www.youtube.com/watch?v=ek-crQwMfnQ

Haystack Search Relevance Conference
April 2018 – Charlottesville, Virginia, USA
Title: Embracing Diversity: Implementing Multi-language Search
Technologies: Apache NiFi, Apache Joshua, Elasticsearch
Resources: Code Repository, Presentation

Ping me below!

Email Me