I love building stuff!
My name is Jeff and I am an independent consultant. I am a certified cloud architect, and my background is in software engineering, DevOps, big-data, and natural language processing. I have earned many AWS and Google Cloud certifications. I am a software engineer at heart, but I enjoy everything from cloud architecture to programming and AI/ML.
I am available to hire for consulting! I can help with your cloud (AWS/Google Cloud) architecture, software development, and big-data/NLP projects. Please email me for a PDF resume.
I am proud to have a lot of AWS and Google Cloud certifications. You can see my AWS certifications transcript and my Google Cloud certifications transcript. I was an AWS Community Builder for Machine Learning in 2021, 2022, and 2023. I obtained my first AWS certifications in 2014. Back then the certifications had issue numbers!
I am an AWS SME (since February 2020) and an AWS Lead SME, which means I contribute to the development of the AWS certification exams. I have found there is no better way to learn, because the work requires in-depth knowledge of AWS!
I have worked in many industries, including government, security, healthcare, logistics, and education. I started working with the cloud in the early days of AWS, around 2008.
My interest in NLP was sparked in 2012 when I was working with unstructured data and learning how difficult it is to process. That led me to the Apache OpenNLP project. I began contributing where I could, and today I am the chair of the Apache OpenNLP PMC. I am very grateful to the Apache OpenNLP team for their help and guidance over the years. Most recently, I added ONNX Runtime support to Apache OpenNLP to make it easier to use large language models from Java. Check out the blog post I wrote!
As a programmer, I started with QBasic, QuickBasic, and Visual Basic for DOS. I picked up C++ through Visual Studio 6 before moving to .NET. I moved to Java around 2010, and it has been my primary high-level language ever since. I also develop in Python and Go, and I have experience with Scala. I do a lot of Bash, CloudFormation, and Terraform scripting and use Linux almost exclusively. Ubuntu has been my first pick since back when its version numbers had single digits and it was mailed out on CD-ROMs!
Today, I consult primarily in the areas of cloud, big-data, and NLP through my company Mountain Fog. I enjoy tackling problems around data ingestion, search, and AI/ML because those problems often fall right at the intersection of my interests.
Some of my favorite tools are Apache Kafka and Apache NiFi, along with cloud services like Google Cloud Functions and AWS Lambda. Combining these applications and frameworks with the scalability of the cloud can create very powerful platforms for data ingestion and manipulation. I believe search is important (and often overlooked!) because without the ability to efficiently locate data, the data is almost worthless. Vector search is now adding a new dimension to search (pun totally intended).
I maintain the Phileas project, along with the Philter software, under Philterd, LLC. Redacting personally identifiable information (PII) and protected health information (PHI) from text is very challenging, but it is important and necessary.
I believe the future lies in distributed computing across disparate cloud platforms. The Internet of Things and AI/ML will continue to expand, and both will bring many technical and ethical challenges. I hope to find ways to contribute to the betterment of society.
My work:
Mountain Fog, Inc. offers consulting services in the areas of cloud, big-data, and NLP. Learn more at www.mtnfog.com.
Philterd offers AI-powered software to identify, redact, and manage sensitive information in text. Learn more at www.philterd.ai.
I believe industry certifications provide encouragement and a valuable way for engineers to validate and showcase their experience and skills.
My AWS Certifications
My AWS Certifications transcript
I have also been an AWS Certification Lead SME and an AWS Certification SME, roles in which I participated in AWS certification exam development.
My Google Cloud Certifications
Visit my Google Cloud transcript and my Google Developers profile.
Apache Community Over Code
October 2024 – Denver, Colorado
Title: Building an NLP Model Training Pipeline with Apache OpenNLP and Apache NiFi
Technologies: Apache OpenNLP, Apache NiFi, NLP pipelines
Apache Community Over Code
October 2023 – Halifax, Nova Scotia, Canada
Title: Apache OpenNLP and LLMs – Where does OpenNLP fit in?
Technologies: Apache OpenNLP, Large-language models, NLP model training and evaluation
Resources: Presentation slides
Linux Foundation Open Source Summit
May 2023 – Vancouver, Canada
Title: Using Apache OpenNLP with OpenSearch k-NN Vector Search
Technologies: Apache OpenNLP, ONNX Runtime, OpenSearch
Resources: Code Repository
ApacheCon
October 2022 – New Orleans, LA
Title: What’s New and Coming in Apache OpenNLP 2.0
Technologies: Apache OpenNLP, ONNX Runtime
Amazon Web Services OpenSearchCon
September 2022 – Seattle, WA
Title: Getting the most from your OpenSearch Contributions
Technologies: Open Source
Recording: https://www.youtube.com/watch?v=3j3IA546JQ8
Linux Foundation Open Source Summit
June 2022 – Austin, TX
Title: Searching for the right words: Bringing NLP Transformers to Apache Solr via Apache OpenNLP
Technologies: Apache OpenNLP, Apache Lucene, Apache Solr
Resources: Code Repository
Recording: https://www.youtube.com/watch?v=x5za13Jc5OY
Berlin Buzzwords
June 2021 – Virtual
Title: Applied MLOps to Maintain Model Freshness on Kubernetes
Technologies: NLP, Text classification, Kubernetes, MLOps
Resources: Code Repository
Recording: https://www.youtube.com/watch?v=-tzCH9YuM6s
HashiTalks
March 2021 – Virtual
Title: From Training to Serving: Machine Learning Models with Terraform
Technologies: AWS, Terraform, MLOps, NLP
Resources: Code Repository
Strata Data
September 2019 – New York, NY, USA
Title: Protecting the Healthcare Enterprise from PHI Breaches using Streaming and NLP
Technologies: Apache Kafka, Apache Flink, NLP
Activate Search and AI Conference
September 2019 – Washington, DC, USA
Title: Leveraging Neural Networks and Learning-to-Rank in Document Workflows
Technologies: NLP (document classification), Learning-to-Rank
Recording: https://www.youtube.com/watch?v=vja4P55OSag
DataWorks Summit Washington DC
May 2019 – Washington, DC, USA
Title: Improving Organizational Knowledge with Natural Language Processing Enriched Data Pipelines
Technologies: Apache NiFi, Apache Kafka, Apache OpenNLP, Apache Superset
Resources: Code Repository
PyData Washington DC
November 2018 – McLean, Virginia, USA
Title: Using Sockeye Neural Machine Translation in a Streaming Pipeline
Technologies: Apache Flink, Sockeye
Recording: https://www.youtube.com/watch?v=Pzt4g5Z-FBI
Activate Search and AI Conference
October 2018 – Montreal, Quebec, Canada
Title: Embracing Diversity: Searching Over Multiple Languages
Technologies: Apache NiFi, Apache OpenNLP, Apache Solr, Sockeye
Recording: https://www.youtube.com/watch?v=ek-crQwMfnQ
Haystack Search Relevance Conference
April 2018 – Charlottesville, Virginia, USA
Title: Embracing Diversity: Implementing Multi-language Search
Technologies: Apache NiFi, Apache Joshua, Elasticsearch
Resources: Code Repository, Presentation
© 2023 Jeff Zemerick. All Rights Reserved.