Author: jzemerick

I am a 2021 AWS Community Builder for Machine Learning

I was selected as a 2021 AWS Community Builder for machine learning! The AWS Community Builders program selects participants to help share AWS knowledge and resources with the community through engagements such as blog posts, code, and videos. I was selected for machine learning, so you can expect to see some upcoming machine learning on […]

The Fallacy of Avoiding Cloud Vendor Lock-In

For over 10 years I have worked with many companies to help them either migrate to the cloud or develop new cloud applications. A very common requirement is that the designed architecture avoid using any vendor-specific cloud technologies or services. The rationale is usually that although we are running our application on vendor X […]

Querqy Chorus

For the past couple of months I have attended occasional presentations about Chorus, an open source stack for search, created by Querqy. The presentations have focused on the stack components of Apache Solr, SMUI (Search Management UI), and the search relevancy tool Quepid, among others. There are a decent number of search-related open source projects out […]

Some First Steps for a New NiFi Cluster

After installing Apache NiFi, there are a few steps you might want to take before making your cluster available for prime time. None of these steps are required, so make sure they are appropriate for your use case before implementing them. Lowering NiFi’s Log File Retention Properties: By default, Apache NiFi’s nifi-app.log files are capped at […]

A Tool for Every Data Engineer’s Toolbox

Collecting data from edge devices in manufacturing, processing medical records from electronic health systems, and analyzing text all sound like very different problems, each requiring unique solutions. While that certainly is true, there are some commonalities among these tasks. Each task requires a scalable method of data ingestion, predictable performance, and capabilities for […]

Monitoring Apache NiFi’s Log with AWS CloudWatch

It’s inevitable that at some point while running Apache NiFi on a single node or as a cluster you will want to see what’s in NiFi’s log and maybe even be alerted when certain logged events are found. Maybe you are debugging your own processor or just looking for more insight into your data flow. […]

Monitoring Apache NiFi with Datadog

One of the most common requirements when using Apache NiFi is a means to adequately monitor the NiFi cluster. Insights into a NiFi cluster’s use of memory, disk space, CPU, and NiFi-level metrics are crucial to operating and optimizing data flows. NiFi’s Reporting Tasks provide the capability to publish metrics to external services. Datadog is […]

Apache NiFi’s MergeContent Processor

The MergeContent processor in Apache NiFi is one of the most useful processors but can also be one of the biggest sources of confusion. The processor (you guessed it!) merges flowfiles together based on a merge strategy. The processor’s purpose is straightforward but its properties can be tricky. In this post we describe how it […]

Dataworks Summit 2019

Recently (in May 2019) I had the honor of attending and speaking at the Dataworks Summit in Washington, D.C. The conference had many interesting topics and keynote speakers focused on big-data technologies and business applications. I also always enjoy exploring downtown Washington, D.C. Whether it is doing the “hike” across the National Mall taking in […]