I am very excited to be presenting at this year’s Linux Foundation Open Source Summit North America! My presentation, titled “Searching for the Right Words: Bringing NLP to Apache Solr through ONNX and OpenNLP,” will cover the new ONNX capabilities that have been introduced into Apache OpenNLP and how to utilize the functionality from […]
Category: natural-language-processing
Apache OpenNLP and ONNX Models
I got started using the Apache OpenNLP project sometime around 2009. I had a large amount of unstructured text that I wanted to process and I didn’t know how. As a Java programmer, I found that Apache OpenNLP provided the tools I needed to make that large amount of text usable. Back then, the NLP options available were […]
Dataworks Summit 2019
In May 2019 I had the honor of attending and speaking at the Dataworks Summit in Washington, D.C. The conference had many interesting topics and keynote speakers focused on big-data technologies and business applications. I also always enjoy exploring downtown Washington, D.C. Whether it is doing the “hike” across the National Mall taking in […]
Creating an N-gram Language Model
A statistical language model is a probability distribution over sequences of words. (source) We can build a language model using n-grams and query it to determine the probability of an arbitrary sentence (a sequence of words) belonging to that language. Language modeling has uses in various NLP applications such as statistical machine translation and speech […]
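To make the idea concrete, here is a minimal sketch of a bigram (n = 2) language model in plain Python. It is not the code from the post: the function names `train_bigram_model` and `sentence_log_prob`, the toy corpus, whitespace tokenization, and add-one (Laplace) smoothing are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def train_bigram_model(sentences):
    """Count unigrams and bigrams over whitespace-tokenized sentences.

    Each sentence is wrapped in <s> and </s> boundary markers
    (hypothetical marker names chosen for this sketch).
    """
    unigrams = Counter()
    bigrams = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def sentence_log_prob(sentence, unigrams, bigrams):
    """Return the log-probability of a sentence under the bigram model.

    Uses add-one smoothing so unseen bigrams get a small non-zero
    probability. As a simplification, the vocabulary size includes
    the boundary markers.
    """
    vocab_size = len(unigrams)
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    log_prob = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        numerator = bigrams[(prev, word)] + 1
        denominator = unigrams[prev] + vocab_size
        log_prob += math.log(numerator / denominator)
    return log_prob


# Toy corpus, assumed purely for illustration.
corpus = ["the cat sat on the mat", "the dog sat on the rug"]
unigrams, bigrams = train_bigram_model(corpus)

# A sentence built from seen bigrams versus the same words scrambled.
seen = sentence_log_prob("the cat sat on the rug", unigrams, bigrams)
scrambled = sentence_log_prob("rug the on sat cat the", unigrams, bigrams)
```

In this sketch, a sentence whose word pairs appear in the corpus scores a higher log-probability than the same words in scrambled order, which is the sense in which the model estimates how likely a sequence is to belong to the language.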
PyData Washington DC 2018
Last month, in November 2018, I had the privilege of attending and presenting at PyData Washington DC 2018 at Capital One. It was my first PyData event, and I learned so much from it that I hope to attend many more in the future. I encourage you to do so, too, if you’re interested […]
Lucidworks Activate Search and AI Conference
Back in October 2018 I had the privilege of attending and presenting at Lucidworks Activate Search and AI Conference in Montreal, Canada. It was a first-class event with lots of great, informative sessions set in the middle of a remarkable city. I was a co-presenter of Embracing Diversity: Searching over multiple languages with Suneel Marthi in […]
Apache OpenNLP Language Detection in Apache NiFi
When making an NLP pipeline in Apache NiFi it can be a requirement to route the text through the pipeline based on the language of the text. But how do we get the language of the text inside our pipeline? This blog post introduces a processor for Apache NiFi that utilizes Apache OpenNLP’s language detection […]
NLP Pipeline using Apache NiFi and NLP Building Blocks
This blog post shows how we can create an NLP pipeline to perform named-entity extraction on natural language text using the NLP Building Blocks and Apache NiFi. The NLP Building Blocks provide the ability to perform sentence extraction, string tokenization, and named-entity extraction. They are implemented as microservices and can be deployed almost anywhere, such […]
OpenNLP’s RegexNameFinder and Tokenizing
OpenNLP’s RegexNameFinder takes one or more regular expressions and uses those expressions to extract entities from the input text. This is very useful when you want to extract things that follow a set format, like phone numbers and email addresses. However, when tokenizing the input to the RegexNameFinder, be careful because it can affect […]