Tag: nlp

HashiTalks 2021

For HashiTalks 2021, I presented a Terraform project that manages the training and serving of NLP models. Built on AWS with ECS, S3, DynamoDB, SQS, Lambda, and EventBridge, the project automates containerized NLP model training. You can queue a model for training by describing the model you want to […]

Dataworks Summit 2019

In May 2019, I had the honor of attending and speaking at the Dataworks Summit in Washington, D.C. The conference featured many interesting topics and keynote speakers focused on big-data technologies and their business applications. I also always enjoy exploring downtown Washington, D.C., whether it is doing the “hike” across the National Mall taking in […]

Creating an N-gram Language Model

A statistical language model is a probability distribution over sequences of words. (source) We can build a language model using n-grams and query it to determine the probability of an arbitrary sentence (a sequence of words) belonging to that language. Language modeling has uses in various NLP applications such as statistical machine translation and speech […]
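To make the idea concrete, here is a minimal sketch of a bigram (n = 2) model in Python. It is an illustration under stated assumptions and not necessarily the approach the full post takes. It uses unsmoothed maximum-likelihood estimates and pads each sentence with `<s>` and `</s>` markers. It then scores a sentence with the chain rule under the Markov assumption, P(w_1 … w_m) ≈ ∏ P(w_i | w_{i-1}). The class name `BigramLanguageModel`, the boundary markers, and the toy corpus are all made up for the example.

```python
from collections import Counter
from typing import Iterable, List

# Hypothetical sentence-boundary markers used to pad each sentence.
START, END = "<s>", "</s>"


class BigramLanguageModel:
    """A minimal bigram (n = 2) language model with unsmoothed MLE estimates."""

    def __init__(self, sentences: Iterable[List[str]]) -> None:
        self.history_counts: Counter = Counter()  # count(w_{i-1}) used as a history
        self.bigram_counts: Counter = Counter()   # count(w_{i-1}, w_i)
        for tokens in sentences:
            padded = [START] + tokens + [END]
            # Every token except the final END can act as the history of a bigram.
            self.history_counts.update(padded[:-1])
            self.bigram_counts.update(zip(padded[:-1], padded[1:]))

    def prob(self, prev: str, word: str) -> float:
        """P(word | prev) = count(prev, word) / count(prev); 0.0 for an unseen history."""
        history = self.history_counts[prev]
        if history == 0:
            return 0.0
        return self.bigram_counts[(prev, word)] / history

    def sentence_prob(self, tokens: List[str]) -> float:
        """Multiply P(w_i | w_{i-1}) over the padded sentence (Markov assumption)."""
        padded = [START] + tokens + [END]
        p = 1.0
        for prev, word in zip(padded[:-1], padded[1:]):
            p *= self.prob(prev, word)
        return p


if __name__ == "__main__":
    # Toy, pre-tokenized corpus invented for this example.
    corpus = [["i", "like", "nlp"], ["i", "like", "search"], ["you", "like", "nlp"]]
    lm = BigramLanguageModel(corpus)
    print(lm.sentence_prob(["i", "like", "nlp"]))  # about 0.444 (2/3 * 1 * 2/3 * 1)
    print(lm.sentence_prob(["nlp", "like", "i"]))  # 0.0, since "<s> nlp" never occurs
```

Because the estimates are unsmoothed, any sentence that contains an unseen bigram gets probability 0. A practical model would typically add smoothing (for example, add-one) and sum log probabilities to avoid floating-point underflow on long sentences.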

Lucidworks Activate Search and AI Conference

Back in October 2018, I had the privilege of attending and presenting at the Lucidworks Activate Search and AI Conference in Montreal, Canada. It was a first-class event with many great, informative sessions, set in the middle of a remarkable city. I co-presented “Embracing Diversity: Searching over multiple languages” with Suneel Marthi in […]

OpenNLP’s RegexNameFinder and Tokenizing

OpenNLP’s RegexNameFinder takes one or more regular expressions and uses them to extract entities from the input text. This is useful when the entities you want to extract follow a set format, such as phone numbers and email addresses. However, be careful when tokenizing the input to the RegexNameFinder, because tokenization can affect […]
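The excerpt stops before the details. What follows is only a hedged Python sketch of one way tokenization can interfere with a token-based regex finder. It is loosely modeled on our understanding of how OpenNLP's `find(String[])` method behaves, and that understanding is an assumption worth checking against the OpenNLP source. The sketch joins the tokens with single spaces and runs the regular expression over the result. It keeps only matches whose start and end land on token boundaries. The function `regex_find` and the phone-number pattern are hypothetical; this is not OpenNLP's API.

```python
import re
from typing import Dict, List, Tuple


def regex_find(tokens: List[str], pattern: re.Pattern) -> List[Tuple[int, int]]:
    """Return (start, end) token spans for regex matches aligned with token boundaries.

    Simplified model (an assumption, not OpenNLP's code): the tokens are joined
    with single spaces, the regex runs over that string, and a match is kept only
    if it starts exactly where a token starts and ends exactly where a token ends.
    """
    start_of: Dict[int, int] = {}  # char offset where token i starts -> i
    end_of: Dict[int, int] = {}    # char offset where token i ends -> i + 1 (exclusive)
    offset = 0
    for i, token in enumerate(tokens):
        start_of[offset] = i
        offset += len(token)
        end_of[offset] = i + 1
        offset += 1  # account for the joining space
    text = " ".join(tokens)

    spans: List[Tuple[int, int]] = []
    for match in pattern.finditer(text):
        start = start_of.get(match.start())
        end = end_of.get(match.end())
        # Drop matches that begin or end in the middle of a token.
        if start is not None and end is not None:
            spans.append((start, end))
    return spans


if __name__ == "__main__":
    PHONE = re.compile(r"\d{3}-\d{3}-\d{4}")  # hypothetical US-style phone pattern
    print(regex_find(["Call", "555-123-4567", "today"], PHONE))                  # [(1, 2)]
    print(regex_find(["Call", "555", "-", "123", "-", "4567", "today"], PHONE))  # []
    print(regex_find(["Call", "555-123-4567."], PHONE))                          # []
```

Under this model, the same phone number is found or missed depending only on how the text was tokenized. A tokenizer that splits on hyphens puts spaces into the joined string, so the pattern no longer matches. A tokenizer that leaves trailing punctuation attached produces a match that ends in the middle of a token, so the match is discarded.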