This blog post shows how we can create an NLP pipeline to perform named-entity extraction on natural language text using the NLP Building Blocks and Apache NiFi. The NLP Building Blocks provide the ability to perform sentence extraction, string tokenization, and named-entity extraction. They are implemented as microservices and can be deployed almost anywhere, such […]
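In the post, the three stages (sentence extraction, tokenization, named-entity extraction) run as separate microservices wired together by Apache NiFi. The sketch below shows only the data flow, assuming simple in-process functions in place of services: text becomes sentences, sentences become tokens, and tokens become entities. All function names are hypothetical, and the naive regex-based stages are toy stand-ins. They are not the NLP Building Blocks API or real models.

```python
import re
from typing import List

# Hypothetical, deliberately naive stand-ins for the three pipeline stages.
# They are NOT the NLP Building Blocks API; they only illustrate the data flow.

def extract_sentences(text: str) -> List[str]:
    """Split text into sentences after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]

def tokenize(sentence: str) -> List[str]:
    """Split a sentence into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)

def extract_entities(tokens: List[str]) -> List[str]:
    """Toy 'NER': join runs of capitalized tokens, ignoring the sentence-initial token."""
    entities: List[str] = []
    current: List[str] = []
    for i, tok in enumerate(tokens):
        if i > 0 and tok[:1].isupper():
            current.append(tok)
        else:
            if current:
                entities.append(" ".join(current))
                current = []
    if current:
        entities.append(" ".join(current))
    return entities

def pipeline(text: str) -> List[str]:
    """Chain the stages: text -> sentences -> tokens -> entities."""
    results: List[str] = []
    for sentence in extract_sentences(text):
        results.extend(extract_entities(tokenize(sentence)))
    return results

if __name__ == "__main__":
    print(pipeline("The meeting was in New York City. It was chaired by Alice."))
```

Because each stage takes the previous stage's output as its only input, each one can be swapped for a remote service call without changing the others. That separation is what lets a flow tool such as NiFi route data between the stages.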
Author: jzemerick
Amazon EBS Elastic Volumes
On Feb 13, 2017, Amazon Web Services announced Elastic Volumes for EBS! If you have used EC2 much, you have undoubtedly been frustrated by the rigidity of EBS volumes. Once created, they could not be modified or resized. If your EC2 instance required more disk space, your only option was to manually create a new volume of […]
AWS EC2 Metadata Simulator
There is a new project on GitHub that simulates EC2 instance metadata. It lets you test applications that depend on EC2 instance metadata in non-AWS environments. It doesn’t (yet) simulate every EC2 metadata endpoint, but in time it will, and in the meantime it should be simple enough to […]
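An application can only be tested against a simulator if the metadata base URL is configurable rather than hard-coded. The sketch below assumes that design. It reads values with plain HTTP GETs (IMDSv1-style, no session token) from `http://169.254.169.254/latest/meta-data/` by default, and a test can point it at a stand-in server. The tiny `http.server` stub and its values are hypothetical. They are not the GitHub project's code, and they do not reflect its port or supported paths.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# The link-local instance metadata endpoint used on EC2 (IMDSv1-style GET).
DEFAULT_METADATA_URL = "http://169.254.169.254/latest/meta-data/"

def get_metadata(path: str, base_url: str = DEFAULT_METADATA_URL, timeout: float = 2.0) -> str:
    """Fetch one metadata value, e.g. path='instance-id'."""
    with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
        return resp.read().decode("utf-8")

# Hypothetical values served by a local stand-in for a metadata simulator.
FAKE_METADATA = {
    "/latest/meta-data/instance-id": "i-0123456789abcdef0",
    "/latest/meta-data/placement/availability-zone": "us-east-1a",
}

class FakeMetadataHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = FAKE_METADATA.get(self.path)
        if body is None:
            self.send_error(404)
            return
        data = body.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, format, *args):
        pass  # silence per-request logging

def start_fake_metadata_server():
    """Start the stub on a free localhost port; return (server, base_url)."""
    server = HTTPServer(("127.0.0.1", 0), FakeMetadataHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    return server, f"http://{host}:{port}/latest/meta-data/"

if __name__ == "__main__":
    srv, url = start_fake_metadata_server()
    print(get_metadata("instance-id", base_url=url))
    srv.shutdown()
    srv.server_close()
```

In production the default URL is used unchanged. Outside AWS, the same code is exercised by passing the simulator's base URL instead.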
AWS CloudFormation Supports YAML
In an exciting update, AWS announced that CloudFormation now supports YAML in addition to JSON. I think most of us will agree this is great. The JSON templates worked, but whew, were they hard to read, and the inability to add comments sometimes made my templates look more like sudokus […]
OpenNLP’s RegexNameFinder and Tokenizing
OpenNLP’s RegexNameFinder takes one or more regular expressions and uses them to extract entities from the input text. This is very useful when you want to extract things that follow a set format, like phone numbers and email addresses. However, be careful when tokenizing the input to the RegexNameFinder, because the tokenization can affect […]
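To see why tokenization matters, consider a name finder that works on tokens rather than raw text. A minimal Python sketch, modeled loosely on that idea and not taken from OpenNLP's code, joins the tokens with single spaces and runs the regex over the result. It keeps a match only if the match starts at a token start and ends at a token end. The function `regex_find`, the phone pattern, and both tokenizers are hypothetical stand-ins, not OpenNLP's tokenizer classes.

```python
import re
from typing import Dict, List, Tuple

def regex_find(tokens: List[str], pattern: str) -> List[Tuple[int, int]]:
    """Return (start_token, end_token_exclusive) spans whose text matches pattern.

    Tokens are joined with single spaces; a regex match is kept only if it
    begins at a token start and ends at a token end.
    """
    starts: Dict[int, int] = {}  # character offset -> index of token starting there
    ends: Dict[int, int] = {}    # character offset -> index just past token ending there
    pos = 0
    for i, tok in enumerate(tokens):
        starts[pos] = i
        pos += len(tok)
        ends[pos] = i + 1
        pos += 1  # the single-space separator
    text = " ".join(tokens)
    spans = []
    for m in re.finditer(pattern, text):
        if m.start() in starts and m.end() in ends:
            spans.append((starts[m.start()], ends[m.end()]))
    return spans

PHONE = r"\d{3}-\d{4}"  # hypothetical 7-digit phone format
sentence = "Call 555-1234 today"

# Whitespace tokenization keeps "555-1234" as one token.
whitespace_tokens = sentence.split()
# Splitting off punctuation yields "555", "-", "1234" (joined as "555 - 1234").
punct_tokens = re.findall(r"\w+|[^\w\s]", sentence)

if __name__ == "__main__":
    print(regex_find(whitespace_tokens, PHONE))  # one span: the phone token
    print(regex_find(punct_tokens, PHONE))       # no spans: the hyphen became its own token
```

The same regex finds the phone number with one tokenizer and misses it with the other. This is why the tokenizer should be chosen with the expected entity formats in mind.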