A glimpse under the hood of Natural Language Processing

June 20, 2017

Natural Language Processing (NLP) sits at the intersection of linguistics and artificial intelligence, and it is a key component of Lytics’ Content Affinity Engine.

NLP models tell our Content Affinity Engine what articles and other content are about, information we use to calculate content affinities for your users. The better we can describe content, the more accurate those affinities are.

So it’s pretty important that we can describe content in meaningful terms.

Topics, topics, topics

Think of how you might describe the following passage:

“The Seahawks blew a chance to make Super Bowl history with another improbable comeback because of an inexplicable decision to pass instead of handing the ball to Marshawn Lynch.”

You might describe it with the passage’s keywords, terms such as Seahawks, Super Bowl, and Marshawn Lynch. Keywords are extracted verbatim from text, and keyword strategy was central in the earlier, search engine optimization (SEO) driven days of the Internet. A keyword-based approach to content classification, however, leads to tunnel vision. Using only keywords, we don’t understand that the passage implicitly references The NFL, American Football, or even sports in general. Identifying implied subjects like these is what Topic Extraction does.
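
To make the tunnel-vision point concrete, here is a minimal Python sketch of a naive keyword extractor. The `extract_keywords` function and its stopword list are illustrative assumptions, not anything from Lytics. Because the function can only return words that literally appear in the text, subjects the passage merely implies never show up.

```python
import re

# Small illustrative stopword list (an assumption; real systems use larger ones).
STOPWORDS = {
    "the", "a", "an", "to", "of", "with", "because", "instead",
    "another", "make",
}

def extract_keywords(text):
    """Return the distinct non-stopword words of `text`, in order of appearance."""
    words = re.findall(r"[a-z]+", text.lower())
    seen = []
    for word in words:
        if word not in STOPWORDS and word not in seen:
            seen.append(word)
    return seen

passage = (
    "The Seahawks blew a chance to make Super Bowl history with another "
    "improbable comeback because of an inexplicable decision to pass "
    "instead of handing the ball to Marshawn Lynch."
)

keywords = extract_keywords(passage)

# Every keyword is copied verbatim from the passage...
print(all(k in passage.lower() for k in keywords))
# ...so subjects the passage only implies are never returned.
print([t for t in ("nfl", "football", "sports") if t in keywords])
```

The second line printed is an empty list: no amount of keyword counting surfaces a word the author never wrote.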

NLP models are what allow our machines to understand and extract topics.

How our models learn

This specific type of NLP model, called a topic model, is trained on example data that reinforces which topics are most relevant to specific passages of content.

To train a model to correctly classify sports articles, we might feed it the above passage about the Seahawks and further tell the model that the topics for that passage are The NFL, American Football, and Sports. Over time, as more and more content builds up a training data set, the model will start to make those topic classifications on its own.
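
The training loop described above can be sketched with a toy word-overlap classifier. This is a deliberately simplified illustration, not the model Lytics uses: the `ToyTopicModel` class, the 0.5 threshold, and the second and third training passages are all made up for the example.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase words longer than three letters (a crude stand-in for stopword removal)."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 3}

class ToyTopicModel:
    def __init__(self):
        # topic -> set of words seen in passages labeled with that topic
        self.topic_words = defaultdict(set)

    def train(self, text, topics):
        """Record which words appeared alongside each labeled topic."""
        words = tokenize(text)
        for topic in topics:
            self.topic_words[topic] |= words

    def predict(self, text, threshold=0.5):
        """Return topics whose known words cover at least `threshold` of the text."""
        words = tokenize(text)
        if not words:
            return []
        matches = []
        for topic, known in self.topic_words.items():
            share = len(words & known) / len(words)
            if share >= threshold:
                matches.append(topic)
        return sorted(matches)

model = ToyTopicModel()
model.train(
    "The Seahawks blew a chance to make Super Bowl history with another "
    "improbable comeback because of an inexplicable decision to pass "
    "instead of handing the ball to Marshawn Lynch.",
    ["The NFL", "American Football", "Sports"],
)
# Hypothetical extra training passages, invented for this sketch.
model.train(
    "The quarterback threw a late touchdown pass to win the playoff game",
    ["American Football", "Sports"],
)
model.train(
    "The central bank raised interest rates to slow inflation",
    ["Economics"],
)

predicted = model.predict("The Seahawks quarterback threw a touchdown pass")
print(predicted)
```

With only three training passages, the sketch classifies the new sentence as American Football and Sports but does not yet reach the threshold for The NFL, which has a single labeled example. That mirrors the point above: classifications improve as the training set grows.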

Curating topics

Not surprisingly, models are sometimes wrong. A model might take a reference to Dr. Peter Martens and return topics like Shoes, Boots, or Fashion, mistaking the doctor for the iconic clothing brand Doc Martens. Or, for a very niche topic, the model may not have seen enough training data to make a reliable classification, so it returns no topics.

Additionally, even when a model returns a correct topic, that topic isn’t always useful. In practice, a topic’s specificity can determine its usefulness. On the Lytics blog, for example, Data probably isn’t a helpful topic: most posts are data related, and a topic tagged on every single post stops providing useful information.
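
One rough way to spot such topics is to measure how many posts each topic is tagged on. The sketch below is an assumption-laden illustration: the `overly_common_topics` function, the 90% cutoff, and the sample posts are hypothetical, not a Lytics feature.

```python
from collections import Counter

def overly_common_topics(post_topics, max_share=0.9):
    """Return topics tagged on more than `max_share` of all posts."""
    total = len(post_topics)
    if total == 0:
        return []
    # Count each topic at most once per post.
    counts = Counter(t for topics in post_topics for t in set(topics))
    return sorted(t for t, c in counts.items() if c / total > max_share)

# Hypothetical topic tags for four blog posts.
posts = [
    ["Data", "NLP"],
    ["Data", "Personalization"],
    ["Data", "Marketing"],
    ["Data"],
]

too_common = overly_common_topics(posts)
print(too_common)
```

Here Data appears on every post, so it is flagged, while the more specific topics still help tell posts apart.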

In these cases, you might want to tell the model that a topic is not relevant for your brand. You can blacklist that topic inside of Lytics, which helps the model learn that the topic shouldn’t be applied to your data. Over time, this helps increase the relevancy of the topic models under the Lytics hood, which can lead to more targeted personalization in your own interactions with your users.
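
As a minimal sketch of the idea, a blacklist can be thought of as a filter applied to a model’s predicted topics. The `apply_blacklist` function below is hypothetical and only shows the filtering step; it is not the Lytics interface, and it does not capture how blacklisting feeds back into the model’s learning.

```python
def apply_blacklist(predicted_topics, blacklist):
    """Drop blacklisted topics, ignoring case, and keep the original order."""
    blocked = {topic.casefold() for topic in blacklist}
    return [t for t in predicted_topics if t.casefold() not in blocked]

# Hypothetical model output for a post on the Lytics blog.
predicted = ["Data", "NLP", "Personalization"]

filtered = apply_blacklist(predicted, {"data"})
print(filtered)
```

Matching is case-insensitive so that an entry like "data" also removes "Data".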

Interested in learning more?

We’d love to show you our Content Affinity Engine and answer any questions you might have. Schedule a demo today to learn more.

Author

Patrick Craig
