
June 20, 2017

A Glimpse Under the Hood of Natural Language Processing

Natural Language Processing (NLP) sits at the intersection of linguistics and artificial intelligence, and it is a key component of Lytics’ Content Affinity Engine.

NLP models let our Content Affinity Engine know what different articles and content are about, which we can in turn use to calculate content affinities for your users. The better we can describe content, the more accurate the affinities are.

So it’s pretty important that we can describe content in meaningful terms.

Topics, Topics, Topics

Think of how you might describe the following passage:

“The Seahawks blew a chance to make Super Bowl history with another improbable comeback because of an inexplicable decision to pass instead of handing the ball to Marshawn Lynch.”

You might describe it with the passage’s keywords, terms like “Seahawks,” “Super Bowl,” and “Marshawn Lynch.” Keywords are extracted verbatim from text, and keyword targeting was an important strategy in the earlier, search engine optimization (SEO) driven days of the Internet. A keyword-based approach to content classification, however, leads to tunnel vision. Using only keywords, we don’t understand that the passage implicitly references The NFL, American Football, or even sports in general. Surfacing those broader concepts is the job of topic extraction.
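To make the limitation concrete, here is a minimal sketch of verbatim keyword extraction applied to the passage above. The stopword list and the `extract_keywords` helper are assumptions made up for this illustration; they are not part of Lytics.

```python
import re

# Hypothetical, deliberately tiny stopword list for this illustration.
STOPWORDS = {
    "the", "a", "an", "to", "of", "with", "because",
    "instead", "another", "make",
}

def extract_keywords(text):
    """Return the non-stopword words of the text, lowercased, in first-seen order."""
    keywords = []
    for word in re.findall(r"[a-z]+", text.lower()):
        if word not in STOPWORDS and word not in keywords:
            keywords.append(word)
    return keywords

passage = (
    "The Seahawks blew a chance to make Super Bowl history with another "
    "improbable comeback because of an inexplicable decision to pass "
    "instead of handing the ball to Marshawn Lynch."
)

keywords = extract_keywords(passage)

# Topics a human reader would infer, none of which appear verbatim.
implied_topics = ["nfl", "football", "sports"]
missing = [topic for topic in implied_topics if topic not in keywords]

print(keywords)
print(missing)
```

Every implied topic ends up in `missing`, because a verbatim extractor can only return words that literally occur in the text.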

NLP models are what allow our machines to understand and extract topics.

How Our Models Learn

This specific type of NLP model, called a topic model, “learns,” or is “trained,” by using training data to reinforce which topics are most relevant to specific passages of content.

To train a model to correctly classify sports articles, we might feed it the above passage about the Seahawks, and further tell the model that the topics for that passage are “The NFL,” “American Football,” and “Sports.” Over time, as more and more content builds up a training data set, the model will start to make those topic classifications on its own.
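The training loop described above can be sketched with a toy model that simply counts how often each word appears alongside each labeled topic. This is an illustrative stand-in under stated assumptions, not the model Lytics uses; the `TinyTopicModel` class, the cooking example, and the scoring rule are all hypothetical.

```python
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Split text into lowercase alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

class TinyTopicModel:
    """Toy topic model: counts word/topic co-occurrences in labeled passages."""

    def __init__(self):
        # word -> Counter of topics that word has been seen with
        self.word_topic_counts = defaultdict(Counter)

    def train(self, text, topics):
        """Record that every distinct word in the passage co-occurred with each topic."""
        for word in set(tokenize(text)):
            for topic in topics:
                self.word_topic_counts[word][topic] += 1

    def predict(self, text, top_n=3):
        """Score topics by summing co-occurrence counts over the words in the text."""
        scores = Counter()
        for word in set(tokenize(text)):
            scores.update(self.word_topic_counts.get(word, {}))
        return [topic for topic, _ in scores.most_common(top_n)]

model = TinyTopicModel()
model.train(
    "The Seahawks blew a chance to make Super Bowl history with another "
    "improbable comeback because of an inexplicable decision to pass "
    "instead of handing the ball to Marshawn Lynch.",
    ["The NFL", "American Football", "Sports"],
)
model.train(
    "Simmer the tomatoes with garlic and basil for twenty minutes.",
    ["Cooking"],
)

predicted = model.predict("Marshawn Lynch ran the ball for the Seahawks")
unseen = model.predict("quantum chromodynamics")

print(predicted)
print(unseen)
```

The new sentence shares more words with the Seahawks passage than with the cooking one, so the three sports topics score highest. Text the model has never seen anything like produces an empty list, since no word in it has any recorded topic.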

Curating Topics

Perhaps not surprisingly, models are sometimes wrong. A model might take a reference to Dr. Peter Martens and return topics like “Shoes,” “Boots,” or “Fashion,” mistaking the doctor for the iconic clothing brand Doc Martens. Or, for a very niche topic, the model may not have seen enough training data to make a reliable classification, so it returns no topics.

Additionally, even when a model returns a correct topic, the topic isn’t always useful: in practice, a topic’s specificity can determine its usefulness. On the Lytics blog, “Data” as a topic probably isn’t helpful, because most of the posts are data related, and a topic that is tagged on every single post stops providing useful information.

In these cases, you might want to inform the model that a topic is not relevant for your brand. You can “blacklist” that topic inside of Lytics, which helps the model learn that the topic shouldn’t be applied to your data. Over time, this helps increase the relevancy of the topic models under the Lytics hood, which can lead to more targeted personalization in your own interactions with your users.
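One way to put a number on that intuition is an inverse-document-frequency style score: the log of the total number of posts divided by the number of posts carrying the topic. This scoring choice is an assumption for illustration, not something Lytics is stated to use, and the sample posts and the `topic_informativeness` helper are hypothetical.

```python
import math
from collections import Counter

def topic_informativeness(tagged_posts):
    """Return log(N / df) per topic, where df is the number of posts tagged with it."""
    total_posts = len(tagged_posts)
    post_counts = Counter(
        topic for topics in tagged_posts for topic in set(topics)
    )
    return {
        topic: math.log(total_posts / count)
        for topic, count in post_counts.items()
    }

# Hypothetical topic tags for three blog posts.
posts = [
    ["Data", "NLP"],
    ["Data", "Personalization"],
    ["Data", "NLP"],
]

scores = topic_informativeness(posts)
for topic, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{topic}: {score:.3f}")
```

A topic attached to every post scores exactly zero, which matches the point above: it no longer distinguishes one post from another. Rarer topics score higher.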

[Image: Topic Blacklist]
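The immediate effect of a blacklist can be sketched as a simple filter over a model's predicted topics. This is a simplification: the text above describes the blacklist as feedback that helps the model learn over time, which this sketch does not attempt. The `apply_blacklist` function and the sample topics are hypothetical.

```python
def apply_blacklist(predicted_topics, blacklist):
    """Drop blacklisted topics (case-insensitive), keeping the original order."""
    blocked = {topic.lower() for topic in blacklist}
    return [topic for topic in predicted_topics if topic.lower() not in blocked]

# Hypothetical predictions for a blog post, and a brand-level blacklist.
predicted_topics = ["Data", "NLP", "Marketing"]
blacklist = {"data"}

kept_topics = apply_blacklist(predicted_topics, blacklist)
print(kept_topics)
```

Matching case-insensitively is a design choice made here so that “data” and “Data” are treated as the same topic.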
