The Science in Search: Integrating BERT

At its core, search is about understanding language. 

Typing in disparate keywords was the norm 20 years ago, but today most people are inclined (in fact, trained by search engines like Google) to ask more complex questions via a search box or a voice assistant. Why? Because when we speak, we don't naturally string random words together. And computers have caught on.

Natural Language Processing (NLP), the combination of AI and linguistics that gives computer programs the ability to read, understand, and derive meaning from human languages, has been studied for more than 50 years, but it is recent advances in the field that have made search engines steadily smarter. Because so many English words have multiple meanings, search engines often struggle to interpret the intent behind both written and spoken queries.

Enter BERT.

BERT, which stands for Bidirectional Encoder Representations from Transformers, is a breakthrough open-source machine learning framework for NLP that Google released in 2018 and has since used to better understand search queries. Instead of reading text in just one direction, BERT determines a word's context from all of the words around it, both before and after.
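That "bidirectional" idea is easy to see in code. Here is a minimal sketch using the open-source Hugging Face transformers library and the public bert-base-uncased checkpoint (both chosen for illustration; Yext hasn't published its own implementation). BERT is pretrained to fill in a masked word using the context on both sides of it:

```python
# A minimal sketch of BERT's bidirectional context, using the open-source
# Hugging Face `transformers` library and the public bert-base-uncased
# checkpoint (illustrative; not Yext's implementation).
from transformers import pipeline

# BERT is pretrained with a masked-language-model objective: predict a
# hidden word from the words on BOTH sides of it.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# The right-hand context ("to withdraw some cash") is what makes "bank"
# a likely completion; a strictly left-to-right model never sees it.
for prediction in fill_mask("I went to the [MASK] to withdraw some cash."):
    print(f"{prediction['token_str']:>8}  score={prediction['score']:.3f}")
```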

Person, Place or Thing?

Since launching our site search product, Yext Answers, we've been improving its search algorithm to help businesses provide more relevant results to customers on their own websites. Our latest upgrade leverages BERT to more accurately distinguish locations from other kinds of entities. The need arises because location names (a place) are often identical to the names of people (a person) or products (a thing). For example, the following two queries both include the word "Orlando":

- "Bank near Orlando"
- "Orlando Bloom"
In one, the user is clearly referring to the city of Orlando (a place), while in the other, the user is referring to someone named Orlando (a person). Classifying the first Orlando as a place and the second as a name is an instance of Named Entity Recognition (NER), the process of locating named entities in unstructured text and classifying them into predefined categories such as person, location, or product.
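A public, BERT-based NER model makes this concrete. The sketch below uses the open-source Hugging Face transformers library with the dslim/bert-base-NER checkpoint, a BERT model fine-tuned on the CoNLL-2003 entity-tagging dataset; both are stand-ins for illustration, not Yext's production stack:

```python
# A sketch of BERT-based Named Entity Recognition with the open-source
# Hugging Face `transformers` library. `dslim/bert-base-NER` is a public
# BERT checkpoint fine-tuned on CoNLL-2003 (PER, LOC, ORG, MISC tags);
# it stands in here for whatever model Yext actually uses.
from transformers import pipeline

ner = pipeline("ner", model="dslim/bert-base-NER",
               aggregation_strategy="simple")

for query in ["Bank near Orlando", "Orlando Bloom"]:
    print(query)
    for entity in ner(query):
        # Expect LOC for the city and PER for the actor.
        print(f"  {entity['word']!r} -> {entity['entity_group']}"
              f" (score {entity['score']:.2f})")
```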

It's easy for you or me to tell the two queries apart because we don't look at "Orlando" in isolation, but in the context of the words around it. In the first example, any word that follows "Bank near" is most likely the name of a place. In the second, "Orlando" right next to "Bloom" immediately signals the well-known actor. This is where BERT is invaluable: it is designed to understand the contextual relationships between words in text. Previously, Yext Answers could occasionally deliver a location-based result for "Orlando" in the "Orlando Bloom" query. With the new BERT-based approach, the algorithm resolves the ambiguity correctly.
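One way to see that contextual relationship directly is to compare the vector BERT assigns to "Orlando" in different queries. Again, this sketch uses the open-source transformers and torch packages with the public bert-base-uncased checkpoint, purely as an illustration:

```python
# A sketch showing that BERT gives the same word different vectors in
# different contexts. Assumes the open-source `transformers` and `torch`
# packages and the public bert-base-uncased checkpoint; it is an
# illustration, not Yext's production code.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def orlando_vector(sentence: str) -> torch.Tensor:
    """Return BERT's contextual embedding for the 'orlando' token."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # (seq_len, 768)
    # Assumes 'orlando' is a single WordPiece in this vocabulary.
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    return hidden[tokens.index("orlando")]

place = orlando_vector("bank near orlando")
person = orlando_vector("orlando bloom")
place2 = orlando_vector("flights to orlando")

cos = torch.nn.functional.cosine_similarity
print("place vs. person:", cos(place, person, dim=0).item())
print("place vs. place: ", cos(place, place2, dim=0).item())
```

The two place-sense vectors should land noticeably closer together than the place and person vectors, which is precisely the kind of signal a downstream entity classifier can use to tell locations apart from people.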

A BERT in the Hand

At Yext, we're building the Official Answers Engine, and leveraging BERT in our Answers algorithm is an important next step toward enabling businesses to deliver the most accurate and official answers possible. One wrong answer can carry a huge opportunity cost, whether in the form of lost business or a pricey call to customer service. By better pinpointing customer intent, businesses can reduce that risk and deliver exceptional customer experiences on their own domains.
