What is neural search?
Sometimes it's easier to learn what something is by learning what it isn't. In this article, we will explore the origins of search engines and discover how deep learning is being used to revolutionize the concept of search itself.
Search engines are like the gas stations of the digital world.
They provide a service no one else can, you'd be absolutely screwed without them, they work every time, and while there is a sophisticated network spanning the globe working 24/7 to keep them operational, even a junior engineer like myself can operate one without breaking it.
What else could you possibly ask for?
Of course, this capability to search billions of documents' worth of information in milliseconds did not fall out of the sky. After all, documents are just long sequences of words, and computers don't understand the meaning of these words anyway.
But if they don't understand the meaning of the words, then how the heck are search engines able to find out if two documents are similar, just by looking at their words?
How the sausage is made.
In the early days of the internet, search engines literally used to work by having a massive "phonebook" that matched keywords to websites. So, by hand, a site like Yahoo would create some massive laundry list of websites each tagged with keywords. Then, if you searched "tuna", whatever sites had a "tuna" tag would be returned.

You can already see the myriad of problems with this approach. Like if the guy running the tuna site falls in the ocean. Until Yahoo updates the phonebook, wherever that may be, the next time you search tuna you're gonna get sent to a website that doesn't even exist. A fishy situation that gives "smelly code" a new meaning.
In reality, one could argue this actually isn't too bad a solution when there are only a couple thousand sites to catalog with tags. Nothing you couldn't pawn off to the intern for the weekend. But once you start talking about tens of thousands, or tens of millions, of websites, it becomes clear pretty quickly that "phonebook" style searching just will not cut it.
Remember, computers see these billions of documents all over the internet as nothing more than a sequence of words. Computers don't understand the meaning of the words in the document. We must provide the computer with some kind of system to determine what it should retrieve for the user.
Let's see how vector spaces can help us do that.
Progress: Vector Spaces
It's hard to appreciate the current state of search without understanding vector spaces.
The idea with vector spaces is that they allow us to represent real-world things like comics, mp3s, images, or text as a list of numbers. This concept is extremely important, because once we have represented these objects in a common form, a list of numbers, we can compare them "apples to apples" and quantify how close to each other they are in the vector space.
Let's see how this idea of comparing lists of numbers in vector spaces to determine similarity can be leveraged by a search engine to assess the relevance of documents.
How search really works.
In this example we will use some very simple maths to compare documents based on word frequency. It will give us an intuition for how a computer transforms a website from a list of words to a list of numbers so it can compare websites and find relevant results. Later on, we will see that we can use neural networks to "determine" these lists of numbers instead of the method outlined below.
It goes like this.
The process starts by counting the frequency of each word in a document collection. This is a way of abstractly representing the documents, transforming them from lists of words to lists of numbers. It's an important step to understand, as we will see later that having our search term and documents all represented as lists of numbers makes them comparable "apples to apples".
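To make that concrete, here's a minimal sketch in Python of the counting step, using a made-up three-document collection (the counts are chosen to match the example vectors we'll plot shortly):

```python
from collections import Counter

# A toy document collection, made up for illustration.
documents = {
    "doc1": "apple apple apple apple banana",
    "doc2": "apple banana banana banana",
    "doc3": "banana banana",
}

# Count how often each word appears in each document,
# turning each list of words into a list of numbers.
counts = {name: Counter(text.split()) for name, text in documents.items()}
print(counts["doc1"])  # Counter({'apple': 4, 'banana': 1})
```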

After we are finished with counting, we create our index, which is a list of all the words of our vocabulary. We also remember which documents contained those words. As you can imagine, in the real world our index would be absolutely massive.
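Here's a sketch of that indexing step, restating the word counts from the previous snippet so it runs on its own:

```python
from collections import defaultdict

# Word counts from the previous snippet.
counts = {
    "doc1": {"apple": 4, "banana": 1},
    "doc2": {"apple": 1, "banana": 3},
    "doc3": {"banana": 2},
}

# Inverted index: every word in our vocabulary -> the documents containing it.
index = defaultdict(set)
for doc, words in counts.items():
    for word in words:
        index[word].add(doc)

print(dict(index))
# {'apple': {'doc1', 'doc2'}, 'banana': {'doc1', 'doc2', 'doc3'}} (set order may vary)
```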

Then, after we have indexed our documents, all we have to do is take our search query and transform it from a list of words to a list of numbers like we did with the documents before.

Now we can use our index to look up the documents that contain the word "apple". For example, Document 3 did not mention apples, so it's not relevant to our search.
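Sticking with our toy vocabulary, the query transformation and index lookup might look like this (a sketch, not production code):

```python
# Vocabulary and inverted index from the previous snippets.
vocabulary = ["apple", "banana"]
index = {"apple": {"doc1", "doc2"}, "banana": {"doc1", "doc2", "doc3"}}

# Transform the query from a list of words to a list of numbers,
# exactly as we did with the documents.
query_words = "apple".split()
query_vector = [query_words.count(word) for word in vocabulary]
print(query_vector)  # [1, 0] -- one 'apple', zero 'banana'

# Use the index to pull only the documents containing a query word.
candidates = set()
for word in query_words:
    candidates |= index.get(word, set())
print(candidates)  # {'doc1', 'doc2'} -- doc3 never mentions apples
```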

One of the nice things about vectors is that you can draw them. The query (x=1, y=0), Document 1 (x=4, y=1), and Document 2 (x=1, y=3) can all be plotted. You can see below that the angle between the query and Document 1 is smaller than the angle between the query and Document 2, meaning Document 1 is more relevant. By comparing all angles between the query and every document in the collection, a search engine decides which documents are most relevant.
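Here's that angle comparison as a quick sketch. The standard trick is to compare the cosine of each angle rather than the angle itself: a smaller angle means a larger cosine.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; closer to 1.0 means a smaller angle."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query = [1, 0]  # "apple"
doc1 = [4, 1]   # 4 apples, 1 banana
doc2 = [1, 3]   # 1 apple, 3 bananas

print(cosine_similarity(query, doc1))  # ~0.97 -- small angle, very relevant
print(cosine_similarity(query, doc2))  # ~0.32 -- bigger angle, less relevant
```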

One last thing I want to address before we move on, because it's really cool: what happens if you have to add another word to the index besides "apple" and "banana"? All we have to do is add one more axis. We could have one axis for each word we could ever think of, and our previous vectors would remain the same. Adding another dimension is just a minor inconvenience for our computer, requiring a few more computations before we get the results, as the little sketch below shows.
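In code terms, it's nothing scarier than padding the old vectors with a zero on the new axis:

```python
# Old vocabulary: ["apple", "banana"]; suppose we add "cherry".
doc1 = [4, 1]

# Every existing vector gains a third axis with a count of 0;
# its values on the old axes are untouched.
doc1 = doc1 + [0]  # [4, 1, 0]
```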

And there you have it: by creating an index and turning our documents into vectors, we are able to find similar documents just by looking at the angles between them.
I'm showing you this example because I think it does a good job laying the groundwork for understanding the power of representing documents as lists of numbers. That's important because, as we will see later, instead of counting the words we can use deep learning to "decide" in a much more intelligent way how we should represent our documents as lists of numbers. This will make our searches qualitatively different in what they allow us to do.
You've done the hard part: you understand how lists of numbers can be represented in vector spaces and their angles compared for similarity. Let's dig deeper and determine whether "deep learning determined" lists of numbers are really all that big of a deal.
A new way to search is here
We now understand a little better how search engines work and how we can use something like word frequency to transform our documents into lists of numbers.
But what if we could determine what those numbers are in a "smarter" way instead of just counting? Might a "smarter" way to determine how to represent each document as a list of numbers give us better results when we compare them in vector spaces?
As we will soon see, it absolutely will.
Semantic Search is the real deal.
The quality of results returned by traditional lexical searches versus deep learning powered semantic searches is apples to oranges. Let me tease you with a little example from Wikipedia and show you what neural search can do.
When you search on Wikipedia, "What is the capital of the United States?", guess what you get?
Some garbage about the death penalty and a couple of random states, if you use a traditional "word counting" type lexical approach. In fairness, Wikipedia uses a more advanced method than our "word counting" method, but it's the same basic idea and it performs pretty badly.

In the slide above from Nils Reimers' excellent webinar "Search Like You Mean It," you can see that as opposed to the garbage returned by lexical search, with semantic search we get a perfect hit. Needless to say, we need to figure out what this semantic search is all about. How do we get results like the right side of the slide above?
The answer is pre-trained transformers, which allow us to power our searches with deep learning. And while the use of embeddings to capture semantic information is not a new concept in ML, language models like BERT and the advancements in computing power certainly are. The result is that they are reimagining what it means to search.

See how vector spaces are used above to determine similarity based on lists of numbers and a similarity metric, giving us back those awesome results that correctly tell us "Washington D.C." when we ask what the capital of the United States is? Pretty cool, huh?
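If you want to see what this looks like in code, here's a minimal sketch using the open-source sentence-transformers library. The model name and the two documents are my own illustrative assumptions; any pre-trained embedding model follows the same pattern.

```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

# A pre-trained transformer that maps sentences to dense vectors.
model = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "Washington, D.C. is the capital city of the United States.",
    "Capital punishment is a legal penalty in some U.S. states.",
]
query = "What is the capital of the United States?"

# Embed the query and the documents into the same vector space...
doc_embeddings = model.encode(documents)
query_embedding = model.encode(query)

# ...then rank documents by cosine similarity, just like before.
print(util.cos_sim(query_embedding, doc_embeddings))
# The Washington, D.C. sentence scores far higher than the death-penalty one.
```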
And don't take my word for it if you think this is all a bunch of hype. The dominance of these pre-trained transformers is undeniable, making vector spaces and their application to deep learning powered information retrieval all the more exciting to learn about today.

One last thing.
Before we go, I'd like to bring to your attention one last fascinating aspect of vector spaces.
Vector spaces are not limited to text. We can include images in a vector space just as easily as text; the vector space doesn't care what type of data the numbers came from.

We can see in the figure above that even when our documents are different types of data, by creating an index and turning them into vectors we are able to find similar documents just by looking at their angles. Of course, this assumes we use cosine similarity as our "similarity metric", but that's another topic for another day.
Indeed, multi-modal searching, or the ability to search one type of data with another type of data, is one of the most exciting things about neural search.
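As a taste, the same sentence-transformers library ships a CLIP model that embeds images and text into one shared vector space. The image file name and captions below are stand-ins for illustration:

```python
# pip install sentence-transformers pillow
from PIL import Image
from sentence_transformers import SentenceTransformer, util

# CLIP maps both images and text into the same vector space.
model = SentenceTransformer("clip-ViT-B-32")

image_embedding = model.encode(Image.open("two_dogs_in_snow.jpg"))  # stand-in file
text_embeddings = model.encode([
    "Two dogs playing in the snow",
    "A bowl of fruit on a table",
])

# The same cosine-similarity trick, now comparing across data types.
print(util.cos_sim(image_embedding, text_embeddings))
```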
Summary
In this article, we learned about neural search by learning what it isn't. We explored the checkered history of search as a concept and obtained a better understanding of how search engines have evolved to meet the ever-increasing demands of users in the digital age.