
Search tattoo images by text: An open-source project made with Jina

Jina is a neural search framework that empowers anyone to build state-of-the-art, mind-blowing neural search applications in minutes. In this tutorial, we will create a "deep-learning powered" search system that retrieves images based on short text queries. We will search based on meaning, not just raw keywords: for example, finding tattoo images that contain the "concept" of a text query like "mario brothers." All of this without any manual dataset labeling or custom object detector training. Enter stage right: Jina and CLIP.

Figure 1: With Jina, you can build state-of-the-art search applications in minutes.

It may not be obvious at first why being able to retrieve images based on short text queries is cool at all. Let's consider a simple example to appreciate just how cool what we're building actually is, and how neural search differs from your grandpa's old-school text search.

The traditional way of search

Imagine you have a folder on your desktop with a bunch of images in it. You want to get the images with cats in them. How would you do this?


If your approach were to search "cats" with a regular text-based search on your computer, of course you'd expect to get images with "cats" in the filename. That works, but only because someone labeled the files by hand, and this "manual labeling" approach is extremely labor-intensive and error-prone.

Think of how many hours it would take to rename, or describe the contents of, even a measly 1,000 pictures by hand.

It is limiting, indeed.

The new way of search

But what if you could get the images containing cats WITHOUT labeling them as cat images and WITHOUT training some custom object detector for cats?

How cool would it be if we could represent both images and query text in the same embedding space, allowing an "apples to apples" comparison between images and text data even though images and text are completely different types of data?

Among a million other things, Jina lets you do that.

Let's take a look at what's required for a production-ready neural search system built with Jina, and see how you can use it to bridge the gap between your research and business needs.

How does it work?

Okay, so we understand that our problem is figuring out a way to compare images and text "directly." We want to be able to write the word "cat" as a query and retrieve pictures of cats solely by using embedding similarity.
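The model that makes this "direct" comparison possible here is CLIP, which maps both images and text into one shared embedding space. As a quick sanity check outside of Jina, here is a minimal sketch using OpenAI's open-source `clip` package; the `cat.jpg` file and the prompt strings are placeholders of mine, not part of the project.

```python
# Minimal sketch. Assumes: pip install torch pillow, plus OpenAI's CLIP
# from https://github.com/openai/CLIP. 'cat.jpg' is a placeholder file.
import clip
import torch
from PIL import Image

model, preprocess = clip.load('ViT-B/32')

# Encode one image and two text prompts into the SAME embedding space.
image = preprocess(Image.open('cat.jpg')).unsqueeze(0)
text = clip.tokenize(['a photo of a cat', 'a photo of a dog'])

with torch.no_grad():
    image_emb = model.encode_image(image)  # shape: (1, 512)
    text_emb = model.encode_text(text)     # shape: (2, 512)

# Because both live in one space, a plain similarity comparison works.
sims = torch.cosine_similarity(image_emb, text_emb)
print(sims)  # the 'cat' prompt should score higher for a cat image
```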

Making this possible requires two critical parts: 1) an index flow and 2) a query flow.

  1. Index Flow: for breaking down and extracting rich meaning from your dataset using neural network models
  2. Query/Search Flow: for taking a user input and finding matching results
Figure 2: Schematic overview showing how the index flow and search flow fit into a Jina neural search app. Source

Think of what peanut butter and jelly are to a sandwich, and that's what an index flow and query flow are to a Jina project. You need both, or nothing makes sense.

Let's break down each of the two steps and start building our awesome Jina app. Follow me.

Part 1 of 2: Defining the Indexing Process

We learned in the first section that all Jina applications require two critical components: 1) indexed data and 2) a way to search our indexed data.

Indexing and searching. Peanut butter and jelly, baby.

The indexing process, the first of our two requirements and depicted below, is responsible for taking our input pictures, passing them through a neural network to extract features, and then storing/indexing the extracted features in a database.

You can almost think of it as passing your images through a "cheese grater," where the shredded cheese on the other side is stored in the database.

Figure 3: A flowchart representing the process of extracting features from each image in the dataset. Source.

To implement this workflow in Jina, we need to compose our first `Flow`.

Flows are the "grand puppet masters" of the Jina ecosystem: they orchestrate our Executors, which perform the heavy-lifting operations on DocumentArrays at lightning speed.

We will call this first flow the indexing flow, and it is responsible for encoding and indexing the images to the database.


If all that jargon sounds confusing, I hear you.

All you need to remember is that our index flow will allow us to encode, or "cheese grater," our images and store them in a database.

Remember, later on we want to search these images, the "cheese" we just put in the database, by text. So it's important we find a way to represent image and text data in a way that makes them comparable.

We will see in the search flow stage that we will do the same thing with our query text. We will put a query text like "mario" through the "cheese grater," and then compare the grated cheese of the text with the grated cheese from the images we already have encoded and indexed into our database.

Now, let's actually index our data. Check out the notebook here.

Figure 4: Defining our index Flow.
Figure 5: Index flow context manager
Figure 6: Booyah! You're indexing your data!
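If you can't open the notebook right now, here is a minimal sketch of what an index Flow along these lines might look like. The Hub executor identifiers, the `data/tattoos` folder, and the loader method are assumptions on my part (the loader's name varies by docarray version); treat the notebook as the source of truth.

```python
from pathlib import Path

from jina import Flow, Document, DocumentArray

# One Document per image file; 'data/tattoos' is a placeholder folder.
docs = DocumentArray(
    Document(uri=str(p)).load_uri_to_image_tensor()
    for p in sorted(Path('data/tattoos').glob('*.jpg'))
)

# The "cheese grater" pipeline: CLIP turns each image into an embedding,
# then SimpleIndexer stores the embeddings so we can search them later.
f = (
    Flow()
    .add(uses='jinahub://CLIPEncoder', name='encoder')
    .add(uses='jinahub://SimpleIndexer', name='indexer')
)

# Opening the Flow starts the Executors; posting to '/index' runs
# every Document through the whole pipeline.
with f:
    f.post(on='/index', inputs=docs, show_progress=True)
```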

Yahtzee! We have successfully indexed our data! Check out the notebook here.

While a discussion of the different indexers and the pros and cons of each is out of scope for this article, note that we will be using the SimpleIndexer.


One more housekeeping note before we move on to the searching stage.

I just want to point out that Executors like the CLIPEncoder are just Python classes inheriting from the Executor base class. Subclassed Executors are bound to our Flow with the `@requests` decorator, and operate on our DocumentArrays in the ways we define within our "/index" and "/search" endpoint implementations.

Figure 7: Executors are Python classes. See notebook here for full code.
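For a concrete feel of that pattern, here is a bare-bones sketch of a custom Executor in that style; the class name and method bodies are illustrative, not the notebook's actual code.

```python
from jina import Executor, requests, DocumentArray


class MyIndexer(Executor):
    """A toy indexer that keeps everything in an in-memory DocumentArray."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self._storage = DocumentArray()

    @requests(on='/index')
    def index(self, docs: DocumentArray, **kwargs):
        # Runs when the Flow receives a request on the '/index' endpoint.
        self._storage.extend(docs)

    @requests(on='/search')
    def search(self, docs: DocumentArray, **kwargs):
        # Runs on '/search': attach the nearest stored Documents as matches,
        # ranked by cosine distance between embeddings.
        docs.match(self._storage, metric='cosine', limit=10)
```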

Now that we have our images indexed, let's move on to figuring out a way to search them via text!

Part 2 of 2: Defining the Search Flow

The second required component of our Jina project is the search flow. This is where we take our text query "mario," embed it into the same embedding space ("type of cheese") as the images we indexed, and then return to the user the images most similar to the text query.

Figure 8: Performing a search on an information retrieval system. A user submits a query; the query image is described; the query features are compared to existing features in the database; results are sorted by relevancy and then presented to the user. Adapted from source.

To implement this workflow in Jina, we need to compose our second `Flow`.

The code for the search flow is very similar to that of the index flow, except this time we are going to be putting text through our "cheese grater," not images. We need to first encode the text so that we can compare it "apples to apples"/"cheese to cheese" with the images we already encoded and indexed into our database.

Before we hit blastoff, all we have to do is determine what we mean by "similarity" when we compare our query embeddings to our indexed embeddings.

For this project we will use the cosine distance to determine which image embeddings are most similar to our query text embeddings, then display the images associated with the lowest scores.
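If "cosine distance" sounds abstract: it's just one minus the cosine similarity between two embedding vectors, so vectors pointing the same way score 0 and opposite vectors score 2. A toy numpy example with made-up 3-dimensional vectors (real CLIP embeddings are typically 512-dimensional):

```python
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    # 0.0 means the vectors point the same way; 2.0 means opposite.
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

text_emb = np.array([0.9, 0.1, 0.0])   # toy embedding for the query "cat"
cat_img  = np.array([0.8, 0.2, 0.1])   # toy embedding of a cat image
dog_img  = np.array([0.1, 0.9, 0.2])   # toy embedding of a dog image

print(cosine_distance(text_emb, cat_img))  # ~0.02: small distance, good match
print(cosine_distance(text_emb, dog_img))  # ~0.79: larger distance, worse match
```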

Now, without further ado, let's actually search our data.

Figure 9: Defining the Executors to be used by our search flow to embed our query text and then compare it to our indexed data.
Figure 10: Search flow schematic
Figure 11: Opening context and performing a search.
Figure 12: Returning closest matches
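And again, in case the figures don't load for you, here is roughly what the search side can look like. Same caveats as the index sketch: the executor identifiers, the query string, and the exact response handling depend on your Jina version.

```python
from jina import Flow, Document

# Mirror of the index Flow: CLIP encodes the query TEXT this time,
# and SimpleIndexer compares it against the stored image embeddings.
f = (
    Flow()
    .add(uses='jinahub://CLIPEncoder')
    .add(uses='jinahub://SimpleIndexer')
)

with f:
    # In Jina 3, f.post returns the resulting DocumentArray directly.
    results = f.post(on='/search', inputs=Document(text='mario'))

# Each result Document carries its closest indexed images as matches,
# sorted by cosine distance (lowest distance, i.e. most similar, first).
for match in results[0].matches[:3]:
    print(match.uri, match.scores['cosine'].value)
```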

Voilà! You have successfully queried your indexed data! Don't forget you can play with the notebook and create your own here! Source code for the original project, built with Streamlit, is here.

Conclusion

In this tutorial, we created a "deep-learning powered" neural search system that retrieves images based on short text queries. Our neural search system powered by Jina allows us to search based on the meaning, not just the raw keywords, retrieving tattoo images that contain the "concept" of a text query like "mario brothers." We did the entire thing without ANY manual dataset labeling or custom object training.

Hate to be "that guy," but I told ya so. Jina is awesome.

About the Author

Kevin Zehnder

Skateboarding is not a crime. lolz
