Search tattoo images by text: An open-source project made with Jina
Jina is a neural search framework that empowers anyone to build state-of-the-art, mind-blowing neural search applications in minutes. In this tutorial, we will create a "deep-learning powered" search system that retrieves images based on short text queries. We will search based on the meaning, not just the raw keywords: for example, tattoo images that contain the "concept" of a text query like "mario brothers." All of this without any manual dataset labeling or custom object training. Enter stage right, Jina and CLIP.
It may not be obvious at first why being able to retrieve images based on short text queries is cool at all. Let's consider a simple example to appreciate how cool what we are building actually is, and how neural search differs from your Grandpa's old school text searches.
The traditional way of search
Imagine you have a folder on your desktop with a bunch of images in it. You want to get the images with cats in them. How would you do this?
If your approach was to search "cats" with a regular text-based search on your computer, of course you'd expect to get images with cats in the filename. While simple, the problem with this "manual labeling" approach is that it is extremely labor intensive and error prone.
Think of how many hours it would take to rename, or describe the contents of, even a measly 1,000 pictures by hand.
It is limiting, indeed.
The new way of search
But what if you could get the images containing cats WITHOUT labeling them as cat images and WITHOUT training some custom object detector for cats?
How cool would it be if we could represent both images and query text in the same embedding space, allowing an "apples to apples" comparison between images and text data even though images and text are completely different types of data?
Among a million other things, Jina lets you do that.
Let's take a look at what's required for a production ready neural search system built with Jina, and see how you can use it to bridge the gap between your research and business needs.
How does it work?
Okay, so we understand our problem is somehow figuring out a way to compare images and text "directly". We want to be able to write the word "cat" as a query, and then retrieve pictures with a cat solely by using the embeddings similarity.
Making this possible requires two critical parts: 1) an index flow and 2) a query flow.
- Index Flow: for breaking down and extracting rich meaning from your dataset using neural network models
- Query/Search Flow: for taking a user input and finding matching results
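Before bringing in any framework, the index/query pairing can be sketched in plain Python. Everything here is illustrative: the filenames are made up, and the `embed` function is a crude letter-count vector that "encodes" the filename string only so the example is self-contained. A real system encodes pixel content with a neural model like CLIP.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy 'encoder': a 26-dim letter-count vector, normalized to unit length.
    A real system would use a neural model like CLIP here; this stand-in
    just makes the two-flow idea runnable."""
    v = np.zeros(26)
    for ch in text.lower():
        if ch.isalpha():
            v[ord(ch) - ord('a')] += 1
    norm = np.linalg.norm(v)
    return v / norm if norm else v

# Index flow: encode every "image" once and store the vectors
index = {name: embed(name) for name in ('cat_1.jpg', 'cat_2.jpg', 'dog_1.jpg')}

# Query flow: encode the query the same way, rank stored items by cosine similarity
def search(query: str, top_k: int = 2) -> list:
    q = embed(query)
    ranked = sorted(index, key=lambda name: float(q @ index[name]), reverse=True)
    return ranked[:top_k]

print(search('cat'))  # ['cat_1.jpg', 'cat_2.jpg']
```

The key point survives even in this toy: indexing and searching both pass their input through the *same* encoder, so the resulting vectors live in the same space and can be compared directly.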
Think of what peanut butter and jelly are to a sandwich, and that's what an index flow and query flow are to a Jina project. You need both, or nothing makes sense.
Let's break down each of the two steps and start building our awesome Jina app. Follow me.
Part 1 of 2: Defining the Indexing Process
We learned in the first section that all Jina applications require two critical components: 1) indexed data and 2) a way to search our indexed data.
Indexing and searching. Peanut butter and jelly, baby.
The indexing process, the first of our two requirements and depicted below, is responsible for taking our input pictures, passing them through a neural network to extract features, and then storing/indexing the extracted features in a database.
You can almost think of it as passing your images through a "cheese grater," where the shredded cheese on the other side is stored in the database.
To implement this workflow in Jina, we need to compose our first `Flow`.
Flows are like the "grand puppet masters" of the Jina ecosystem. Flows are what orchestrate our Executors, which perform the heavy-lifting operations on DocumentArrays at lightning speed.
We will call this first flow the indexing flow, and it is responsible for encoding and indexing the images to the database.
If all that jargon sounds confusing, I hear you.
All you need to remember is that our index flow will let us encode, or "cheese-grate," our images and store them in a database.
Remember, later on we want to search these images, the "cheese" we just put in the database, by text. So it's important we find a way to represent image and text data in a way that makes them comparable.
We will see in the search flow stage that we will do the same thing with our query text. We will put a query text like "mario" through the "cheese grater," and then compare the grated cheese of the text with the grated cheese from the images we already have encoded and indexed into our database.
Now, let's actually index our data. Check out the notebook here.
Yahtzee! We have successfully indexed our data!
While it is out of the scope of this article to discuss different indexers and the pros/cons of each, note here that we will be using the SimpleIndexer.
One more housekeeping note before we move onto the searching stage.
I just want to point out that Executors like the CLIPEncoder are just Python classes inheriting from an Executor base class. Subclassed Executors are bound to our Flow with the `@requests` decorator, and they operate on our DocumentArrays in the ways we define within our "/index" and "/search" endpoint implementations.
Now that we have our images indexed, let's move on to figuring out a way to search them via text!
Part 2 of 2: Defining the Search Flow
The second required component of our Jina project is the search flow. Here is where we take our text query "mario," embed it into the same embedding space ("type of cheese") as the images we indexed, and then return to the user the images most similar to the text query.
To implement this workflow in Jina, we need to compose our second `Flow`.
The code for the search flow is very similar to that of the index flow, except this time we are going to be putting text through our "cheese grater," not images. We need to first encode the text so that we can compare it "apples to apples"/"cheese to cheese" with the images we already encoded and indexed into our database.
Before we hit blastoff, all we have to do is determine what we mean by "similarity" when we compare our query embeddings to our indexed embeddings.
For this project we will use the cosine distance to determine which image embeddings are most similar to our query text embeddings, then display the images associated with the lowest scores.
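Cosine distance only cares about the angle between two embeddings, not their magnitude, and a lower score means a better match. A tiny self-contained illustration (the vectors are made up for the example):

```python
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """1 minus cosine similarity: 0 for identical directions, up to 2 for opposite."""
    return 1.0 - float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

query = np.array([1.0, 0.0])
close = np.array([2.0, 0.0])   # same direction, different magnitude
far   = np.array([0.0, 3.0])   # orthogonal direction

print(cosine_distance(query, close))  # 0.0 -> best possible match
print(cosine_distance(query, far))    # 1.0 -> no directional similarity
```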
Now, without further ado, let's actually search our data.
Voila! You have successfully queried your indexed data! Don't forget you can play with the notebook and create your own here! Source code for the original project, built with Streamlit, is here.
In this tutorial, we created a "deep-learning powered" neural search system that retrieves images based on short text queries. Our neural search system powered by Jina allows us to search based on the meaning, not just the raw keywords, retrieving tattoo images that contain the "concept" of a text query like "mario brothers." We did the entire thing without ANY manual dataset labeling or custom object training.
Hate to be "that guy," but I told ya so. Jina is awesome.