Measuring the similarity between vectors or arrays is a calculation we encounter often when developing recommendation systems and other machine learning models. It usually involves performing similarity searches over entire datasets, which can be computationally expensive. For systems that need these calculations to happen online in real time, this can be a major bottleneck, and it is something we often have to deal with at Leapfrog.
Luckily, we are not the only ones faced with this problem: open-source libraries like FAISS, developed by the Facebook AI Research team, have been built to solve exactly this.
What is FAISS?
FAISS, or Facebook AI Similarity Search, is a library written in C++ with GPU support. It also has Python bindings so that it can be used with NumPy, pandas, and other Python-based libraries. Its algorithmic enhancements, which vastly narrow down the search space for a vector's k-nearest neighbors, make its similarity search much faster than that of existing libraries like scikit-learn. This technique, called Approximate Nearest Neighbor (ANN) search, sacrifices some precision to obtain these vast speedups.
Compared to other ANN libraries, FAISS implements various vector compression, partitioning, and indexing techniques, and in particular makes use of the parallelism enabled by GPUs to make similarity search lookups more efficient. We will mostly be focusing on its indexing features and how they lead to fast similarity search in recommender systems. FAISS contains several other methods and optimizations that can't be covered by this blog alone; for a detailed overview of how its internal mechanisms work and the previous work it builds upon, please refer here.
Let’s analyze the performance for a recommendation task using FAISS vs Scikit-learn
To show the speed gains obtained from using FAISS, we compared bulk cosine similarity calculation between the FlatL2 and IVFFlat indexes in FAISS and the brute-force similarity search used by Scikit-learn, one of the most popular Python machine learning frameworks.
A dataset of 20k movies was used for this comparison, with the similarity search performed on vectors derived from each movie's genres. The task was to select the top 10 most similar movies for each of the 20k movies from candidate pools ranging in size from 0 to 1000 movies.
As seen from the plots above, the time taken by Scikit-learn's similarity search grows linearly to a few seconds, while the Flat index-based search is an order of magnitude faster.
So, how do FAISS’s indexing features lead to fast similarity search in recommender systems?
As stated above the main strength of FAISS is the speed with which it can perform similarity searches on billions of vectors at the cost of some precision. This is possible because of the implementation of indexes in FAISS that the whole package is optimized for. These indexes store a set of vectors and provide search functions in these sets with various vector comparison algorithms. One good way of understanding them is to think of them like the indexes used in databases to make queries faster.
We will now go through an example implementation of creating a FAISS index.
The simplest implementation of an index in FAISS is the IndexFlatL2 index. It is an exact-search index that encodes the vectors into fixed-size codes. As the name suggests, it compares the L2 (Euclidean) distance between vectors and returns the top-k most similar ones. During a search, all the indexed vectors are decoded sequentially and compared to the vector whose nearest neighbors are being calculated. This vector is also called the query vector.
This is different from how similarity search is done in libraries like Scikit-learn: we have to choose the kind of similarity we are measuring and select the index accordingly. FAISS also optimizes how the index vectors are stored in memory or on disk, using data structures that hugely improve the search time.
The following code shows the process of defining the index vector size, initializing the IndexFlatL2 index, adding vectors to the index, and saving the index to disk.
We can reuse the saved index later to search for a vector's nearest neighbors. The following code snippet shows how to load the index and perform a nearest neighbor search on it.
The output shows the resulting indexes and distances for the first five vectors of the index. In the second, "distances" matrix we can see that each vector's nearest neighbors appear at the beginning of its row, starting with a distance of 0 to itself, and that the distances increase as we move towards the end of the row.
Another thing to remember is that, in this case, the array indexes are assigned automatically by FAISS in increasing order, like an "auto_increment" column in SQL databases. It is advisable to either maintain a mapping from these FAISS positions to your real IDs or use a FAISS-provided ID-mapping index, which will come up further along.
Cosine Similarity Measurement
Although calculating Euclidean distance for vector similarity search is quite common, in many cases cosine similarity is preferred. FAISS does not have a dedicated cosine similarity method, but it does have indexes that calculate the inner (dot) product between vectors, for example the IndexFlatIP index. We can then take advantage of the fact that cosine similarity is simply the dot product of normalized vectors.
This can be implemented by normalizing all vectors before adding them to the index and before querying.

Partition-based Index
Another class of indexes in FAISS is partition-based indexes, which speed up searches by partitioning the index into clusters and limiting the search to only a few of them. This method, however, is not exact, as there is no guarantee that the true nearest neighbors will lie in the clusters that are searched.
An example of an index that uses partitioning to greatly reduce the search space, and thereby make the search far more efficient, is the IndexIVFFlat index.
The search operation can be carried out in the same way as with the earlier indexes. However, in the IVFFlat index we define the "nprobe" hyperparameter to limit the search to the defined number of clusters nearest to the query vector. This is also an example of how different indexes can be composed to form a single index.
Principal Component Analysis (PCA)
We’ve looked at the use cases for some of the basic algorithms in FAISS. This section looks at Principal Component Analysis (PCA), an algorithm popular in unsupervised machine learning that reduces vector dimensionality using the principal components of the vector space.
In FAISS, PCA is generally followed by indexes like IndexFlatL2 or IndexIVFFlat, and they are linked together with the help of the IndexPreTransform wrapper. One requirement of this method is that the dimension of a vector needs to be a multiple of 4.
PCA allows us to reduce the dimensions, but what if we want to increase the dimensionality of the vectors? We may encounter this scenario if we want to use an IndexIVFFlat index, for instance: in that case, the dimensionality should ideally be a multiple of 4. FAISS allows us to do this through the RemapDimensionsTransform method. As an example, suppose we have vectors of size 150 that we need to use with an IndexIVFFlat index. The way we transform them to a size that is a multiple of 4 (152) is given below.
This brings us to the end of Part 1 of our discussion of the FAISS library from Facebook. We looked at the speedup it provides compared to similar methods in Scikit-learn, and how FAISS achieves this through its various indexes. In Part 2 we will go beyond the basic indexing methods and look at more advanced ones. We will also look at the GPU support in FAISS and how to make the calculations even faster by leveraging it.
*Originally published by caboom.ai.
*Caboom is an internal project of Leapfrog.