fbpx Using FAISS Advanced Concepts to Speed Up Similarity Search in Recommender Systems

Using FAISS Advanced Concepts to Speed Up Similarity Search in Recommender Systems (Part II)

Continued from the last blog.

Improving on Flat Indexes

Beyond the flat indexes that perform exhaustive searches, FAISS also has methods that compress the vectors to decrease their memory footprint. To accomplish this, FAISS has very efficient implementations of a few basic components like K-meansPCA, and Product Quantizer encoding decoding. We can use these components solely for the functions they provide, but they are usually used in conjunction with other methods.

We’ve already seen how PCA can be used in Part 1 and here we will look at indexing based on the Product Quantization (PQ) vector compression algorithm. These indexes do not use tree-based indexes, but they achieve the speeds in distance calculations by approximating and largely simplifying the distance calculations.

Partition-based Quantization Index

Essentially what the PQ index does is compress the vectors by partitioning the vector into smaller subvectors, perform k-means clustering, and use the centroids of these clusters to represent the vectors. Direct distance calculation on the compressed vectors is not feasible, so similarity search is still going to be exhaustive and needs decompressing each vector we’re comparing the query vector with. The algorithm however does this in a computationally efficient way through lookup tables.

A comparison of these Product Quantization based indexes to flat indexes can be found on this Github page.

FAISS provides the IndexIVFPQ index that uses pre-filtering by partitioning the vectors into clusters as described for IVFFlat index. This with other enhancements described in this paper gives it further speed up.

One requirement for this method is that the compressed subvector size should be a multiple of the original vector size (preferably a multiple of 4). Let’s see this in code.

Here, we are not reducing the dimension of the data but the bit of each float carried by the subvector.

So, if we initially have 1024 dimensional 32-bit float vectors, and we divide it into 8 subvectors each of length 128. Using k-means clustering with k = 256 allows us to use only 8 bit to store a centroid id for each subvector.

Hence the compression factor is: (1024 * 32) bits / (8 * 8) bits  = 512.

ID Mapping

In practice, there’s usually a unique identifier associated with each element that a vector represents such as a user or an item on an e-commerce website. So, storing the vectors with an id that is used by the system (a recommender system database for example) will prevent the need for an extra mapper to convert from one auto-generated by FAISS.

This can be done by adding an extra index in the pipeline named IndexIDMaps. Let’s see how you can do this here.

Note: The IndexIVFFlat subclass always stores the vector ids so it will be redundant using IndexIDMaps with this subclass. We can directly use add_with_ids with it.

Binary indexes

Faiss also supports binary vectors where the only possible value in each cell is a 0 or 1 value through binary indexes. These indexes store the vectors as arrays of bytes so that a vector of size d takes only d / 8 bytes in memory. As a result, the vector sizes have to be multiples of 8 and rounding the sizes may be required. These indexes support Hamming distance as a measure of similarity.


The most basic type of binary index is the IndexBinaryFlat index. Like the floating-point flat indexes, it performs an exhaustive search for nearest neighbors. As stated earlier the Hamming distance is used for similarity measurement.


The “IVF” (Inverse Vector File) flavor IndexBinaryIVF speeds up the search by clustering the vectors. This is equivalent to the IndexIVFFlat of the floating-point indexes.

Using the GPU

FAISS has GPU support for Nvidia’s CUDA. The reason for wanting to use GPUs is that they provide a significant speedup over equivalent CPU implementation. This is specifically true for algorithms that support parallelization for large datasets.

The name of the index methods are generally preceded by Gpu from the corresponding CPU-based index. For example, IndexFlatL2 would be replaced by GpuIndexFlatL2.

There are however some limitations to the GPU indexes as compared to the CPU versions, such as less memory, storing indices corresponding to inverted files, and the overhead of transferring data to the GPU.

So, it is advised to keep both the data and index in either GPU or CPU as copying data from CPU to GPU can be expensive. The StandardGpuResources() is used to build indexes on GPU and can be used to reserve the resources needed explicitly.

Converting indexes from/to GPU and CPU is enabled with index_gpu_to_cpuindex_cpu_to_gpu and index_cpu_to_gpu_multiple methods.

The code below shows how GPU indexes can be created from a CPU index.

Brute force search without the index

Sometimes we may be in a situation where any optimization or reduction in the size of vectors may not add any value in our use case. For example, if we are just using a Flat index we’re just copying the vectors in the index as it is, and need twice the memory size. In these cases, it may be better to just use a brute force search using low-level FAISS functions.

An example of this is given below. Here xq is the array of vectors to search for the nearest k neighbors from and xb the query vectors we want to find nearest neighbors for.


Now that you’ve gone through the basics of how FAISS works and looked at some of its advanced features, we believe you will be in a better position to use it effectively for your use-cases. To learn about FAISS in more detail we recommend that you go through all the available index types and study how they relate to  tradeoffs between three factors:

  • Speed,
  • Accuracy, and
  • Memory

For example, flat indexes are slower because they perform exhaustive searches, but are the most accurate, whereas HNSFW indexes are fast and accurate but take up a lot of memory.

We can use methods like PCA to keep the memory footprint down but with some cost of accuracy. Product quantizers (PQ) are quite accurate linear compressors but it is linear and might not work well if we have to reduce the vector dimension considerably.

A list of all of the indexes available in FAISS can be found here.


Guidelines for choosing an index can be found here.


*Originally published by caboom.ai.

AI Team

This blog was written by the AI team at Leapfrog.

More in Blogs

Customer Segmentation Using RFM Analysis Artificial IntelligenceInsights

Customer Segmentation Using RFM Analysis

We live in a dynamic world, and people are seemingly looking for growth, and many depend on establishing and running

Read more
Using FAISS basics to Speed up similarity search in Recommender Systems (Part I) Artificial Intelligence

Using FAISS basics to Speed up similarity search in Recommender Systems (Part I)

Measuring the similarity between vectors or arrays is a calculation that we encounter often while developing recommendation systems or other

Read more
Why is a No-Code Recommendation System worth a try? Artificial IntelligenceInsights

Why is a No-Code Recommendation System worth a try?

Lately, in the tech world, no-code systems have been getting immense attention, and for good reasons. These types of systems

Read more