
Semantic Cache Hits

An interactive clustering visualization of a semantic cache.

Scroll down for the legend and an explanation of the visualization. Hover over or tap the dots for more info.



Blue dot: represents a potential prompt request
White dot: represents a cached prompt request
Green line: connects a potential prompt to a cached prompt if the potential prompt is sufficiently semantically similar to the cached prompt. In other words, a cache hit.


The visualization uses a dimensionality reduction algorithm called UMAP to show clusters of semantically similar prompt requests. Each dot stands for a single hypothetical prompt request to a generic AI. White dots denote prompt requests that were previously cached, and blue dots denote prompt requests that might occur in the future. A green line between a blue dot and a white dot represents a cache hit, where the new prompt request (blue dot) would use the cached response from a previous prompt request (white dot). A cache hit implies that the previously cached prompt request is semantically similar to the new prompt request.

What is a similarity threshold? A similarity threshold is a number between 0 and 1 that sets the minimum similarity required between a new prompt request and a cached prompt request for a cache hit. As the similarity threshold goes up, expect fewer connecting lines between the blue and white dots. As it goes down, expect more.
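To make the threshold concrete, here is a minimal sketch of how cache hits could be picked from a table of similarity scores. The scores are made up for illustration, and the helper name cache_hits is hypothetical. It also assumes each new prompt connects only to its single best-matching cached prompt and that a score equal to the threshold counts as a hit. The demo itself may handle either detail differently.

```python
# Hypothetical similarity scores (not taken from the demo).
# Each row is a new prompt, each column is a cached prompt.
scores = [
    [0.92, 0.40, 0.15],
    [0.55, 0.71, 0.30],
    [0.10, 0.22, 0.35],
]

def cache_hits(scores, threshold):
    """Return (new_index, cached_index) pairs whose best score meets the threshold."""
    hits = []
    for i, row in enumerate(scores):
        # Assumption: only the most similar cached prompt is considered.
        best_j = max(range(len(row)), key=row.__getitem__)
        if row[best_j] >= threshold:
            hits.append((i, best_j))
    return hits

# Raising the threshold leaves fewer connecting lines.
for threshold in (0.3, 0.6, 0.9):
    print(threshold, cache_hits(scores, threshold))
```

With these made-up scores, a threshold of 0.3 yields three hits, 0.6 yields two, and 0.9 yields one.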

How is semantic similarity calculated? The "similarity" between any two prompt requests refers to the cosine similarity of their vector embeddings. A vector embedding encodes semantic information as a high-dimensional numerical vector, and two such vectors can be compared using cosine similarity. I wrote about this here. Vector embeddings for the prompt requests in the visualization were generated using one of OpenAI's latest embedding models, text-embedding-3-large.
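As a rough sketch of the calculation, cosine similarity is the dot product of two vectors divided by the product of their lengths. The three-dimensional vectors below are toy stand-ins, not real embeddings (text-embedding-3-large returns 3072 dimensions by default), and the variable names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for embeddings of three prompts.
refund_a = [0.9, 0.1, 0.0]   # e.g. "How do I get a refund?"
refund_b = [0.8, 0.2, 0.1]   # e.g. "Can I have my money back?"
weather = [0.0, 0.2, 0.9]    # e.g. "Will it rain tomorrow?"

# Vectors pointing in similar directions score close to 1.
print(cosine_similarity(refund_a, refund_b))
# Vectors pointing in different directions score close to 0.
print(cosine_similarity(refund_a, weather))
```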

Line length doesn't mean much. The layout of the dots (or prompt requests) is only meant to show clusters of semantically similar prompt requests. The exact distance between any two specific prompt requests in the visualization is not a precise measurement of their semantic similarity, because it's impossible to accurately represent the global structure of a higher-dimensional space in 2D. For more info about UMAP, read this in-depth article. (I'm not the author)
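A small illustration of why distances on a flat plot can mislead: the four 3D points below are all exactly the same distance from one another, but once flattened to 2D some pairs look closer than others. Note that this sketch flattens by simply dropping the third coordinate, which is not how UMAP works. It only shows that pairwise distances cannot all survive a reduction in dimensions.

```python
import math
from itertools import combinations

# Corners of a regular tetrahedron: every pair is equally far apart in 3D.
points_3d = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]

# Naive projection to 2D by dropping the z coordinate (not UMAP).
points_2d = [(x, y) for x, y, _ in points_3d]

for i, j in combinations(range(len(points_3d)), 2):
    d3 = math.dist(points_3d[i], points_3d[j])
    d2 = math.dist(points_2d[i], points_2d[j])
    print(f"pair ({i}, {j}): 3D distance {d3:.3f}, 2D distance {d2:.3f}")
```

Every 3D distance prints as 2.828, while the 2D distances are a mix of 2.000 and 2.828.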

Read more about vector embeddings and how they're used in AI

Find me on Twitter