search

Found

info Overview

Compute cosine similarity, angle and distance between two numeric vectors in comma, space, or JSON format. Handles up to 4096 dims and shows norms.

📘 How to Use

  1. Pick an input format (comma-separated, space-separated, or JSON array)
  2. Paste the numbers for Vector A and Vector B
  3. Read the cosine similarity, angle, and cosine distance with the interpretation label

Embedding Vector Cosine Similarity Calculator

Input format:
0 dims ‖A‖ = 0
0 dims ‖B‖ = 0

Cosine Similarity

0.0000

-1 (opposite) to 1 (identical)

Angle

0.00 °

0° to 180°

Cosine Distance

0.0000

0 (identical) to 2 (opposite)

Interpretation

Enter both vectors above.

※ Formula: cos(θ) = (A · B) / (‖A‖ × ‖B‖). Cosine distance is 1 − cos(θ). Cosine similarity stays in [-1, 1] regardless of dimensionality; for text embeddings, 0.7-0.95 is the typical 'similar' range.

Article

Embedding Vector Cosine Similarity Calculator | Compare Two Embeddings Side by Side

Paste two embedding vectors from OpenAI, Cohere, or SBERT and get cosine similarity, angle (degrees), and cosine distance at once. Accepts comma, space, or JSON-array formats up to 4096 dimensions, with norms and an interpretation label.

💡 About this tool

If you have ever shipped a RAG pipeline and stared at a Pinecone score wondering "is 0.82 actually a good match here?", this is the sanity-check you want. Sometimes the fastest way to debug a retrieval bug is to pull two raw vectors out of your logs and compute the similarity by hand instead of trusting the vector store's metric blindly.

Cosine similarity compares the direction of two vectors and ignores their magnitude, which is exactly what you want for text embeddings: a short query and a long passage can still point the same way semantically. This tool parses three real-world formats so you can paste an OpenAI JSON array or a comma-separated row straight from a debug log, and it shows the angle and the cosine distance alongside the raw cosine value so the 0.85 / 0.5 / 0 thresholds are obvious at a glance.

It also catches the usual footguns: mismatched dimensions, a zero vector, or a token that is not a finite number all surface as an error instead of silently producing a misleading score.

🧐 Frequently Asked Questions

Cosine similarity vs. dot product vs. Euclidean — which one matters? Cosine looks at direction only. Dot product mixes direction and magnitude. Euclidean measures straight-line distance. Most embedding models output normalized vectors, so cosine is the default; if your vectors are already unit length, cosine and dot product give the same number.

What counts as "similar"? Is there a magic threshold? For text embeddings, 0.7–0.95 is the usual "similar" band. This tool labels cos > 0.85 as strong, 0.50–0.85 as moderate, and 0–0.50 as weak or unrelated. Thresholds shift per model and task, though, so the honest answer is: plot the distribution on your own data and pick a cutoff.

Why show cosine distance (1 − cos) too? Vector databases rank by distance, where smaller means closer. Pinecone's cosine metric, for example, works in distance space internally, so having the distance value makes it easy to reconcile with what the DB returns.

Is there a dimension limit? Up to 4096 dimensions. OpenAI's text-embedding-3-large (3072) and ada-002 (1536) paste in fine. If A and B have different lengths the tool flags a dimension mismatch.

Why does a zero vector throw an error? A vector with norm 0 has no defined direction, so cosine similarity divides by zero and is undefined. The tool surfaces that instead of returning NaN.

Can I mix formats between A and B? Both vectors are parsed with the same selected format. Switch to JSON for an API response array, or comma mode for a row you copied out of a log.

📚 Why Cosine Wins for Embeddings

Reddit's r/MachineLearning and countless Stack Overflow threads keep landing on the same answer: for sentence and document embeddings, cosine is the workhorse because it factors out length. Two paragraphs of very different word counts can sit at nearly the same angle if they mean the same thing, and angle is what cosine measures.

There is a neat shortcut worth remembering. When both vectors are L2-normalized (norm = 1), cosine similarity equals their dot product, which is why high-performance retrieval engines often store normalized vectors and rank by inner product — it is the cheaper operation but the identical ranking. This tool prints the norms, so if ‖A‖ and ‖B‖ both read close to 1, your embeddings are already normalized and your dot-product and cosine pipelines will agree.