Subtitle: Schema design, embedding workflows, hybrid search, and performance tradeoffs explained.

Quick Recap from Part 1
In Part 1, we introduced the MyVector plugin — a native extension that brings vector embeddings and HNSW-based approximate nearest neighbor (ANN) search into MySQL. We covered how MyVector supports scoped queries (e.g., WHERE user_id = X) to ensure that semantic search remains relevant, performant, and secure in real-world multi-tenant applications.
Now in Part 2, we move from concept to implementation:
- How to store and index embeddings
- How to design embedding workflows
- How hybrid (vector + keyword) search works
- How HNSW compares to brute-force search
- How to tune for performance at scale
1. Schema Design for Vector Search
The first step is designing tables that support both structured and semantic data.
A typical schema looks like:
CREATE TABLE documents (
id BIGINT PRIMARY KEY,
user_id INT NOT NULL,
title TEXT,
body TEXT,
embedding VECTOR(384),
INDEX(embedding) VECTOR
);
Design tips:
- Use VECTOR(n) to store dense embeddings (e.g., 384-dim for MiniLM).
- Always combine vector queries with SQL filtering (WHERE user_id = …, category = …) to scope the search space.
- Use TEXT or JSON fields for hybrid or metadata-driven filtering.
- Consider separating raw text from embedding storage for cleaner pipelines.
2. Embedding Pipelines: Where and When to Embed
MyVector doesn’t generate embeddings — it stores and indexes them. You’ll need to decide how embeddings are generated and updated:
a. Offline (batch) embedding
- Run scheduled jobs (e.g., nightly) to embed new rows.
- Suitable for static content (documents, articles).
- Can be run using Python + HuggingFace, OpenAI, etc.
# Python example
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("all-MiniLM-L6-v2")
vectors = model.encode(["Your text goes here"])
b. Write-time embedding
- Embed text when inserted via your application.
- Ensures embeddings are available immediately.
- Good for chat apps, support tickets, and notes.
c. Query-time embedding
- Used for user search input only.
- Transforms search terms into vectors (not stored).
- Passed into queries like:
ORDER BY L2_DISTANCE(embedding, '[query_vector]') ASC
3. Hybrid Search: Combine Text and Semantics
Most real-world search stacks benefit from combining keyword and vector search. MyVector enables this inside a single query:
SELECT id, title
FROM documents
WHERE MATCH(title, body) AGAINST('project deadline')
AND user_id = 42
ORDER BY L2_DISTANCE(embedding, EMBED('deadline next week')) ASC
LIMIT 5;
This lets you:
- Narrow results using lexical filters
- Re-rank them semantically
- All in MySQL — no sync to external vector DBs
This hybrid model is ideal for support systems, chatbots, documentation search, and QA systems.
4. Brute-Force vs. HNSW Indexing in MyVector
When it comes to similarity search, how you search impacts how fast you scale.
Brute-force search
- Compares the query against every row
- Guarantees exact results (100% recall)
- Simple but slow for >10K rows
SELECT id
FROM documents
ORDER BY COSINE_DISTANCE(embedding, '[query_vector]') ASC
LIMIT 5;
HNSW: Hierarchical Navigable Small World
- Graph-based ANN algorithm used by MyVector
- Fast and memory-efficient
- High recall (~90–99%) with tunable parameters (ef_search, M)
CREATE INDEX idx_vec ON documents(embedding) VECTOR
COMMENT='{"HNSW_M": 32, "HNSW_EF_CONSTRUCTION": 200}';
Comparison
| Feature | Brute Force | HNSW (MyVector) |
|---|---|---|
| Recall | ✅ 100% | 🔁 ~90–99% |
| Latency (1M rows) | ❌ 100–800ms+ | ✅ ~5–20ms |
| Indexing | ❌ None | ✅ Required |
| Filtering Support | ✅ Yes | ✅ Yes |
| Ideal Use Case | Small datasets | Production search |
5. Scoped Search as a Security Boundary
Because MyVector supports native SQL filtering, you can enforce access boundaries without separate vector security layers.
Patterns:
- WHERE user_id = ? → personal search
- WHERE org_id = ? → tenant isolation
- Use views or stored procedures to enforce access policies
You don’t need to bolt access control onto your search engine — MySQL already knows your users.
6. HNSW Tuning for Performance
MyVector lets you tune index behavior at build or runtime:
| Param | Purpose | Effect |
|---|---|---|
| M | Graph connectivity | Higher = more accuracy + RAM |
| ef_search | Traversal breadth during queries | Higher = better recall, more latency |
| ef_construction | Index quality at build time | Affects accuracy and build cost |
Example:
ALTER INDEX idx_vec SET HNSW_M = 32, HNSW_EF_SEARCH = 100;
You can also control ef_search per session or per query soon (planned feature).
TL;DR: Production Patterns with MyVector
- Use VECTOR(n) columns and HNSW indexing for fast ANN search
- Embed externally using HuggingFace, OpenAI, Cohere, etc.
- Combine text filtering + vector ranking for hybrid search
- Use SQL filtering to scope vector search for performance and privacy
- Tune ef_search and M to control latency vs. accuracy
Coming Up in Part 3
In Part 3, we’ll explore real-world implementations:
- Semantic search
- Real-time document recall
- Chat message memory + re-ranking
- Integrating MyVector into RAG and AI workflows
We’ll also show query plans and explain fallbacks when HNSW is disabled or brute-force is needed.