Here's a search query: "beach trip."
Full-text search finds nothing: no record contains the word "beach." But one note says "Qué calor en Valencia, el agua estaba perfecta" ("So hot in Valencia, the water was perfect"). Semantic search finds it because the embedding of "beach trip" lies close to the embedding of a note about a hot day and perfect water in Valencia, even though the note never says "beach" and is written in Spanish.
Now a different query: "Ana García."
Semantic search returns a dozen vaguely related records. Full-text search returns the 3 records that literally contain "Ana García." But neither shows you that Ana attended last week's meeting, is CC'd on 5 email threads, and appears in tomorrow's calendar — connections that only the knowledge graph knows about.
No single search method is enough. We needed all three, plus a way to combine them that doesn't require manual tuning.
The four signals
Our search pipeline produces four independent scores for every candidate result:
| Signal | Source | What it catches | Tier |
|---|---|---|---|
| Semantic | pgvector cosine similarity | Meaning-based matches ("beach" → "calor en Valencia") | Pro |
| Full-text | tsvector + GIN + ts_rank | Exact keyword matches, fast and precise | Free |
| Graph | entity_links overlap | Relational connections ("Ana García" → meetings she attended) | Pro |
| Heat | record_heat table | Temporal relevance (recently accessed records) | Free (display), Pro (in ranking) |
Free-tier users get full-text search only, which is still fast and well ranked thanks to weighted tsvector columns: title gets weight A, content gets weight B, and tags get weight C. Pro users get all four signals fused together.
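As a rough illustration of the tier rules in the table, here is a minimal Python sketch of which signals feed the ranking. The function name `ranking_signals` and the signal labels are hypothetical; heat is left out of the free tier's ranking because free users only see it displayed.

```python
def ranking_signals(tier: str) -> frozenset[str]:
    """Signals that feed the ranking for a tier ("free" or "pro"); names are illustrative."""
    if tier == "pro":
        # Pro: all four signals are fused into the final score.
        return frozenset({"semantic", "fulltext", "graph", "heat"})
    if tier == "free":
        # Free: rank by full-text only; heat is displayed but not used for ranking.
        return frozenset({"fulltext"})
    raise ValueError(f"unknown tier: {tier!r}")
```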
The pipeline
The search happens in seven steps:
Query: "Ana García project update"
│
├── 1. Vector search ──→ top-50 by cosine similarity
├── 2. Full-text search ──→ top-N by ts_rank (UNION ALL across domains)
└── 3. Graph discovery ──→ N candidates via entity_links
│
▼
4. Deduplicate by (domain, record_id)
│
▼
5. Rank-normalize each signal to [0, 1]
│
▼
6. Detect degenerate signals
│
▼
7. Weighted fusion + multi-signal bonus
│
▼
Final ranked results with provenance
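Steps 5 to 7 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the pipeline's actual code: rank normalization uses dense ranks (best distinct score maps to 1.0, worst to 0.0), a signal counts as degenerate when it has fewer than two distinct scores, and the caller-supplied weights and the 0.1 per-extra-signal bonus are placeholders. Candidates found only by a degenerate signal are kept, just without a score contribution from it.

```python
from collections.abc import Mapping

Key = tuple[str, str]  # (domain, record_id)


def rank_normalize(scores: Mapping[Key, float]) -> dict[Key, float]:
    """Dense-rank normalization: best distinct score -> 1.0, worst -> 0.0, ties share a value."""
    distinct = sorted(set(scores.values()), reverse=True)
    if len(distinct) <= 1:
        return {k: 1.0 for k in scores}
    position = {s: i for i, s in enumerate(distinct)}
    last = len(distinct) - 1
    return {k: 1.0 - position[s] / last for k, s in scores.items()}


def is_degenerate(scores: Mapping[Key, float]) -> bool:
    """Assumed rule: fewer than two distinct scores means the signal cannot order anything."""
    return len(set(scores.values())) < 2


def fuse(signals: Mapping[str, Mapping[Key, float]],
         weights: Mapping[str, float],
         bonus: float = 0.1) -> list[tuple[Key, float]]:
    """Weighted fusion over non-degenerate signals plus a bonus per extra signal that hit."""
    active = [n for n, s in signals.items() if not is_degenerate(s)]
    norm = {n: rank_normalize(signals[n]) for n in active}
    total_w = sum(weights[n] for n in active)
    # Every candidate survives, even one found only by a degenerate signal.
    keys = {k for s in signals.values() for k in s}
    fused: dict[Key, float] = {}
    for key in keys:
        hits = [n for n in active if key in norm[n]]
        score = sum(weights[n] * norm[n][key] for n in hits) / total_w if total_w else 0.0
        score += bonus * max(len(hits) - 1, 0)
        fused[key] = score
    # Highest score first; ties broken by key so the order is deterministic.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))
```

Rank normalization makes the signals comparable without calibrating their raw scales: a ts_rank of 0.06 and a cosine similarity of 0.71 both become 1.0 if each is the best score its own signal produced.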
Let me walk through each step.
Step 1: Vector search
The query text is embedded on the fly using the same model that embeds records (qwen3-embedding:0.6b, 1024 dimensions). A cosine similarity query then runs against the embeddings table:
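-- $1 is the query embedding; <=> is pgvector's cosine distance, so similarity = 1 - distance
SELECT domain, record_id, 1 - (embedding <=> $1) AS similarity
FROM embeddings
WHERE 1 - (embedding <=> $1) >= 0.3   -- drop weak matches
ORDER BY embedding <=> $1             -- nearest first; ordering by distance lets an ANN index be used
LIMIT 50;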
The 0.3 minimum threshold filters out weak, essentially unrelated matches. The top 50 candidates move to the next step. If Ollama, the server that runs the embedding model, is down and we can't embed the query, this signal is simply skipped; the other signals still work.
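To make the semantics of that query concrete, here is an in-memory Python equivalent: similarity is 1 minus the cosine distance, results below 0.3 are dropped, and at most 50 survive. The `embed` callable is a hypothetical stand-in for whatever calls the embedding model; the sketch assumes it raises `ConnectionError` or `TimeoutError` when the embedding server is unreachable, in which case the signal returns nothing and the other signals carry the search.

```python
import math
from collections.abc import Callable, Mapping, Sequence

Key = tuple[str, str]  # (domain, record_id)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, i.e. 1 - cosine distance; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def vector_signal(query: str,
                  embed: Callable[[str], Sequence[float]],
                  index: Mapping[Key, Sequence[float]],
                  min_similarity: float = 0.3,
                  limit: int = 50) -> dict[Key, float]:
    """Mirror of the SQL above; returns {} when the embedder is unavailable."""
    try:
        q = embed(query)
    except (ConnectionError, TimeoutError):
        return {}  # skip this signal; full-text and graph still run
    scored = [(k, cosine_similarity(q, v)) for k, v in index.items()]
    kept = [(k, s) for k, s in scored if s >= min_similarity]
    kept.sort(key=lambda kv: kv[1], reverse=True)  # nearest first
    return dict(kept[:limit])
```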
Step 2: Full-text search
A UNION ALL query across all domain tables, using PostgreSQL's native full-text search:
-- $1 is the raw query text; each branch builds the tsquery in its FROM clause as "query"
SELECT 'note' AS domain, id AS record_id, ts_rank(search_vector, query) AS score
FROM notes, plainto_tsquery('simple', $1) AS query
WHERE search_vector @@ query AND deleted_at IS NULL
UNION ALL
SELECT 'event', id, ts_rank(search_vector, query)
FROM events, plainto_tsquery('simple', $1) AS query
WHERE search_vector @@ query AND deleted_at IS NULL
UNION ALL
-- ... contacts, emails, files, diary, bookmarks, kanban_cards
ORDER BY score DESC
LIMIT 50;
We use plainto_tsquery('simple', ...) instead of language-specific configurations. The simple configuration doesn't stem words, which matters for multilingual data — Spanish and English records coexist, and stemming rules for one language would butcher the other.
Each domain table has a search_vector tsvector column maintained by a trigger (or GENERATED ALWAYS AS ... STORED for newer tables). The vectors are weighted: title gets 'A', description/content gets 'B', tags get 'C'. A match in the title ranks higher than a match in the body.
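The effect of the A/B/C labels can be shown with a toy ranker in Python. It borrows the default label weights PostgreSQL's ts_rank uses ({0.1, 0.2, 0.4, 1.0} for D, C, B, A) but simply sums the weights of matched terms, which is a deliberate simplification rather than ts_rank's actual formula; `weighted_lexemes` and `toy_rank` are illustrative names. In SQL, such a column is typically built by concatenating setweight(to_tsvector('simple', title), 'A') with similar terms for the other fields.

```python
# Default ts_rank weights from the PostgreSQL docs, ordered {D, C, B, A} = {0.1, 0.2, 0.4, 1.0}.
LABEL_WEIGHTS = {"A": 1.0, "B": 0.4, "C": 0.2, "D": 0.1}


def weighted_lexemes(title: str, content: str, tags: list[str]) -> dict[str, str]:
    """Toy tokenizer in the spirit of the 'simple' config: lowercase, split, no stemming.

    Each lexeme keeps only its highest label (a real tsvector keeps every position).
    """
    labels: dict[str, str] = {}
    # Process fields from lowest to highest weight so higher labels overwrite lower ones.
    for label, text in (("C", " ".join(tags)), ("B", content), ("A", title)):
        for token in text.lower().split():
            labels[token] = label
    return labels


def toy_rank(labels: dict[str, str], query: str) -> float:
    """Sum the label weights of matched query terms (not ts_rank's real formula)."""
    return sum(LABEL_WEIGHTS[labels[t]] for t in query.lower().split() if t in labels)
```

A match on "valencia" scores 1.0 when it sits in the title, 0.4 in the body, and 0.2 in a tag, which is the ordering the weighted columns are meant to produce.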
Step 3: Graph discovery
This signal is different — it doesn't match text, it matches relationships.
The query is matched against graph_entities.normalized_name. If "Ana García" matches a Person entity, we find all records linked to that entity via entity_links:
-- Find entities mentioned in the query (literal shown for the "Ana García" example)
SELECT id FROM graph_entities
WHERE normalized_name ILIKE '%ana garcia%' AND deleted_at IS NULL;
-- Find all records linked to those entities ($1 = array of entity ids from the first query)
SELECT source_type AS domain, source_id AS record_id
FROM entity_links
WHERE target_type = 'graph_entity' AND target_id = ANY($1);
The graph_score for each result is the overlap ratio: the number of the query's entities that appear among the result's connections, divided by the total number of entities found in the query.
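Here is a minimal Python sketch of both halves of graph discovery. Two parts are assumptions: `normalize` guesses at how normalized_name is produced (Unicode decomposition, accents removed, lowercased, whitespace collapsed), and entity matching checks whether each normalized name appears inside the normalized query, which is one plausible reading of the ILIKE lookup. The score is the overlap ratio described above.

```python
import unicodedata

Key = tuple[str, str]  # (domain, record_id)


def normalize(name: str) -> str:
    """Assumed normalization: NFKD, drop combining marks (accents), lowercase, collapse spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(stripped.lower().split())


def entities_in_query(query: str, entities: dict[str, str]) -> set[str]:
    """entities maps entity id -> normalized_name; substring match, like ILIKE '%name%'."""
    q = normalize(query)
    return {eid for eid, name in entities.items() if name and name in q}


def graph_scores(query_entities: set[str], links: dict[Key, set[str]]) -> dict[Key, float]:
    """links maps (domain, record_id) -> linked entity ids; score = overlap / |query entities|."""
    if not query_entities:
        return {}
    return {key: len(linked & query_entities) / len(query_entities)
            for key, linked in links.items() if linked & query_entities}
```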
Step 4: Deduplication
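The three signals produce candidate sets that overlap. A note containing "Ana García" might appear in vector search (semantically similar), full-text search (literal keyword match), and graph discovery (linked to the Ana García entity). Deduplication merges these into a single candidate per (domain, record_id), keeping each signal's score.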
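A minimal Python sketch of the merge step, assuming each signal emits (signal, domain, record_id, score) rows. When one signal reports the same record more than once, the sketch keeps its best score; that rule is an assumption, not something the pipeline description specifies.

```python
from collections import defaultdict

Key = tuple[str, str]  # (domain, record_id)


def deduplicate(rows: list[tuple[str, str, str, float]]) -> dict[Key, dict[str, float]]:
    """Merge (signal, domain, record_id, score) rows into one entry per (domain, record_id)."""
    merged: dict[Key, dict[str, float]] = defaultdict(dict)
    for signal, domain, record_id, score in rows:
        per_signal = merged[(domain, record_id)]
        # Keep the best score if the same signal reports this record twice (assumed rule).
        per_signal[signal] = max(score, per_signal.get(signal, score))
    return dict(merged)
```

The result keeps provenance: each candidate carries the per-signal scores that later feed normalization and fusion.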