
πŸ” String Approximation Operator (~=)

TuringDB extends the standard Cypher query language with an intuitive and efficient approximate string matching operator: ~=. This feature is ideal for exploring knowledge graphs, especially when dealing with noisy data, ambiguous labels, or unknown naming conventions.

🧠 What It Does

The ~= operator allows you to query string properties on nodes or edges without requiring exact matches or complex regular expressions. Instead of:
MATCH (n{name="APOE-4"}) RETURN n
You can write:
MATCH (n{name~="apoe"}) RETURN n
This returns all nodes whose name property approximately matches “apoe”.

📦 Why It’s Useful

  • ✅ No regex required: more human-friendly and readable
  • ⚡ Faster than regex: avoids the index-bypass performance issues seen in Neo4j (Source)
  • 🔍 Designed for discovery: well suited to exploratory search, fuzzy graph lookups, and biomedical graph use cases

📄 Example 1: Matching Biological Entities

Given nodes:
(node:...{id:0, ..., name:"APOE-4 [extracellular]", ...})
(node:...{id:1, ..., name:"APOE-4", ...})
(node:...{id:2, ..., name:"APOE-4 [intracellular]", ...})
Query:
MATCH (n{name~="apoe"}) RETURN n
Results:
Match
---
APOE-4 [extracellular]
APOE-4
APOE-4 [intracellular]
✅ All relevant nodes returned using only a single word: apoe.

📄 Example 2: Prefix Word Matching

Given nodes:
(node{id:1, desc:"play"})
(node{id:2, desc:"playful"})
(node{id:3, desc:"playfully"})
(node{id:4, desc:"pl"})
(node{id:5, desc:"plays"})
Query:
MATCH (n{desc~="play"}) RETURN n.id
Results:
id
---
1
2
3
5
  • ✅ play, playful, playfully, and plays all match
  • ❌ pl does not match (pl covers only 50% of “play”, below the 75% threshold)

βš™οΈ How It Works

  • Matching uses word-level prefix matching
  • A “word” is any substring separated by whitespace
  • Only alphanumeric characters are used; symbols are stripped before matching
  • The minimum match threshold is 75% of the query string’s length
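
The word-splitting and symbol-stripping rules above can be sketched in Python. This is only an illustration of the rules as stated, not TuringDB's implementation; the helper name normalize_words is hypothetical, and lowercasing is assumed here because the operator is documented as case-insensitive.

```python
def normalize_words(text: str) -> list[str]:
    """Split on whitespace, keep only alphanumeric characters, lowercase.

    Hypothetical helper: it mirrors the documented rules, not TuringDB's code.
    """
    words = []
    for raw in text.split():  # a "word" is any whitespace-separated substring
        cleaned = "".join(ch for ch in raw if ch.isalnum()).lower()
        if cleaned:  # skip words made only of symbols
            words.append(cleaned)
    return words


# Names taken from Example 1
for name in ["APOE-4 [extracellular]", "APOE-4", "APOE-4 [intracellular]"]:
    print(name, "->", normalize_words(name))
```

Under these assumptions, every name in Example 1 yields the word apoe4, which starts with the query apoe. That is consistent with all three nodes being returned.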

Match Example

  • Query: "play" (length: 4)
  • Minimum prefix: "pla" (75% of 4 = 3 characters)
  • playful ✅
  • plays ✅
  • pl ❌
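
A minimal Python sketch of the threshold rule, run against the nodes from Example 2, might look like the following. It rests on several assumptions that the text does not confirm: the required prefix length is rounded up, a word matches when the prefix it shares with the query is at least that long, and the query is treated as a single word. The names approx_match, common_prefix_len and normalize_words are hypothetical and are not part of TuringDB.

```python
import math

THRESHOLD = 0.75  # from the text: 75% of the query string's length


def normalize_words(text: str) -> list[str]:
    """Whitespace-separated words, alphanumeric characters only, lowercased."""
    words = ("".join(ch for ch in raw if ch.isalnum()).lower() for raw in text.split())
    return [w for w in words if w]


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading characters that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def approx_match(value: str, query: str) -> bool:
    """True if any word of value shares a long enough prefix with query."""
    q = "".join(ch for ch in query if ch.isalnum()).lower()
    if not q:
        return False
    required = math.ceil(THRESHOLD * len(q))  # assumption: round up
    return any(common_prefix_len(w, q) >= required for w in normalize_words(value))


# Nodes from Example 2
nodes = {1: "play", 2: "playful", 3: "playfully", 4: "pl", 5: "plays"}
matches = [node_id for node_id, desc in nodes.items() if approx_match(desc, "play")]
print(matches)  # [1, 2, 3, 5]
```

Under this reading, a word such as plan would also match the query play, because it shares the prefix "pla". The text does not say whether TuringDB behaves this way.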

🔬 Use Cases

  • Searching biomedical knowledge graphs (e.g., proteins, genes, diseases)
  • Fuzzy matching in messy datasets
  • Finding similarly named entities (e.g., APOE-4, APOE, APOE2)
  • Natural language matching for agentic workflows or LLM graph queries

πŸ“ Syntax Summary

MATCH (n{property~="search_term"}) RETURN n
  • Works with any node or edge property that is a string
  • Case-insensitive
  • No regex or wildcards needed

🚧 Limitations

  • Works only on string properties
  • Currently supports prefix word-level matching only
  • Does not yet support substring or typo-tolerant matching (on the roadmap)

🔭 Future Improvements - Roadmap

TuringDB may extend ~= in the future with:
  • Fuzzy edit-distance matching (e.g., Levenshtein)
  • Optional configuration for matching thresholds
  • Substring or suffix modes
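
For readers unfamiliar with edit distance, the sketch below shows the classic Levenshtein computation in Python. It only illustrates what typo-tolerant matching would measure. It says nothing about how or whether TuringDB will implement it, and the function name levenshtein is chosen for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn a into b (row-by-row dynamic programming)."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        previous = current
    return previous[-1]


print(levenshtein("apoe", "apeo"))   # 2: swapped letters count as two edits
print(levenshtein("apoe", "apoe4"))  # 1: one inserted character
```

A typo such as apeo has no three-character prefix in common with apoe, so the current 75% prefix rule would not catch it, while an edit-distance rule with a small tolerance could.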
Stay tuned!