
🧱 What are DataParts?

In TuringDB, graphs are versioned as a sequence of commits. Each commit represents a snapshot of the graph’s state at a given point in time. But under the hood, each commit is composed of DataParts, the fundamental unit of storage in TuringDB’s architecture.

📦 DataParts Explained

  • Every commit is partitioned into multiple DataParts
  • Nodes and edges are stored within DataParts
  • Once written, a DataPart is immutable
  • Commits reference a collection of DataParts, both newly written and inherited from previous commits
πŸ–ΌοΈ Visualization: Imagine a commit as a big box. Inside it, multiple internal boxes labeled DataPart 1, DataPart 2, etc., each storing a portion of the graph.

⚑ Why DataParts?

TuringDB is fundamentally a read-optimized analytical graph database, but DataParts are our answer to achieving high-performance parallel batch writes and data imports, especially for large-scale ingestion workloads.

🔄 Benefits

  • Write Parallelism: Multiple threads or processes can write concurrently to their own private DataPart, without coordination, synchronization, or locking overhead.
  • Batch Import Performance: Ingesting millions of nodes and edges becomes scalable and efficient, even in a system built for sub-millisecond analytics.
  • Snapshot Safety: Each commit references a set of immutable DataParts, allowing us to maintain consistent snapshots and rollback history without duplication.
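The write-parallelism benefit above can be sketched as follows: each worker fills its own private part, so no locks or shared state are needed, and the parts are simply collected at commit time. The function `build_part` and the batch contents are illustrative assumptions, not TuringDB's API.

```python
from concurrent.futures import ThreadPoolExecutor

def build_part(batch):
    # Each worker writes only into its own local part: no shared state,
    # no locking, no coordination between writers.
    return {"nodes": list(batch)}

batches = [range(0, 5), range(5, 10), range(10, 15)]
with ThreadPoolExecutor(max_workers=3) as pool:
    # Each batch becomes one private DataPart-like object
    parts = list(pool.map(build_part, batches))

total = sum(len(p["nodes"]) for p in parts)
print(total)  # 15
```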

🧠 How TuringDB Uses DataParts

Each time you add new data or modify existing node/edge properties:
  • TuringDB creates a new DataPart to store the changes.
  • It reuses existing DataParts from the parent commit whenever possible.
  • This leads to efficient incremental storage: only new or changed data consumes additional memory.
Commit 1
 β”œβ”€β”€ DataPart 1
 β”œβ”€β”€ DataPart 2
 β”œβ”€β”€ DataPart 3
 └── DataPart 4

Commit 2
 β”œβ”€β”€ [references] DataPart 1
 β”œβ”€β”€ [references] DataPart 2
 β”œβ”€β”€ [references] DataPart 3
 β”œβ”€β”€ [references] DataPart 4
 └── [adds]   DataPart 5
DataParts in Commits
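The diagram above can be sketched as a small function: a child commit references its parent's immutable parts and adds only one new part for the delta. The names here (`make_child_commit`, the `"p1"`..`"p5"` payloads) are illustrative, not TuringDB internals.

```python
def make_child_commit(parent_parts, new_data):
    new_part = {"data": new_data}      # only the delta is stored
    return parent_parts + [new_part]   # parent parts are referenced, not copied

commit1 = [{"data": "p1"}, {"data": "p2"}, {"data": "p3"}, {"data": "p4"}]
commit2 = make_child_commit(commit1, "p5")

print(len(commit2))              # 5
print(commit2[0] is commit1[0])  # True: same object, shared, not duplicated
```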
🔒 Like git objects, DataParts are immutable and sharable, enabling:
  • Deduplication of unchanged data
  • Consistent time-travel queries
  • Audit-friendly storage history
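The git analogy suggests content addressing: identical part contents hash to the same key, so an unchanged part is stored once and shared by every commit that references it. This is a sketch of the idea; TuringDB's actual deduplication mechanism may differ.

```python
import hashlib

store = {}  # content-addressed part store: hash -> bytes

def put_part(content: bytes) -> str:
    key = hashlib.sha256(content).hexdigest()
    store.setdefault(key, content)  # no-op if this part already exists
    return key

k1 = put_part(b"nodes: a, b")
k2 = put_part(b"nodes: a, b")  # unchanged part referenced by a later commit
print(k1 == k2)    # True: same content, same key
print(len(store))  # 1: stored only once
```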

πŸ“ Tuning for Performance

TuringDB can efficiently read and traverse graphs with up to 200 DataParts per commit. However, for optimal read performance, we aim to consolidate down to a single DataPart per commit.
The fewer the DataParts, the faster the reads, due to improved locality, reduced CPU cache misses, and minimized lookup overhead.

🧭 Roadmap: Intelligent DataPart Merging

We are actively developing policies and algorithms to intelligently merge DataParts in the background. The goal is to:
  • Automatically compact multiple DataParts into fewer ones
  • Detect hot paths and frequently accessed subgraphs
  • Optimize for query throughput and storage locality
In the future, commits will start as multiple DataParts for fast ingestion and converge toward compact forms for analytical speed, combining the best of both worlds.
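Compaction itself can be sketched under one stated assumption: each part keeps its node ids sorted, so merging many small parts into a single larger one is a k-way merge. This illustrates the goal of the roadmap above, not TuringDB's actual merging algorithm.

```python
import heapq

def compact(parts):
    # Merge several sorted parts into one sorted part for better
    # locality and fewer per-part lookups at read time.
    return list(heapq.merge(*parts))

small_parts = [[1, 4], [2, 5], [3, 6]]  # three small ingestion-time parts
merged = compact(small_parts)
print(merged)  # [1, 2, 3, 4, 5, 6]
```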

💡 Summary

| Feature | Benefit |
| --- | --- |
| Immutable DataParts | Safe versioning and reuse |
| Parallel write ingestion | High-performance batch processing |
| Shared storage across commits | Lower memory usage, fast snapshots |
| Merge roadmap | Compact layout for ultimate read speed |
TuringDB uses DataParts to balance high-speed writes, versioned safety, and read-optimized performance, all in a single, cohesive engine.
  • ClickHouse Parts: a similar model used in high-performance columnar stores to enable immutability, versioning, and efficient compaction.