Multiple documents (clinical notes) to graph

This guide shows how to build a workflow that processes files from S3 storage, extracts entities using AI, and generates a knowledge graph. The workflow combines file processing, machine learning, and data visualization. In this example, we take multiple clinical notes (synthetic data) and uncover the relationships between a patient’s treatments, markers, health events, symptoms, and clinical outcomes across multiple encounters. To learn more about how workflows work, see Workflow Concepts.

Workflow Overview

This workflow performs the following operations:

Lists files from S3 storage
Iterates through each file using a ForEach loop
Loads the file content
Extracts entities using Mistral AI
Merges all extracted entities
Writes data to TuringDB
Generates GML (Graph Modeling Language) output
Creates an AI-powered summary of the graph

Step-by-Step Implementation

Step 1: Upload the clinical reports to S3

First, download the example clinical reports (clinical-reports.zip) and upload them to your S3 data storage.

Step 2: Add an S3ListFiles node

Create a new workflow and add an S3ListFiles node as the first step.

Node Configuration:

Parameters:
- Filenames: List of specific filenames to process
- Output Field: files
- Max item Count: 5 (limits the number of files processed)

This node outputs a collection of file references that subsequent nodes can process.
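The selection logic of this node can be pictured with a small plain-Python sketch. It is a hypothetical illustration, not the node’s implementation: it assumes the bucket listing is already available as a list of object keys, and that each entry of $files is a dict with a key field (as the $files[$i].key references in later steps suggest). The function name list_files and the file names are made up for the example.

```python
from typing import Optional


def list_files(
    keys: list[str],
    filenames: Optional[list[str]] = None,
    max_item_count: Optional[int] = None,
) -> list[dict]:
    """Hypothetical sketch of S3ListFiles: filter and cap a bucket listing."""
    # Keep only the requested filenames, if any were given
    if filenames:
        wanted = set(filenames)
        keys = [k for k in keys if k in wanted]
    # Cap the number of returned entries (the "Max item Count" parameter)
    if max_item_count is not None:
        keys = keys[:max_item_count]
    # Each entry exposes a "key" field, mirroring $files[$i].key
    return [{"key": k} for k in keys]


bucket = [f"note_{n}.txt" for n in range(1, 9)]
data = {"files": list_files(bucket, max_item_count=5)}
print([f["key"] for f in data["files"]])
# ['note_1.txt', 'note_2.txt', 'note_3.txt', 'note_4.txt', 'note_5.txt']
```

With a Max item Count of 5, only the first five of the eight files are passed on, which keeps test runs small.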

Step 3: Set Up ForEach Loop

Add a ForEach node to iterate through the file list, and connect its first input to the output of the S3ListFiles node created in the previous step.

Node Configuration:

Parameters:
- Current Field: $i (iterator variable name)
- Collection field: $files (references the output from S3ListFiles)

The ForEach node creates a loop structure that will process each file individually.
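Conceptually, the ForEach node behaves like the loop below. This is a minimal sketch under stated assumptions: the workflow’s JSON data is modeled as a Python dict, $i holds the current index into $files, and the body callable stands in for the nodes wired between the ForEach node’s first output and its second input. The function name run_foreach is hypothetical.

```python
from typing import Callable


def run_foreach(
    data: dict,
    collection_field: str,
    current_field: str,
    body: Callable[[dict], None],
) -> dict:
    """Hypothetical sketch of ForEach: run `body` once per collection item."""
    for index in range(len(data[collection_field])):
        # Expose the iterator variable ($i) to the loop body
        data[current_field] = index
        body(data)
    # Returning here corresponds to the second output, which fires after the loop
    return data


def body(data: dict) -> None:
    # Stand-in for the loop body: record which file is being processed
    data.setdefault("visited", []).append(data["files"][data["i"]]["key"])


data = {"files": [{"key": "note_1.txt"}, {"key": "note_2.txt"}]}
run_foreach(data, collection_field="files", current_field="i", body=body)
print(data["visited"])  # ['note_1.txt', 'note_2.txt']
```

The key point is that the body reads the current item through the iterator variable rather than receiving it directly, which is why later steps reference $files[$i].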

Step 4: Load File Content

Add a new S3LoadFile node to the canvas and connect its input to the first output of the ForEach node.

Node Configuration:

Parameters:
- Output Field: $file_contents[$files[$i].key]
- File Key (the name of the file): $files[$i].key
- File Type: text

This node loads the content of each file and stores it in a new $file_contents dict, using the file name ($files[$i].key) as the key.
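The keyed write performed by this node can be illustrated with a short sketch. It is hypothetical: an in-memory dict stands in for the S3 bucket, load_file is an illustrative name, and the loaded object is assumed to carry its text in a data field, matching the $file_contents[$files[$i].key].data reference used in the next step.

```python
def load_file(data: dict, bucket: dict[str, str]) -> None:
    """Hypothetical sketch of S3LoadFile with a keyed Output Field."""
    key = data["files"][data["i"]]["key"]  # File Key: $files[$i].key
    # Output Field: $file_contents[$files[$i].key]; the dict is created on first use
    data.setdefault("file_contents", {})[key] = {"data": bucket[key]}


bucket = {"note_1.txt": "Patient reports fatigue.", "note_2.txt": "HbA1c elevated."}
data = {"files": [{"key": "note_1.txt"}, {"key": "note_2.txt"}]}
for i in range(len(data["files"])):
    data["i"] = i  # what the ForEach node sets on each iteration
    load_file(data, bucket)
print(sorted(data["file_contents"]))  # ['note_1.txt', 'note_2.txt']
```

Keying by file name, rather than overwriting a single field, is what lets every file’s content survive the loop.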

Step 5: Extract Entities with AI

Add a MistralEntityExtractor node to analyze the current file’s content, and connect its input to the output of the S3LoadFile node. To close the ForEach loop, connect the MistralEntityExtractor node’s output to the second input of the ForEach node.

Node Configuration:

Parameters:
- Input Field Name: $file_contents[$files[$i].key].data
- Output Field Name: $entities[$files[$i].key]

This node uses Mistral AI to identify and extract entities from each file’s content, read from the $file_contents dict at the $files[$i].key key. The resulting entities are stored in a new $entities dict, also keyed by file name.
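The sketch below shows how per-file results end up in $entities, keyed by file name. The extractor is a deliberately naive keyword matcher used only as a stand-in for the Mistral call, and the entity record shape (name, type) is an assumption for illustration, not the node’s actual output schema. VOCABULARY and extract_entities are hypothetical names.

```python
# Hypothetical vocabulary standing in for the LLM's entity recognition
VOCABULARY = {"fatigue": "Symptom", "metformin": "Treatment", "hba1c": "Marker"}


def extract_entities(text: str) -> list[dict]:
    """Naive stand-in for MistralEntityExtractor: match known terms."""
    words = {w.strip(".,;:").lower() for w in text.split()}
    return [{"name": term, "type": kind} for term, kind in VOCABULARY.items() if term in words]


data = {
    "files": [{"key": "note_1.txt"}, {"key": "note_2.txt"}],
    "file_contents": {
        "note_1.txt": {"data": "Fatigue reported; started metformin."},
        "note_2.txt": {"data": "HbA1c improved on metformin."},
    },
}
for i in range(len(data["files"])):
    data["i"] = i
    key = data["files"][data["i"]]["key"]
    # Input: $file_contents[$files[$i].key].data -> Output: $entities[$files[$i].key]
    data.setdefault("entities", {})[key] = extract_entities(data["file_contents"][key]["data"])
print(data["entities"]["note_2.txt"])
# [{'name': 'metformin', 'type': 'Treatment'}, {'name': 'hba1c', 'type': 'Marker'}]
```

A real LLM extractor would also return relations and handle synonyms; the sketch only demonstrates the keyed storage pattern.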

Step 6: Merge All Entities

After the loop, the JSON data contains a dict field $entities whose key/value pairs map each filename to its extracted entities. We now need to merge these entities into a single object. Add a MergeEntities node to the canvas and connect its input to the second output of the ForEach node; the node executes as soon as the loop exits.

Node Configuration:

Parameters:
- Input Field: $entities
- Output Field: $merged_entities
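A merge of this kind can be sketched as deduplicating entities across files. The record shape (name, type) and the choice to deduplicate on the (name, type) pair while tracking source files are assumptions for illustration; the MergeEntities node’s actual merge rules may differ. merge_entities is a hypothetical name.

```python
def merge_entities(entities_by_file: dict[str, list[dict]]) -> list[dict]:
    """Hypothetical sketch of MergeEntities: dedupe on (name, type)."""
    merged: dict[tuple[str, str], dict] = {}
    for filename, entities in entities_by_file.items():
        for ent in entities:
            ident = (ent["name"], ent["type"])
            # First sighting creates the merged record; later sightings add sources
            record = merged.setdefault(ident, {**ent, "sources": []})
            record["sources"].append(filename)
    return list(merged.values())


data = {
    "entities": {
        "note_1.txt": [{"name": "metformin", "type": "Treatment"}],
        "note_2.txt": [
            {"name": "metformin", "type": "Treatment"},
            {"name": "hba1c", "type": "Marker"},
        ],
    }
}
data["merged_entities"] = merge_entities(data["entities"])
for ent in data["merged_entities"]:
    print(ent["name"], ent["sources"])
# metformin ['note_1.txt', 'note_2.txt']
# hba1c ['note_2.txt']
```

Keeping the source file list on each merged entity is one way to preserve which encounter mentioned it once the per-file structure is collapsed.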

Step 7: Write to Database

Add a TuringDBWrite node to create a new graph containing the extracted entities, and connect its input to the MergeEntities node’s output.

Node Configuration:

Parameters:
- Input Field: $merged_entities
- Graph Name: multi_doc

Step 8: Generate Graph Format

In parallel, we can generate a GML file and ask an LLM to explain the structure of the graph. Add a GMLGenerator node to the canvas and connect its input to the MergeEntities node’s output.

Node Configuration:

Parameters:
- Input Field Name: $merged_entities
- Output Field Name: $gml
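GML itself is a simple bracketed text format of nested key/value lists. The sketch below serializes nodes and edges into GML; the input shape (entities with name/type, relations with source/target/label) and the example relation are assumptions for illustration, not the GMLGenerator node’s actual schema. to_gml is a hypothetical name, and it does not escape double quotes inside names.

```python
def to_gml(entities: list[dict], relations: list[dict]) -> str:
    """Hypothetical sketch: serialize entities and relations as GML text."""
    # Assign each entity a numeric id, since GML edges refer to node ids
    ids = {ent["name"]: n for n, ent in enumerate(entities)}
    lines = ["graph [", "  directed 1"]
    for ent in entities:
        name, kind = ent["name"], ent["type"]
        lines += ["  node [", f"    id {ids[name]}", f'    label "{name}"', f'    type "{kind}"', "  ]"]
    for rel in relations:
        src, dst, label = ids[rel["source"]], ids[rel["target"]], rel["label"]
        lines += ["  edge [", f"    source {src}", f"    target {dst}", f'    label "{label}"', "  ]"]
    lines.append("]")
    return "\n".join(lines)


entities = [{"name": "metformin", "type": "Treatment"}, {"name": "hba1c", "type": "Marker"}]
relations = [{"source": "metformin", "target": "hba1c", "label": "lowers"}]
gml = to_gml(entities, relations)
print(gml)
```

Because GML is plain text, the output should also be readable by general-purpose graph tools such as NetworkX’s parse_gml, which is what makes it a convenient format to hand to an LLM.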

Step 9: Create AI Summary

Finally, add a MistralGraphExplainer node and connect its input to the GMLGenerator node’s output.

Node Configuration:

Parameters:
- Input Field Name: $gml
- Output Field Name: $gml

Key Features Demonstrated

Data Field Operations

The workflow showcases advanced data field manipulation:

Keyed writes: building collections ($file_contents, $entities) over iterations
Dynamic field access: Using $files[$i].key for array indexing
Field chaining: Connecting data between multiple nodes
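To make the dynamic field access concrete, here is a small hypothetical resolver for expressions such as $file_contents[$files[$i].key].data. It only illustrates how nested $field, [...] and .attr references evaluate against the JSON data; it is not TuringDB’s expression parser and supports only the subset of syntax used in this guide.

```python
import re

_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def resolve(expr: str, data: dict):
    """Evaluate a `$name[...].attr` style reference against `data`."""
    value, pos = _parse(expr, 0, data)
    if pos != len(expr):
        raise ValueError(f"unexpected text at position {pos}: {expr[pos:]!r}")
    return value


def _ident(expr: str, pos: int) -> tuple[str, int]:
    match = _IDENT.match(expr, pos)
    if not match:
        raise ValueError(f"expected a field name at position {pos}")
    return match.group(), match.end()


def _parse(expr: str, pos: int, data: dict):
    if not expr.startswith("$", pos):
        raise ValueError(f"expected '$' at position {pos}")
    name, pos = _ident(expr, pos + 1)
    value = data[name]
    while pos < len(expr) and expr[pos] in ".[":
        if expr[pos] == ".":
            # Attribute access, e.g. .key or .data
            attr, pos = _ident(expr, pos + 1)
            value = value[attr]
        else:
            # Index access whose index is itself a reference, e.g. [$i]
            index, pos = _parse(expr, pos + 1, data)
            if not expr.startswith("]", pos):
                raise ValueError(f"expected ']' at position {pos}")
            value = value[index]
            pos += 1
    return value, pos


data = {
    "i": 1,
    "files": [{"key": "note_1.txt"}, {"key": "note_2.txt"}],
    "file_contents": {"note_2.txt": {"data": "HbA1c elevated."}},
}
print(resolve("$files[$i].key", data))  # note_2.txt
print(resolve("$file_contents[$files[$i].key].data", data))  # HbA1c elevated.
```

The inner reference is resolved first and its value is used as the index, which is exactly how $files[$i].key turns the loop counter into a file name.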

Loop Processing

The ForEach construct enables:

Iterative file processing
Dynamic data collection

Parallel Processing

The workflow splits into two branches after entity merging:

Database storage path
Visualization and summary path
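The branching can be reasoned about as a dependency graph. The sketch below uses Python’s standard graphlib.TopologicalSorter (Python 3.9+) to group the post-loop nodes into waves whose inputs are all ready; nodes in the same wave could run in parallel. The node names follow this guide, but the wave-based scheduling is only an illustration, not a description of how the workflow engine actually schedules nodes.

```python
from graphlib import TopologicalSorter

# Each node maps to the set of nodes whose output it consumes
dependencies = {
    "TuringDBWrite": {"MergeEntities"},
    "GMLGenerator": {"MergeEntities"},
    "MistralGraphExplainer": {"GMLGenerator"},
}


def parallel_waves(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group nodes into waves whose predecessors have all finished."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        # Mark the whole wave as finished so its successors become ready
        sorter.done(*ready)
    return waves


for wave in parallel_waves(dependencies):
    print(wave)
# ['MergeEntities']
# ['GMLGenerator', 'TuringDBWrite']
# ['MistralGraphExplainer']
```

TuringDBWrite and GMLGenerator land in the same wave because both depend only on MergeEntities, which is why the storage and visualization branches are independent.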

Expected Output

Upon completion, this workflow produces:

A populated TuringDB graph with the extracted entities (check out the generated graph in the Graph Visualiser of your TuringDB instance)
GML format representation suitable for graph visualization tools
An AI-generated summary explaining the relationships and insights discovered in your data
