User Guide
Working with EDBLayer
The EDBLayer is the foundation of your knowledge base. It manages the extensional database (base facts).
Creating an EDBLayer
You can create an EDBLayer with or without a configuration file:
```python
import triggergraphs as tg

# Empty EDBLayer
edb = tg.EDBLayer()

# With a configuration file
edb = tg.EDBLayer("edb.conf")
```
Adding Data Sources
CSV Sources (In-Memory)
The simplest way to add facts is using CSV data:
```python
edb.add_csv_source("person", [
    ["alice"],
    ["bob"],
    ["charlie"],
])

edb.add_csv_source("friend", [
    ["alice", "bob"],
    ["bob", "charlie"],
])
```
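Each CSV source is a relation: a predicate name plus a list of rows, where each row is one fact and every row should have the same number of terms (the predicate's arity). The helper below is a hypothetical pre-flight check written in plain Python, not part of triggergraphs, that you could run on your rows before calling add_csv_source. It assumes terms are passed as strings, as in the examples above.

```python
def check_arity(predicate, rows):
    """Hypothetical helper (not part of triggergraphs): verify that every row
    has the same number of string terms and return that arity."""
    if not rows:
        raise ValueError(f"{predicate}: no rows given")
    arity = len(rows[0])
    for i, row in enumerate(rows):
        if len(row) != arity:
            raise ValueError(
                f"{predicate}: row {i} has {len(row)} terms, expected {arity}"
            )
        # Assumption: CSV terms are plain strings.
        if not all(isinstance(term, str) for term in row):
            raise TypeError(f"{predicate}: row {i} contains a non-string term")
    return arity


person_rows = [["alice"], ["bob"], ["charlie"]]
friend_rows = [["alice", "bob"], ["bob", "charlie"]]

print(check_arity("person", person_rows))  # 1
print(check_arity("friend", friend_rows))  # 2
```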
Replacing Facts
You can replace the facts in an existing predicate:
```python
edb.replace_facts_csv_source("person", [
    ["alice"],
    ["dave"],
])
```
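As its name suggests, replace_facts_csv_source swaps out a predicate's facts instead of appending to them. If you want to log what a replacement changes, a hypothetical plain-Python helper (not a triggergraphs API) can compute the difference between the old and new rows:

```python
def replacement_diff(old_rows, new_rows):
    """Hypothetical helper: report which facts a replacement removes and adds."""
    old = {tuple(row) for row in old_rows}
    new = {tuple(row) for row in new_rows}
    return sorted(old - new), sorted(new - old)


old_person = [["alice"], ["bob"], ["charlie"]]
new_person = [["alice"], ["dave"]]

removed, added = replacement_diff(old_person, new_person)
print("removed:", removed)  # removed: [('bob',), ('charlie',)]
print("added:", added)      # added: [('dave',)]
```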
Custom Python Sources
For more complex scenarios, you can implement custom data sources:
```python
# obj should implement the table interface
edb.add_source("predicate_name", obj)
```
Querying the EDBLayer
```python
# Get all predicates
predicates = edb.get_predicates()

# Get number of predicates
n_preds = edb.get_n_predicates()

# Get facts for a specific predicate
facts = edb.get_facts("person")
# Returns: [[term_id_1], [term_id_2], ...]

# Get term ID for a string
term_id = edb.get_term_id("alice")

# Get number of unique terms
n_terms = edb.get_n_terms()
```
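get_facts returns numeric term IDs rather than strings because terms are dictionary-encoded: each distinct string is stored once and referred to by an integer. The toy class below illustrates the idea in plain Python. It is a sketch, not the EDBLayer implementation, and it assumes IDs are handed out in first-seen order, which TriggerGraphs does not necessarily do.

```python
class ToyDictionary:
    """Toy dictionary encoding: each distinct string term gets one integer ID."""

    def __init__(self):
        self._term_to_id = {}
        self._id_to_term = []

    def encode(self, term):
        # Assign the next free ID the first time a term is seen.
        if term not in self._term_to_id:
            self._term_to_id[term] = len(self._id_to_term)
            self._id_to_term.append(term)
        return self._term_to_id[term]

    def decode(self, term_id):
        return self._id_to_term[term_id]

    def n_terms(self):
        return len(self._id_to_term)


d = ToyDictionary()
friend = [["alice", "bob"], ["bob", "charlie"]]
encoded = [[d.encode(term) for term in row] for row in friend]
print(encoded)      # [[0, 1], [1, 2]]
print(d.n_terms())  # 3
print(d.decode(2))  # charlie
```

The analogues in the real API are get_term_id (string to ID) and get_n_terms (number of distinct terms).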
Working with Programs
A Program contains the Datalog rules that define your reasoning logic.
Creating and Populating Programs
```python
# Create a program
program = tg.Program(edb)

# Add rules one at a time
rule_id = program.add_rule("ancestor(X,Y) :- parent(X,Y)")

# Load rules from a file
program.load_from_file("rules.dl")
```
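Rules use the head :- body syntax shown above. Before calling add_rule or load_from_file, you may want to sanity-check rule strings. The hypothetical parser below (plain Python, not part of triggergraphs) handles only simple rules with a single head atom and atoms whose arguments contain no nested parentheses or quoted commas; it is not a description of the grammar TriggerGraphs accepts.

```python
import re

# An atom such as parent(X,Z): a predicate name and comma-separated arguments.
ATOM = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*)\(([^()]*)\)\s*")
BODY_ATOM = re.compile(r"[^,()]+\([^()]*\)")


def parse_atom(text):
    match = ATOM.fullmatch(text)
    if match is None:
        raise ValueError(f"cannot parse atom: {text!r}")
    predicate, args = match.groups()
    return predicate, tuple(arg.strip() for arg in args.split(","))


def parse_rule(rule):
    """Hypothetical checker: split 'head :- body' into (predicate, args) atoms."""
    head_text, sep, body_text = rule.partition(":-")
    if not sep:
        raise ValueError(f"missing ':-' in rule: {rule!r}")
    body_text = body_text.strip().rstrip(".")  # tolerate a trailing period
    atoms = BODY_ATOM.findall(body_text)
    # Anything other than commas and whitespace between atoms is an error.
    leftover = BODY_ATOM.sub("", body_text).replace(",", "").strip()
    if not atoms or leftover:
        raise ValueError(f"cannot parse body of rule: {rule!r}")
    return parse_atom(head_text), [parse_atom(atom) for atom in atoms]


head, body = parse_rule("ancestor(X,Y) :- parent(X,Z), ancestor(Z,Y)")
print(head)  # ('ancestor', ('X', 'Y'))
print(body)  # [('parent', ('X', 'Z')), ('ancestor', ('Z', 'Y'))]
```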
Inspecting Programs
```python
# Get number of rules
n_rules = program.get_n_rules()

# Get a specific rule
rule_str = program.get_rule(0)  # First rule

# Get predicate name by ID (pred_id is a numeric predicate ID)
pred_name = program.get_predicate_name(pred_id)
```
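Magic Set Transformation
If you only need the answers to one specific query, you can apply the magic set transformation. It rewrites the program so that reasoning focuses on facts relevant to that query, which is often much cheaper than computing the full model:
```python
# Transform the program for a specific query
new_program, input_pred_id, output_pred_id = program.apply_magic_transform("ancestor(alice,X)")

# Use the transformed program for reasoning
reasoner = tg.Reasoner("tgchase", edb, new_program)
```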
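To see why the transformation helps, the plain-Python sketch below evaluates a recursive ancestor program bottom-up twice on hypothetical parent data: once as written, and once after a hand-written magic-set rewrite for the query ancestor(alice,X). It uses naive fixpoint evaluation, not the trigger graph chase, and the rewrite (including the magic predicate name) is an illustration of the technique, not the program apply_magic_transform produces.

```python
def fixpoint(step, facts):
    """Apply `step` until no new facts are derived (naive bottom-up evaluation)."""
    while True:
        new = step(facts) - facts
        if not new:
            return facts
        facts = facts | new


parent = {("alice", "bob"), ("bob", "carol"), ("dave", "erin"), ("erin", "frank")}


# Original program:
#   ancestor(X,Y) :- parent(X,Y)
#   ancestor(X,Y) :- parent(X,Z), ancestor(Z,Y)
def full_step(anc):
    derived = set(parent)
    derived |= {(x, y) for (x, z) in parent for (z2, y) in anc if z == z2}
    return derived


# Hand-written magic-set rewrite for the query ancestor(alice, X):
#   magic(alice).
#   magic(Z)    :- magic(X), parent(X,Z)
#   anc(X,Y)    :- magic(X), parent(X,Y)
#   anc(X,Y)    :- magic(X), parent(X,Z), anc(Z,Y)
def magic_step(facts):
    magic = {f[1] for f in facts if f[0] == "magic"}
    anc = {(f[1], f[2]) for f in facts if f[0] == "anc"}
    derived = {("magic", "alice")}
    derived |= {("magic", z) for (x, z) in parent if x in magic}
    derived |= {("anc", x, y) for (x, y) in parent if x in magic}
    derived |= {("anc", x, y) for (x, z) in parent if x in magic
                for (z2, y) in anc if z == z2}
    return derived


ancestor = fixpoint(full_step, set())
magic_facts = fixpoint(magic_step, set())
magic_anc = {(f[1], f[2]) for f in magic_facts if f[0] == "anc"}

answers_full = sorted(y for (x, y) in ancestor if x == "alice")
answers_magic = sorted(y for (x, y) in magic_anc if x == "alice")
print(answers_full, answers_magic)    # ['bob', 'carol'] ['bob', 'carol']
print(len(ancestor), len(magic_anc))  # 6 3
```

Both evaluations give the same answers for alice, but the rewritten program never derives the facts about dave, erin, and frank, which cannot contribute to the query.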
Working with Reasoners
The Reasoner runs a chase algorithm over an EDBLayer and a Program to compute the facts that the rules derive.
Chase Algorithms
TriggerGraphs supports several chase algorithm variants:
```python
# Basic trigger graph chase (no provenance)
reasoner = tg.Reasoner("tgchase", edb, program, typeProv="NOPROV")

# With node-level provenance
reasoner = tg.Reasoner("tgchase", edb, program, typeProv="NODEPROV")

# With full provenance tracking
reasoner = tg.Reasoner("tgchase", edb, program, typeProv="FULLPROV")

# Static trigger graph chase
reasoner = tg.Reasoner("tgchase_static", edb, program)

# Probabilistic trigger graph chase
reasoner = tg.Reasoner("probtgchase", edb, program)
```
Configuration Options
The Reasoner constructor accepts several configuration parameters:
```python
reasoner = tg.Reasoner(
    typeChase="tgchase",   # Chase algorithm type
    edb=edb,               # EDB layer
    program=program,       # Program
    queryCont=True,        # Query containment optimization
    edbCheck=True,         # Check EDB during reasoning
    rewriteCliques=True,   # Rewrite cliques
    tgpath="",             # Path to save TG to disk
    typeProv="NODEPROV",   # Provenance type
    delProofs=True,        # Delete proofs (for prob chase)
)
```
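If you build these arguments programmatically, for example from command-line flags, you may want to reject unsupported values before constructing the Reasoner. The helper below is hypothetical and knows only the chase types and provenance types listed in this guide; extend the sets if your version of TriggerGraphs supports others. It deliberately sets no defaults, since this guide does not state them.

```python
# Values taken from the examples in this guide (assumption: not an exhaustive list).
CHASE_TYPES = {"tgchase", "tgchase_static", "probtgchase"}
PROV_TYPES = {"NOPROV", "NODEPROV", "FULLPROV"}


def reasoner_kwargs(typeChase, typeProv, **options):
    """Hypothetical helper: validate and collect Reasoner keyword arguments."""
    if typeChase not in CHASE_TYPES:
        raise ValueError(
            f"unknown chase type {typeChase!r}; expected one of {sorted(CHASE_TYPES)}"
        )
    if typeProv not in PROV_TYPES:
        raise ValueError(
            f"unknown provenance type {typeProv!r}; expected one of {sorted(PROV_TYPES)}"
        )
    return {"typeChase": typeChase, "typeProv": typeProv, **options}


# Usage:
# kwargs = reasoner_kwargs("tgchase", "NOPROV", queryCont=True)
# reasoner = tg.Reasoner(edb=edb, program=program, **kwargs)
```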
Running the Reasoner
```python
# Run reasoning from the start
stats = reasoner.create_model()

# Run from a specific step
stats = reasoner.create_model(startStep=5)

# Run with a maximum number of steps
stats = reasoner.create_model(maxStep=100)

# Inspect statistics
print(f"Nodes: {stats['n_nodes']}")
print(f"Edges: {stats['n_edges']}")
print(f"Triggers: {stats['n_triggers']}")
print(f"Derivations: {stats['n_derivations']}")
print(f"Steps: {stats['steps']}")
print(f"Runtime (ms): {stats['runtime_ms']}")
print(f"Max memory (MB): {stats['max_mem_mb']}")
```
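Since create_model() returns its statistics as a dictionary with the keys printed above, a small formatting helper can turn them into a one-line summary for logs. This is a plain-Python sketch: the derivations-per-second figure is computed here rather than reported by TriggerGraphs, and the function assumes runtime_ms is a number.

```python
def summarize_stats(stats):
    """Format the statistics dictionary returned by create_model()."""
    runtime_s = stats["runtime_ms"] / 1000
    # Derived metric computed here (not a TriggerGraphs statistic).
    rate = stats["n_derivations"] / runtime_s if runtime_s > 0 else float("inf")
    return (
        f"{stats['steps']} steps, {stats['n_nodes']} nodes, {stats['n_edges']} edges, "
        f"{stats['n_derivations']} derivations in {stats['runtime_ms']} ms "
        f"({rate:.0f} derivations/s), peak memory {stats['max_mem_mb']} MB"
    )


# Usage, with `stats` from reasoner.create_model():
# print(summarize_stats(stats))
```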
Working with Trigger Graphs
After reasoning, you can access the TG (Trigger Graph) to inspect the reasoning process.
Accessing the TG
```python
tg_graph = reasoner.get_TG()
```
Inspecting the TG
```python
# Get graph statistics
n_nodes = tg_graph.get_n_nodes()
n_edges = tg_graph.get_n_edges()
n_facts = tg_graph.get_n_facts()

# Get size of a specific node (node_id is a numeric TG node ID)
node_size = tg_graph.get_node_size(node_id)
```
Exporting the TG
```python
# Export to files
tg_graph.dump_files("output_directory")
```
Querying Results
The Querier provides methods to extract facts and derivation information.
Creating a Querier
```python
querier = tg.Querier(tg_graph)
```
Getting All Facts
```python
# Get all facts as a dictionary
all_facts = querier.get_all_facts()
# Returns: {"predicate_name": [[term1, term2, ...], ...], ...}

for pred_name, facts in all_facts.items():
    print(f"{pred_name}:")
    for fact in facts:
        print(f"  {fact}")
```
Querying Specific Predicates
```python
# Get the list of all predicates in the TG
predicates = querier.get_list_predicates()

# Get facts with their coordinates for a predicate
facts_coords = querier.get_facts_coordinates_with_predicate("ancestor")
# Returns a list of (fact_tuple, (node_id, offset)) pairs
for fact, (node_id, offset) in facts_coords:
    print(f"Fact {fact} at node {node_id}, offset {offset}")
```
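Because each result carries its (node_id, offset) coordinates, you can group facts by the TG node that stores them, for example to see which nodes hold most of the ancestor facts. The sketch below relies only on the return shape described above; the function name is hypothetical.

```python
from collections import defaultdict


def group_by_node(facts_coords):
    """Group (fact, (node_id, offset)) pairs by node_id, ordered by offset."""
    by_node = defaultdict(list)
    for fact, (node_id, offset) in facts_coords:
        by_node[node_id].append((offset, fact))
    return {
        node_id: [fact for _, fact in sorted(entries, key=lambda entry: entry[0])]
        for node_id, entries in by_node.items()
    }


# Usage with the result of get_facts_coordinates_with_predicate("ancestor"):
# for node_id, facts in group_by_node(facts_coords).items():
#     print(f"node {node_id}: {len(facts)} facts")
```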
Provenance Queries
If the reasoner was created with provenance tracking enabled (see typeProv above), you can query derivation trees:
```python
# Get the derivation tree for a fact
tree_json = querier.get_derivation_tree(node_id, fact_id)

# Get the leaves (base facts) used in a derivation
leaves = querier.get_leaves(node_id, fact_id)

# Get node details for a predicate
node_details = querier.get_node_details_predicate("ancestor")
```
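A derivation tree explains a derived fact in terms of the facts it was derived from, and its leaves are the base facts, which is what get_leaves returns. The sketch below shows that relationship in plain Python. The layout of the JSON produced by get_derivation_tree is not described in this guide, so the "fact" and "children" keys, and the example tree, are hypothetical; adapt them to the output you actually see.

```python
import json


def derivation_leaves(tree):
    """Collect the leaves (base facts) of a derivation tree.

    Assumption: each node is a dict with a hypothetical "fact" key and an
    optional "children" list; the real JSON layout may differ.
    """
    if isinstance(tree, str):
        tree = json.loads(tree)
    children = tree.get("children", [])
    if not children:
        return [tree["fact"]]
    leaves = []
    for child in children:
        leaves.extend(derivation_leaves(child))
    return leaves


example = json.dumps({
    "fact": "ancestor(alice,carol)",
    "children": [
        {"fact": "parent(alice,bob)"},
        {"fact": "ancestor(bob,carol)", "children": [{"fact": "parent(bob,carol)"}]},
    ],
})
print(derivation_leaves(example))  # ['parent(alice,bob)', 'parent(bob,carol)']
```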
Working with TupleSets
Some query operations return TupleSet objects:
```python
# Get number of facts in a tuple set
n_facts = tuple_set.get_n_facts()

# Get a specific fact from a tuple set
fact = querier.get_fact_in_TupleSet(tuple_set, fact_id)

# Get derivation tree for a fact in a tuple set
tree = querier.get_derivation_tree_in_TupleSet(tuple_set, fact_id)
```
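To walk every fact in a TupleSet, you can combine get_n_facts with get_fact_in_TupleSet. The generator below uses only those two documented calls; it assumes fact IDs run from 0 to get_n_facts() - 1, which this guide does not state explicitly, so check that against your version.

```python
def iter_tuple_set(querier, tuple_set):
    """Yield (fact_id, fact) for every fact in a TupleSet.

    Assumption: fact IDs are the integers 0 .. get_n_facts() - 1.
    """
    for fact_id in range(tuple_set.get_n_facts()):
        yield fact_id, querier.get_fact_in_TupleSet(tuple_set, fact_id)


# Usage:
# for fact_id, fact in iter_tuple_set(querier, tuple_set):
#     print(fact_id, fact)
```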
ID Translation
Facts are stored internally using numeric IDs. The Querier can translate them:
```python
# Get predicate name from ID
pred_name = querier.get_predicate_name(pred_id)

# Get term name from ID
term_name = querier.get_term_name(term_id)
```
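To render a fact stored as term IDs in readable form, map each ID through get_term_name. The helper below assumes a fact is a sequence of term IDs, that you already know its predicate name, and that get_term_name returns a string; the pred(t1,t2) output format is a choice made here, not something TriggerGraphs prescribes.

```python
def fact_to_text(querier, pred_name, term_ids):
    """Render a fact given as term IDs, e.g. ancestor(alice,carol)."""
    terms = ",".join(querier.get_term_name(term_id) for term_id in term_ids)
    return f"{pred_name}({terms})"


# Usage:
# print(fact_to_text(querier, "ancestor", fact))
```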
Logging Configuration
Control logging verbosity:
```python
import triggergraphs as tg

# Set logging level
# 0 = TRACE, 1 = DEBUG, 2 = INFO, 3 = WARNING, 4 = ERROR, 5 = FATAL
tg.set_logging_level(2)  # INFO level (default: WARNING)
```
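If you prefer names over magic numbers, an IntEnum can hold the levels listed above. The LogLevel class is hypothetical and not part of triggergraphs; converting with int() before the call avoids relying on how the binding treats enum values.

```python
from enum import IntEnum


class LogLevel(IntEnum):
    """Numeric levels accepted by set_logging_level, as listed in this guide."""
    TRACE = 0
    DEBUG = 1
    INFO = 2
    WARNING = 3
    ERROR = 4
    FATAL = 5


# Usage:
# tg.set_logging_level(int(LogLevel.INFO))
```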
Best Practices
Memory Management
- For large datasets, consider file-based data sources instead of in-memory CSV sources.
- Use appropriate chase algorithms (e.g., `tgchase_static` for smaller graphs).
- Monitor `max_mem_mb` in the reasoning statistics.
Performance
- Apply the magic set transformation when you only need answers to specific queries.
- Use `queryCont=True` to enable the query containment optimization.
- Disable features you do not need; for example, use `typeProv="NOPROV"` when you do not need provenance.
Debugging
- Start with `set_logging_level(1)` (DEBUG) to see detailed logs.
- Inspect the `stats` returned by `create_model()` to understand performance.
- Use `get_rule()` to verify that rules were parsed correctly.
Rule Design
- Avoid Cartesian products in rules: make sure the body atoms share variables so that they are joined.
- Consider rule ordering for readability.
- Test with small datasets first.
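To see why join conditions matter, compare two rule bodies over the same data: knows(X,Y), likes(Y,Z) shares the variable Y and is joined on it, while knows(X,Y), likes(W,Z) shares no variable, so every knows fact is paired with every likes fact. The plain-Python sketch below counts the body matches for each; the predicates and data are hypothetical.

```python
from itertools import product

knows = [("a", "b"), ("b", "c"), ("c", "d")]
likes = [("b", "x"), ("c", "y"), ("e", "z")]

# Body with a join variable: knows(X,Y), likes(Y,Z)
joined = [(x, y, z) for (x, y), (y2, z) in product(knows, likes) if y == y2]

# Body without shared variables: knows(X,Y), likes(W,Z) (Cartesian product)
cartesian = list(product(knows, likes))

print(len(joined), len(cartesian))  # 2 9
```

With n and m facts the unjoined body always yields n * m matches, so its cost grows quadratically as both relations grow.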