User Guide ========== Working with EDBLayer --------------------- The ``EDBLayer`` is the foundation of your knowledge base. It manages the extensional database (base facts). Creating an EDBLayer ~~~~~~~~~~~~~~~~~~~~ You can create an EDBLayer with or without a configuration file: .. code-block:: python import triggergraphs as tg # Empty EDBLayer edb = tg.EDBLayer() # With configuration file edb = tg.EDBLayer("edb.conf") Adding Data Sources ~~~~~~~~~~~~~~~~~~~ **CSV Sources (In-Memory)** The simplest way to add facts is using CSV data: .. code-block:: python edb.add_csv_source("person", [ ["alice"], ["bob"], ["charlie"] ]) edb.add_csv_source("friend", [ ["alice", "bob"], ["bob", "charlie"] ]) **Replacing Facts** You can replace the facts in an existing predicate: .. code-block:: python edb.replace_facts_csv_source("person", [ ["alice"], ["dave"] ]) **Custom Python Sources** For more complex scenarios, you can implement custom data sources: .. code-block:: python # obj should implement the table interface edb.add_source("predicate_name", obj) Querying the EDBLayer ~~~~~~~~~~~~~~~~~~~~~ .. code-block:: python # Get all predicates predicates = edb.get_predicates() # Get number of predicates n_preds = edb.get_n_predicates() # Get facts for a specific predicate facts = edb.get_facts("person") # Returns: [[term_id_1], [term_id_2], ...] # Get term ID for a string term_id = edb.get_term_id("alice") # Get number of unique terms n_terms = edb.get_n_terms() Working with Programs --------------------- A ``Program`` contains the Datalog rules that define your reasoning logic. Creating and Populating Programs ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: python # Create a program program = tg.Program(edb) # Add rules one at a time rule_id = program.add_rule("ancestor(X,Y) :- parent(X,Y)") # Load rules from a file program.load_from_file("rules.dl") Inspecting Programs ~~~~~~~~~~~~~~~~~~~ .. code-block:: python # Get number of rules n_rules = program.get_n_rules() # Get a specific rule rule_str = program.get_rule(0) # First rule # Get predicate name by ID pred_name = program.get_predicate_name(pred_id) Magic Set Transformation ~~~~~~~~~~~~~~~~~~~~~~~~ For more efficient query answering, you can apply magic set transformation: .. code-block:: python # Transform program for a specific query new_program, input_pred_id, output_pred_id = program.apply_magic_transform("ancestor(alice,X)") # Use the transformed program for reasoning reasoner = tg.Reasoner("tgchase", edb, new_program) Working with Reasoners ---------------------- The ``Reasoner`` executes the reasoning process. Chase Algorithms ~~~~~~~~~~~~~~~~ TriggerGraphs supports several chase algorithm variants: .. code-block:: python # Basic trigger graph chase (no provenance) reasoner = tg.Reasoner("tgchase", edb, program, typeProv="NOPROV") # With node-level provenance reasoner = tg.Reasoner("tgchase", edb, program, typeProv="NODEPROV") # With full provenance tracking reasoner = tg.Reasoner("tgchase", edb, program, typeProv="FULLPROV") # Static trigger graph chase reasoner = tg.Reasoner("tgchase_static", edb, program) # Probabilistic trigger graph chase reasoner = tg.Reasoner("probtgchase", edb, program) Configuration Options ~~~~~~~~~~~~~~~~~~~~~ The ``Reasoner`` constructor accepts several configuration parameters: .. code-block:: python reasoner = tg.Reasoner( typeChase="tgchase", # Chase algorithm type edb=edb, # EDB layer program=program, # Program queryCont=True, # Query containment optimization edbCheck=True, # Check EDB during reasoning rewriteCliques=True, # Rewrite cliques tgpath="", # Path to save TG to disk typeProv="NODEPROV", # Provenance type delProofs=True # Delete proofs (for prob chase) ) Running the Reasoner ~~~~~~~~~~~~~~~~~~~~ .. code-block:: python # Run reasoning from start stats = reasoner.create_model() # Run from a specific step stats = reasoner.create_model(startStep=5) # Run with a maximum number of steps stats = reasoner.create_model(maxStep=100) # Inspect statistics print(f"Nodes: {stats['n_nodes']}") print(f"Edges: {stats['n_edges']}") print(f"Triggers: {stats['n_triggers']}") print(f"Derivations: {stats['n_derivations']}") print(f"Steps: {stats['steps']}") print(f"Runtime (ms): {stats['runtime_ms']}") print(f"Max memory (MB): {stats['max_mem_mb']}") Working with Trigger Graphs ---------------------------- After reasoning, you can access the ``TG`` (Trigger Graph) to inspect the reasoning process. Accessing the TG ~~~~~~~~~~~~~~~~ .. code-block:: python tg_graph = reasoner.get_TG() Inspecting the TG ~~~~~~~~~~~~~~~~~ .. code-block:: python # Get graph statistics n_nodes = tg_graph.get_n_nodes() n_edges = tg_graph.get_n_edges() n_facts = tg_graph.get_n_facts() # Get size of a specific node node_size = tg_graph.get_node_size(node_id) Exporting the TG ~~~~~~~~~~~~~~~~ .. code-block:: python # Export to files tg_graph.dump_files("output_directory") Querying Results ---------------- The ``Querier`` provides methods to extract facts and derivation information. Creating a Querier ~~~~~~~~~~~~~~~~~~ .. code-block:: python querier = tg.Querier(tg_graph) Getting All Facts ~~~~~~~~~~~~~~~~~ .. code-block:: python # Get all facts as a dictionary all_facts = querier.get_all_facts() # Returns: {"predicate_name": [[term1, term2, ...], ...], ...} for pred_name, facts in all_facts.items(): print(f"{pred_name}:") for fact in facts: print(f" {fact}") Querying Specific Predicates ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: python # Get list of all predicates in the TG predicates = querier.get_list_predicates() # Get facts with coordinates for a predicate facts_coords = querier.get_facts_coordinates_with_predicate("ancestor") # Returns list of ((fact_tuple, (node_id, offset)) for (fact, (node_id, offset)) in facts_coords: print(f"Fact {fact} at node {node_id}, offset {offset}") Provenance Queries ~~~~~~~~~~~~~~~~~~ If you used provenance tracking, you can query derivation trees: .. code-block:: python # Get derivation tree for a fact tree_json = querier.get_derivation_tree(node_id, fact_id) # Get leaves (base facts) used in a derivation leaves = querier.get_leaves(node_id, fact_id) # Get node details for a predicate node_details = querier.get_node_details_predicate("ancestor") Working with TupleSets ~~~~~~~~~~~~~~~~~~~~~~ Some query operations return ``TupleSet`` objects: .. code-block:: python # Get number of facts in a tuple set n_facts = tuple_set.get_n_facts() # Get a specific fact from a tuple set fact = querier.get_fact_in_TupleSet(tuple_set, fact_id) # Get derivation tree for a fact in a tuple set tree = querier.get_derivation_tree_in_TupleSet(tuple_set, fact_id) ID Translation ~~~~~~~~~~~~~~ Facts are stored internally using numeric IDs. The Querier can translate them: .. code-block:: python # Get predicate name from ID pred_name = querier.get_predicate_name(pred_id) # Get term name from ID term_name = querier.get_term_name(term_id) Logging Configuration --------------------- Control logging verbosity: .. code-block:: python import triggergraphs as tg # Set logging level # 0 = TRACE, 1 = DEBUG, 2 = INFO, 3 = WARNING, 4 = ERROR, 5 = FATAL tg.set_logging_level(2) # INFO level (default: WARNING) Best Practices -------------- **Memory Management** - For large datasets, consider using file-based storage instead of in-memory - Use appropriate chase algorithms (``tgchase_static`` for smaller graphs) - Monitor ``max_mem_mb`` in reasoning statistics **Performance** - Apply magic set transformation for specific queries - Use ``queryCont=True`` for query containment optimization - Disable unnecessary features (e.g., provenance) if not needed **Debugging** - Start with ``set_logging_level(1)`` to see detailed logs - Inspect ``stats`` returned by ``create_model()`` to understand performance - Use ``get_rule()`` to verify rules were parsed correctly **Rule Design** - Avoid Cartesian products in rules (ensure join conditions) - Consider rule ordering for readability - Test with small datasets first