DecisionGraph is a deterministic analytics system built on an explicit ontology: metrics, dimensions, joins, and grain are first-class. Natural language (or API intents) compile to candidate query plans that must pass schema validation, cardinality checks, and policy gates before execution.
The product thesis is simple: dashboards fail when semantics are implicit. DecisionGraph makes semantics executable—so "revenue by region" always means the same grain, the same filters, and the same time spine.
Highlights:
- Semantic layer as code (versioned, reviewable) instead of tribal spreadsheet logic.
- No hallucinated SQL: LLMs may propose intent; the executor only runs whitelisted templates and parameter bindings.
- Latency-aware planning chooses pre-aggregates when available and falls back transparently.
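To make "semantic layer as code" concrete, a governed metric definition could be a small, reviewable Python object. This is a hypothetical sketch, not the project's actual schema; only `revenue_net` and the table names come from the demo below:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """A governed metric: one name, one grain, one filter set (illustrative shape)."""
    name: str
    fact_table: str
    expression: str
    grain: tuple[str, ...]                 # dimensions this metric may be sliced by
    default_filters: tuple[str, ...] = ()  # applied on every compile


# Versioned in the repo and code-reviewed like any other change.
REVENUE_NET = Metric(
    name="revenue_net",
    fact_table="finance.orders_fact",
    expression="SUM(revenue_net)",
    grain=("region", "month"),
)
```

Because the definition is data, "revenue by region" can only ever resolve to this one grain and expression.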
From resume to something you can read
Matches the resume claim: schema introspection and NL-to-SQL constrained by the semantic catalog. A question goes in; compiled SQL and a result grid come out.
Natural language input
"What was revenue last month by region?"
Resolved to intent (metric, grain, time spine)—not a free-form SQL string.
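Concretely, the resolved intent could be a structured object like the following. This is a hypothetical shape for illustration; the field names are assumptions:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intent:
    metric: str                     # catalog metric ID, never a raw SQL expression
    dimensions: tuple[str, ...]     # slice dimensions from the catalog
    time_grain: str                 # resolved against the governed time spine
    time_range: str                 # symbolic; bound to dates at compile time


# "What was revenue last month by region?" resolves to:
intent = Intent(
    metric="revenue_net",
    dimensions=("region",),
    time_grain="month",
    time_range="last_month",
)
```

Everything downstream (validation, compilation, caching) operates on this object, never on free-form text.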
Compiled SQL (catalog templates only)
```sql
SELECT d.region,
       SUM(f.revenue_net) AS revenue_net
FROM finance.orders_fact f
JOIN org.dim_region d ON f.region_id = d.region_id
WHERE f.order_date >= DATE_TRUNC('month', CURRENT_DATE - INTERVAL '1 month')
  AND f.order_date < DATE_TRUNC('month', CURRENT_DATE)
GROUP BY d.region
ORDER BY revenue_net DESC;
```

Identifiers come from the catalog; the executor rejects raw tables and columns outside approved templates.
Example query result
| region | revenue_net |
|---|---|
| North | ₹ 2.4M |
| South | ₹ 1.9M |
| West | ₹ 1.1M |
Deterministic KPIs: same question → same grain and filters as the governed metric definition.
Teams want "ChatGPT for data," but production needs stable definitions, correct joins, and governed access. Text-to-SQL demos look magical until the first wrong join or ambiguous grain.
The challenge is to preserve the speed of natural language while keeping deterministic execution guarantees aligned with enterprise semantics.
- Canonical model: Encode facts, dimensions, and safe join paths in a graph; forbid ambiguous many-to-many traversals unless explicitly declared.
- Intent → plan: Parse questions into structured intent objects (metric, slice, time range), not raw SQL strings.
- Validation: Run cardinality estimates, row-level security predicates, and "explain" dry-runs before execution.
- Caching & reuse: Store signed query plans per intent hash so repeat questions hit compiled SQL, not the planner.
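The caching step can be sketched as hashing a canonical form of the intent and memoizing the compiled plan. A minimal sketch, assuming a dict-shaped intent; `compile_plan` stands in for the real template-driven compiler:

```python
import hashlib
import json

_plan_cache: dict[str, str] = {}


def intent_hash(intent: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) so equivalent
    # intents always hash identically regardless of key order.
    canonical = json.dumps(intent, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


def compile_plan(intent: dict) -> str:
    # Placeholder for the real catalog-constrained compiler.
    return f"-- plan for {intent['metric']}"


def plan_for(intent: dict) -> str:
    key = intent_hash(intent)
    if key not in _plan_cache:
        _plan_cache[key] = compile_plan(intent)  # planner runs once per intent
    return _plan_cache[key]                      # repeat questions hit compiled SQL
```

Signing the cached plan (omitted here) would let the executor verify it came from the planner and not from a tampered cache entry.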
1. Under-specified business terms map to the wrong metric ID; mitigated with explicit confirmation for ambiguous matches.
2. Warehouse optimizer quirks can skew P95 latency; surfaced via plan fingerprints and regression tests per template.
3. Role-based entitlements drift from warehouse reality; the sync job must be observable.
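Plan fingerprinting against optimizer drift can be approximated by hashing a normalized EXPLAIN output per template and comparing it to a checked-in baseline. A sketch under stated assumptions: in the real pipeline the EXPLAIN text would come from the warehouse, and the baseline strings here are invented:

```python
import hashlib


def plan_fingerprint(explain_text: str) -> str:
    """Stable fingerprint of a query plan, compared across releases."""
    # Collapse whitespace so cosmetic EXPLAIN changes don't flag a regression.
    normalized = " ".join(explain_text.split())
    return hashlib.sha256(normalized.encode()).hexdigest()[:16]


# Checked-in baseline per template (illustrative plan text).
BASELINES = {
    "revenue_by_region": plan_fingerprint("HashAggregate -> HashJoin -> SeqScan"),
}


def check_no_plan_drift(template_id: str, explain_text: str) -> bool:
    """Regression test hook: fail the build if the warehouse plan changed."""
    return BASELINES[template_id] == plan_fingerprint(explain_text)
```

A fingerprint mismatch does not mean the query is wrong, only that the optimizer chose a different plan, which is exactly the event worth investigating before P95 moves.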
1. More upfront modeling work than a vanilla text-to-SQL toy; pays off in correctness and trust.
2. Curated templates limit exotic ad-hoc queries; power users export to governed notebooks instead.
3. A stricter planner means slower feature velocity on day one, faster on day 100.
compile.py
Intent must resolve to a whitelisted template
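A minimal sketch of what this gate could look like. The template registry, binding whitelist, and `compile_intent` name are all illustrative, not the project's actual code:

```python
TEMPLATES = {
    # template_id -> parameterized SQL; structure is fixed at review time
    "metric_by_dimension": (
        "SELECT d.{dim}, SUM(f.{metric}) AS {metric} "
        "FROM {fact} f JOIN {dim_table} d ON f.{key} = d.{key} "
        "GROUP BY d.{dim}"
    ),
}

# Only catalog-approved identifiers may bind into each parameter slot.
ALLOWED_BINDINGS = {
    "metric_by_dimension": {
        "dim": {"region"},
        "metric": {"revenue_net"},
        "fact": {"finance.orders_fact"},
        "dim_table": {"org.dim_region"},
        "key": {"region_id"},
    },
}


def compile_intent(template_id: str, bindings: dict[str, str]) -> str:
    if template_id not in TEMPLATES:
        raise ValueError(f"intent does not resolve to a whitelisted template: {template_id}")
    allowed = ALLOWED_BINDINGS[template_id]
    for param, value in bindings.items():
        if value not in allowed.get(param, set()):
            raise ValueError(f"binding {param}={value!r} not in catalog whitelist")
    return TEMPLATES[template_id].format(**bindings)
```

An LLM can propose any `template_id` and bindings it likes; anything outside the whitelist raises before a single byte reaches the warehouse.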
Designed
Owned the semantic catalog schema, join safety rules, and intent/plan separation.
Implemented
Built the compiler pipeline, validation layer, and warehouse execution adapters.
Scrapped
End-to-end neural SQL—replaced with constrained generation over approved templates for reliability.