Introduction
By 2026, the data ecosystem has shifted from isolated storage toward context-aware intelligence. For the water industry, whose business rests on the physical reality of pipes, pumps, valves, and meters, this shift is more than a trend: connected data is becoming foundational to national digital-twin programs and real-time infrastructure management. For a data team, the goal is to move from raw data to actionable knowledge without succumbing to "system sprawl" or complex operational patterns. Drawing on our recent evaluation, the sections below describe how to structure a modern Connected Intelligence stack on your existing Azure and Databricks environment.
The Architecture: From Governance to Agency
The most effective "connected intelligence" stack for utilities follows a clear, governed path:
Databricks UC -> Semantic Model -> Ontology -> Graph + SQL Warehouse -> Data Agent
- The Foundation (Databricks UC): Unity Catalog governs the lakehouse storage layer (Delta Lake/Iceberg tables), applying access control and lineage so that the data remains a single source of truth.
- The Skeleton (Ontology): A Microsoft Fabric ontology acts as the connective tissue, defining the entity types, relationships, and rules needed to interpret raw tabular data as a knowledge graph.
- The Engine (Graph + SQL): This layer unifies relationship-heavy analysis with traditional tabular aggregations, allowing you to query your physical network as a graph without duplicating data.
- The Interface (Data Agent): The data agent uses GraphRAG (Graph Retrieval-Augmented Generation) to retrieve structured context from the graph, with a reported 54.2% improvement in LLM accuracy over standard retrieval methods.
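To make the ontology and GraphRAG layers concrete, here is a minimal Python sketch under stated assumptions. The asset and pipe rows stand in for governed Delta tables, and the `Ontology` class, its field names, and the `FEEDS` relationship are hypothetical illustrations, not the Microsoft Fabric ontology API. The mapping interprets rows as nodes and edges; the retrieval step then collects facts within a hop limit and formats them as text context, which is the basic idea behind GraphRAG. No LLM is called here.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical rows standing in for governed Delta tables in Unity Catalog.
ASSETS = [
    {"asset_id": "PUMP-1", "asset_type": "Pump"},
    {"asset_id": "VALVE-7", "asset_type": "Valve"},
    {"asset_id": "METER-3", "asset_type": "Meter"},
    {"asset_id": "METER-4", "asset_type": "Meter"},
]
PIPES = [
    {"pipe_id": "P-100", "from_asset": "PUMP-1", "to_asset": "VALVE-7"},
    {"pipe_id": "P-101", "from_asset": "VALVE-7", "to_asset": "METER-3"},
    {"pipe_id": "P-102", "from_asset": "VALVE-7", "to_asset": "METER-4"},
]


@dataclass(frozen=True)
class Ontology:
    """Hypothetical mapping from table columns to graph concepts."""
    node_key: str = "asset_id"
    node_label: str = "asset_type"
    edge_source: str = "from_asset"
    edge_target: str = "to_asset"
    edge_label: str = "FEEDS"


def build_graph(assets, pipes, onto):
    """Interpret tabular rows as labelled nodes and directed edges."""
    nodes = {row[onto.node_key]: row[onto.node_label] for row in assets}
    edges = {}
    for row in pipes:
        edges.setdefault(row[onto.edge_source], []).append(row[onto.edge_target])
    return nodes, edges


def retrieve_context(nodes, edges, onto, start, max_hops):
    """GraphRAG-style retrieval: collect facts within max_hops of start as text."""
    facts, seen, queue = [], {start}, deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # hop budget exhausted for this branch
        for nxt in edges.get(node, []):
            facts.append(f"{node} ({nodes[node]}) {onto.edge_label} {nxt} ({nodes[nxt]})")
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return facts


onto = Ontology()
nodes, edges = build_graph(ASSETS, PIPES, onto)
facts = retrieve_context(nodes, edges, onto, "PUMP-1", max_hops=2)
# The facts become grounded context for an LLM prompt (the LLM call itself is out of scope).
prompt = "Context:\n" + "\n".join(facts) + "\n\nQuestion: Which meters does PUMP-1 supply?"
print(prompt)
```

Bounding the hop count keeps the retrieved context small and tied to the asset in question, so the model receives explicit relationships rather than loosely related text chunks.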
Choosing the Right Engine: A Comparison of Effectiveness
Selecting a graph engine depends on whether you prioritize depth of analytics, operational simplicity, or integration efficiency.
1. PuppyGraph: The "Zero-ETL" Choice for Data Teams
If the priority is to avoid infrastructure complexity, PuppyGraph stands out because it is a graph query engine over existing tables rather than a standalone database.
- Effectiveness: It connects directly to Databricks UC and queries existing Delta Lake tables as a graph; the vendor positions setup as a matter of minutes.
- Merit: It eliminates the need for complex ETL pipelines and separate storage silos, making it the most operationally lean choice for a data team.
- Performance: Vendor benchmarks report 6-hop queries across billions of edges completing in under 3 seconds.
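The zero-ETL idea can be illustrated with plain SQL: the traversal runs against the table where the edges already live, rather than against a copy loaded into a separate graph store. The sketch below uses Python's built-in `sqlite3` and a recursive CTE as a stand-in for a governed Delta table. The `pipes` table and its columns are hypothetical, and this is not how PuppyGraph executes queries internally; it only shows edges being queried in place up to a hop limit.

```python
import sqlite3

# In-memory table standing in for an existing lakehouse edge table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pipes (pipe_id TEXT PRIMARY KEY, from_asset TEXT, to_asset TEXT);
INSERT INTO pipes VALUES
  ('P-1', 'RES-1', 'PUMP-1'), ('P-2', 'PUMP-1', 'VALVE-1'),
  ('P-3', 'VALVE-1', 'MAIN-1'), ('P-4', 'MAIN-1', 'MAIN-2'),
  ('P-5', 'MAIN-2', 'VALVE-2'), ('P-6', 'VALVE-2', 'METER-1'),
  ('P-7', 'METER-1', 'HOUSE-1');
""")

# Recursive CTE: each iteration follows one more hop along the pipes table.
K_HOP_SQL = """
WITH RECURSIVE reach(asset, hops) AS (
    SELECT ?, 0
    UNION
    SELECT p.to_asset, r.hops + 1
    FROM reach AS r JOIN pipes AS p ON p.from_asset = r.asset
    WHERE r.hops < ?
)
SELECT asset, MIN(hops) AS min_hops
FROM reach
WHERE hops > 0
GROUP BY asset
ORDER BY min_hops;
"""


def downstream(conn, start, max_hops):
    """Assets reachable from start within max_hops, with their shortest hop count."""
    return conn.execute(K_HOP_SQL, (start, max_hops)).fetchall()


print(downstream(conn, "RES-1", max_hops=6))
```

`HOUSE-1` sits seven hops from the reservoir, so the six-hop limit excludes it. A production engine would typically layer query planning, caching, and distributed execution on top; the sketch only shows why no second copy of the data is needed.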
2. Neo4j: The Ecosystem Standard
Neo4j remains the "safest default" for organisations starting their graph journey.
- Effectiveness: Optimised for 1–5 hop localised traversals, which is ideal for real-time asset lookups and identity resolution.
- Merit: It uses index-free adjacency: relationships are stored as direct references between node records, so the cost of a single hop depends on that node's own connections rather than on total graph size.
- Trade-off: As a standalone database, it requires its own storage and governance, which may add complexity for teams wanting a "SQL-first" workflow.
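Index-free adjacency can be sketched in a few lines of Python: each node holds direct references to its neighbours, so expanding a hop walks that node's own list instead of consulting a global index. The `Node` class and `within_hops` helper are hypothetical illustrations, not Neo4j's storage engine. The touched-edge counter shows that a bounded 1–5 hop traversal does the same work whether or not the graph also contains thousands of unrelated nodes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    node_id: str
    label: str
    # Direct references to neighbouring nodes: no index lookup is needed per hop.
    out: list[Node] = field(default_factory=list)


def within_hops(start, max_hops):
    """Return (ids reached, edges touched) for a traversal bounded by max_hops."""
    reached, frontier, touched = {start.node_id}, [start], 0
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            for nb in node.out:  # pointer chase: cost scales with this node's degree
                touched += 1
                if nb.node_id not in reached:
                    reached.add(nb.node_id)
                    next_frontier.append(nb)
        frontier = next_frontier
    return reached - {start.node_id}, touched


pump, valve = Node("PUMP-1", "Pump"), Node("VALVE-7", "Valve")
m3, m4 = Node("METER-3", "Meter"), Node("METER-4", "Meter")
pump.out.append(valve)
valve.out.extend([m3, m4])

# A large, unrelated part of the network: it exists but is never visited.
others = [Node(f"HYDRANT-{i}", "Hydrant") for i in range(10_000)]
for a, b in zip(others, others[1:]):
    a.out.append(b)

reached, touched = within_hops(pump, max_hops=3)
print(sorted(reached), touched)
```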
3. TigerGraph: The Analytical Workhorse
For city-scale water networks requiring massive traversals, TigerGraph offers a massively parallel processing (MPP) architecture.
- Effectiveness: Designed for deep traversals of 10+ hops, with sub-second response times reported by the vendor.
- Merit: Reported to be 17.9x faster than Neo4j on audited benchmarks for deep analytics.
- Trade-off: Steeper learning curve due to its proprietary GSQL query language.
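MPP graph engines commonly parallelise deep traversals by partitioning vertices across workers and expanding the frontier in synchronised steps. The Python sketch below shows that bulk-synchronous pattern under simplifying assumptions: the partition count, the hash-partitioning scheme, and all function names are hypothetical, and this is a generic illustration rather than TigerGraph's implementation or GSQL. In CPython the thread pool shows the structure, not a real speed-up, because this work is CPU-bound.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_PARTITIONS = 4  # hypothetical worker count


def partition_of(vertex: str) -> int:
    """Deterministic hash partitioning (hypothetical scheme)."""
    return sum(map(ord, vertex)) % NUM_PARTITIONS


def build_partitions(edges):
    """Each partition stores the outgoing edges of the vertices it owns."""
    parts = [{} for _ in range(NUM_PARTITIONS)]
    for src, dst in edges:
        parts[partition_of(src)].setdefault(src, []).append(dst)
    return parts


def expand(part, frontier):
    """One worker's share of a superstep: expand only the vertices it owns."""
    return [dst for v in frontier for dst in part.get(v, [])]


def deep_traverse(parts, start, max_hops):
    """Hop distance from start to every vertex reachable within max_hops."""
    dist, frontier = {start: 0}, [start]
    with ThreadPoolExecutor(max_workers=NUM_PARTITIONS) as pool:
        for hop in range(1, max_hops + 1):
            if not frontier:
                break
            # Route each frontier vertex to the partition that owns it.
            buckets = [[] for _ in range(NUM_PARTITIONS)]
            for v in frontier:
                buckets[partition_of(v)].append(v)
            # Superstep: partitions expand in parallel, then results are merged.
            next_frontier = []
            for neighbours in pool.map(expand, parts, buckets):
                for dst in neighbours:
                    if dst not in dist:
                        dist[dst] = hop
                        next_frontier.append(dst)
            frontier = next_frontier
    return dist


# A hypothetical 15-segment trunk main: N0 -> N1 -> ... -> N15.
edges = [(f"N{i}", f"N{i + 1}") for i in range(15)]
dist = deep_traverse(build_partitions(edges), "N0", max_hops=12)
print(len(dist), dist["N12"])
```

Workers only synchronise at the merge step, which is what allows a 10+ hop query to spread its work across partitions; on a real cluster, the routing step would become network traffic between machines.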