Introduction

In 2026, the global data ecosystem has transitioned from isolated storage to context-aware intelligence. For the water industry, an enterprise built on the physical reality of pipes, pumps, valves, and meters, this shift is more than a trend: it underpins national digital-twin programs and real-time infrastructure management.

As a data team, your goal is to move from raw data to actionable knowledge without succumbing to "system sprawl" or complex operational patterns. Based on our recent evaluation, here is how you can structure a modern Connected Intelligence stack using your existing Azure and Databricks environment.

The Architecture: From Governance to Agency

The most effective "connected intelligence" stack for utilities follows a clear, governed path:

Databricks UC -> Semantic Model -> Ontology -> Graph + SQL Warehouse -> Data Agent

The Foundation (Databricks UC): Unity Catalog serves as the governed storage layer (Delta Lake/Iceberg), maintaining a single source of truth.
The Skeleton (Ontology): A Fabric ontology provides the rules and meaning needed to interpret raw tabular data as a knowledge graph.
The Engine (Graph + SQL): This layer unifies relationship-heavy analysis with traditional tabular aggregations, allowing you to query your physical network as a graph without duplicating data.
The Interface (Data Agent): Agentic AI uses GraphRAG (Graph Retrieval-Augmented Generation) to retrieve structured context from the graph, which is reported to improve LLM accuracy by 54.2% over standard retrieval methods.
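
As a minimal sketch of how the ontology and agent layers fit together, the Python below maps two hypothetical tables (pipes and meters) onto graph edges using a small ontology dictionary, then renders one asset's relationships as text context for an agent prompt. The table names, column names, and ontology format are illustrative assumptions, not the schema of any particular product.

```python
# Hypothetical tabular rows, as they might sit in governed lakehouse tables.
TABLES = {
    "pipes": [
        {"pipe_id": "P1", "from_node": "PUMP_A", "to_node": "VALVE_1"},
        {"pipe_id": "P2", "from_node": "VALVE_1", "to_node": "METER_7"},
    ],
    "meters": [
        {"meter_id": "METER_7", "customer_id": "CUST_42"},
    ],
}

# Hypothetical ontology: each table becomes an edge type by naming which
# columns hold the source and the target of the relationship.
ONTOLOGY = {
    "pipes": {"source": "from_node", "target": "to_node", "relation": "FEEDS"},
    "meters": {"source": "meter_id", "target": "customer_id", "relation": "SERVES"},
}


def build_edges(tables, ontology):
    """Interpret each row as a (source, relation, target) triple."""
    edges = []
    for table_name, rule in ontology.items():
        for row in tables[table_name]:
            edges.append((row[rule["source"]], rule["relation"], row[rule["target"]]))
    return edges


def context_for(node, edges):
    """Render the outgoing relationships of one node as plain-text lines."""
    return [f"{s} {rel} {t}" for s, rel, t in edges if s == node]


edges = build_edges(TABLES, ONTOLOGY)
prompt_context = "\n".join(context_for("VALVE_1", edges))
print(prompt_context)
```

In a real deployment the graph engine would resolve these relationships at query time, and the retrieved lines would be placed in the agent's prompt alongside the user's question.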

Choosing the Right Engine: A Comparison of Effectiveness

Selecting a graph engine depends on whether you prioritize depth of analytics, operational simplicity, or integration efficiency.

1. PuppyGraph: The "Zero-ETL" Choice for Data Teams

If the priority is to avoid infrastructure complexity, PuppyGraph stands out as a graph query engine rather than a standalone database.

Effectiveness: It integrates directly with Databricks UC, allowing you to query existing Delta Lake tables as a graph in minutes.
Merit: It eliminates the need for complex ETL pipelines and separate storage silos, making it the most operationally lean choice for a data team.
Performance: Reported to execute 6-hop queries across billions of edges in under 3 seconds.

2. Neo4j: The Ecosystem Standard

Neo4j remains the "safest default" for organisations starting their graph journey.

Effectiveness: Optimised for 1–5 hop localised traversals, which is ideal for real-time asset lookups and identity resolution.
Merit: It uses index-free adjacency, meaning performance is independent of total graph size for single-step traversals.
Trade-off: As a standalone database, it requires its own storage and governance, which may add complexity for teams wanting a "SQL-first" workflow.
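
To illustrate the idea behind index-free adjacency, the sketch below contrasts two ways of finding a node's neighbours: following direct references stored on the node, and searching a sorted edge table. It is a conceptual toy in plain Python with hypothetical asset names, not a model of how Neo4j or any other engine is implemented internally.

```python
from bisect import bisect_left


class Node:
    """A node that holds direct references to its neighbours."""

    def __init__(self, name):
        self.name = name
        self.neighbours = []


def neighbours_native(node):
    """One hop by following stored references: cost depends only on
    this node's degree, not on the size of the whole graph."""
    return [n.name for n in node.neighbours]


def neighbours_indexed(name, sorted_edges):
    """One hop by binary search over a sorted edge table: the lookup
    cost grows with the logarithm of the total number of edges."""
    i = bisect_left(sorted_edges, (name, ""))
    result = []
    while i < len(sorted_edges) and sorted_edges[i][0] == name:
        result.append(sorted_edges[i][1])
        i += 1
    return result


# Hypothetical assets: a pump feeding a valve that feeds a meter and a hydrant.
pump, valve, meter, hydrant = (Node(n) for n in ("PUMP_A", "VALVE_1", "METER_7", "HYDRANT_3"))
pump.neighbours.append(valve)
valve.neighbours.extend([meter, hydrant])

# The same relationships stored as a sorted (source, target) edge table.
sorted_edges = sorted(
    [("PUMP_A", "VALVE_1"), ("VALVE_1", "METER_7"), ("VALVE_1", "HYDRANT_3")]
)

print(sorted(neighbours_native(valve)))
print(neighbours_indexed("VALVE_1", sorted_edges))
```

Both calls return the same neighbours; the difference is that the indexed lookup has to search a structure whose size tracks the whole graph, while the reference-following version touches only the node in question.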

3. TigerGraph: The Analytical Workhorse

For city-scale water networks requiring massive traversals, TigerGraph offers a massively parallel processing (MPP) architecture.

Effectiveness: Engineered for 10+ hop deep traversals in sub-second time.
Merit: Reported to be 17.9x faster than Neo4j on audited benchmarks for deep analytics.
Trade-off: Steeper learning curve due to its proprietary GSQL language.

Pure Technical Merit: Architectural Paradigms

Feature	Native Graph (e.g., Neo4j/TigerGraph)	Zero-ETL (e.g., PuppyGraph)	Multi-Model (e.g., ArangoDB)
Linkage	Index-free adjacency	Schema-on-read	Index-based joins
Storage	Specialised graph store	Existing warehouse/lake	Document/KV/relational
Latency	Sub-millisecond per hop	Sub-second (6 hops)	Degrades with depth
Complexity	High (new DB silo)	Low (analytics layer)	Moderate (consolidated)

Strategic Utility in the Water Industry

Implementing this architecture allows your data team to deliver high-value outcomes:

Blast Radius Analysis: Use multi-hop recursive traversal to quickly find every customer or secondary system affected by a pipe burst.
Predictive Maintenance: Ground your Data Agent in the "digital thread" of an asset's lifecycle, from installation logs to real-time sensor data.
Regulatory Compliance: Maintain an auditable view of data lineage, tracking how information flows through your utility's systems to meet environmental safety standards.
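
The blast radius case can be sketched as a hop-limited breadth-first traversal. The network below is a hypothetical downstream adjacency map with made-up asset identifiers; a graph engine would express the same logic as a multi-hop query rather than application code.

```python
from collections import deque

# Hypothetical downstream connectivity: each asset lists what it feeds.
NETWORK = {
    "MAIN_1": ["VALVE_1", "VALVE_2"],
    "VALVE_1": ["METER_7", "METER_8"],
    "VALVE_2": ["PUMP_B"],
    "PUMP_B": ["METER_9"],
    "METER_7": ["CUST_42"],
    "METER_8": ["CUST_43"],
    "METER_9": ["CUST_44"],
}


def blast_radius(network, start, max_hops):
    """Return every asset reachable downstream of `start` within `max_hops`,
    in breadth-first order."""
    seen = {start}
    queue = deque([(start, 0)])
    affected = []
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # hop budget used up; do not expand this node further
        for nxt in network.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                affected.append(nxt)
                queue.append((nxt, hops + 1))
    return affected


# Customers affected by a burst on MAIN_1, searching up to 4 hops downstream.
customers = [n for n in blast_radius(NETWORK, "MAIN_1", max_hops=4) if n.startswith("CUST_")]
print(customers)
```

The hop limit matters in practice: with a budget of 3 hops the customer behind PUMP_B is missed, which is why traversal depth features so heavily in the engine comparison.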

Final Recommendation

For a data team that already uses Azure and Databricks and wants to avoid new operational patterns, PuppyGraph offers the most direct path because it treats your existing Unity Catalog as the graph's storage layer. However, if you require a mature, feature-complete "Knowledge Layer" with extensive community support, Neo4j remains the industry standard for grounding agentic AI.

Navigating the Flow: Building a Connected Intelligence Stack for the Water Industry