A better way to integrate data
Whenever information from different sources needs to be combined, the data structures supporting that information must first be related. This task, called data integration, is the biggest and most expensive challenge in IT today, accounting for over 40% of enterprise IT budgets.
Our technology performs data-integration tasks (such as querying, combining, and evolving databases) using category theory, a branch of mathematics that has already revolutionized several areas of computer science. Category theory gives us the theoretical guidance missing from current-generation data models (Relational, RDF/OWL, Graph, Key-Value, LINQ), and we have used it to build software that integrates data more quickly and more accurately than existing tools.
Our product consists of two parts:
AQL: an algebraic query language
The result of five years of research at MIT, AQL supports all of the operations needed to integrate and query data using category-theoretic principles. Its key benefits are:
- Higher-quality integrated data and reduced risk of failure. AQL contains an embedded theorem prover that guarantees the correctness of AQL programs. For example, an AQL program cannot materialize an instance that violates a data integrity constraint. Such errors are detected at compile time, when they are easiest to fix, and detecting them does not require any input data.
- Higher developer productivity and increased expressiveness. AQL generalizes concepts from SQL using powerful principles from category theory. For example, AQL generalizes SQL's select-from-where queries from single tables to many tables related by foreign keys. Such higher-level abstractions enable developers to be more productive.
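To give a flavor of what a select-from-where query over tables related by foreign keys can look like, here is a minimal Python sketch. It is not AQL syntax and not how AQL is implemented; the tables, the column names (`works_in`, `located_in`), and the helpers `follow` and `select_from_where` are all hypothetical, chosen only for illustration.

```python
# Hypothetical toy instance: each table maps a row id to a row (a dict).
employees = {
    "e1": {"name": "Ada", "works_in": "d1"},
    "e2": {"name": "Grace", "works_in": "d2"},
    "e3": {"name": "Alan", "works_in": "d1"},
}
departments = {
    "d1": {"name": "Research", "located_in": "s1"},
    "d2": {"name": "Sales", "located_in": "s2"},
}
sites = {
    "s1": {"city": "Boston"},
    "s2": {"city": "Austin"},
}

# Assumed foreign keys: column name -> table that the column points into.
FOREIGN_KEYS = {"works_in": departments, "located_in": sites}


def follow(row, path):
    """Follow zero or more foreign keys, then read the final attribute."""
    *fks, attr = path
    for fk in fks:
        row = FOREIGN_KEYS[fk][row[fk]]
    return row[attr]


def select_from_where(table, select, where):
    """Return one tuple per row of `table` that satisfies `where`."""
    return [
        tuple(follow(row, path) for path in select)
        for row in table.values()
        if where(row)
    ]


# Employees whose department is located in Boston, with department names.
boston_staff = select_from_where(
    employees,
    select=[("name",), ("works_in", "name")],
    where=lambda r: follow(r, ("works_in", "located_in", "city")) == "Boston",
)
print(boston_staff)
```

In this sketch the joins are implied by the foreign-key paths, so the query never spells out a join condition by hand.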
AQL's key features are:
- Flexible I/O: data can be imported into and exported out of AQL by JDBC and CSV.
- Visualization: AQL schemas, databases, etc. can be displayed graphically.
- Rich data integrity constraints: AQL schemas contain entities, attributes, and foreign keys, as well as equations between them. One use of equations is for denormalization without the need to manually enforce the consistency of redundant data.
- More principled than SQL: relational concepts such as foreign keys reappear, in a more principled form, in AQL. AQL also provides primitives that SQL lacks.
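As an illustration of what an equation between attributes and foreign keys can express, the Python sketch below states that a redundant `dept_name` column on employees must equal the name reached through the `works_in` foreign key. Everything here is an assumption made for the example: the tables, the column names, and the `violations` helper are hypothetical, and this is not AQL syntax. Note also that the sketch checks a concrete instance at run time, whereas the text above describes AQL detecting such errors at compile time without input data.

```python
# Hypothetical denormalized instance: employees carry a redundant dept_name.
employees = {
    "e1": {"name": "Ada", "works_in": "d1", "dept_name": "Research"},
    "e2": {"name": "Grace", "works_in": "d2", "dept_name": "Sales"},
}
departments = {
    "d1": {"name": "Research"},
    "d2": {"name": "Sales"},
}

# Assumed foreign keys: column name -> table that the column points into.
FOREIGN_KEYS = {"works_in": departments}


def follow(row, path):
    """Follow zero or more foreign keys, then read the final attribute."""
    *fks, attr = path
    for fk in fks:
        row = FOREIGN_KEYS[fk][row[fk]]
    return row[attr]


# The equation, as a pair of paths that must agree on every employee row:
#   employee.dept_name = employee.works_in.name
EQUATION = (("dept_name",), ("works_in", "name"))


def violations(table, equation):
    """Return the ids of rows on which the two sides of the equation differ."""
    lhs, rhs = equation
    return [
        row_id
        for row_id, row in table.items()
        if follow(row, lhs) != follow(row, rhs)
    ]


print(violations(employees, EQUATION))  # consistent instance

# Renaming a department without updating the redundant copy breaks the equation.
departments["d2"]["name"] = "Marketing"
print(violations(employees, EQUATION))
```

Declaring the equation once in the schema is what lets a system, rather than the developer, keep the redundant copy consistent.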
Categorical Informatics was spun out of the MIT Mathematics Department in the fall of 2015 and is supported by SBIR grants from the National Institute of Standards and Technology (NIST) and I-Corps grants from the National Science Foundation (NSF).