A better way to integrate data

Whenever information from different sources needs to be combined, the data structures supporting that information must first be related. This task, called data integration, is the biggest and most expensive challenge in IT today, accounting for over 40% of enterprise IT budgets.

Our technology performs data-integration tasks (such as querying, combining, and evolving databases) using category theory, a branch of mathematics that has already revolutionized several areas of computer science. Category theory gives us the theoretical guidance missing from current-generation data models (Relational, RDF/OWL, Graph, Key-Value, LINQ), and we have used it to build software that integrates data more quickly and more accurately than existing tools.

Our product consists of two parts:

AQL: an algebraic query language

The result of five years of research at MIT, AQL supports all of the operations needed to integrate and query data using category-theoretic principles. Its key features are:

  • Flexible I/O: data can be imported into and exported out of AQL by JDBC and CSV.
  • Visualization: AQL schemas, databases, and mappings between them can be displayed graphically.
  • Computational schemas: user-defined functions are part of AQL schemas and can be specified either in JavaScript, using Java's built-in engine (Nashorn), or as symbolic equations.
  • Rich data integrity constraints: AQL schemas contain entities, attributes, and foreign keys, as well as equations between them. One use of equations is denormalization without the need to manually enforce the consistency of redundant data.
  • SQL-like: relational concepts such as select/from/where queries and schema mappings reappear, in a generalized form, in AQL.
  • More principled than SQL: relational concepts such as sets/bags, nulls, and foreign keys reappear, in a more principled form, in AQL.
  • Database-at-a-time operations: AQL operations construct entire databases at once, rather than individual tables.
  • Correctness: sophisticated automated theorem-proving techniques ensure, at compile time, that AQL programs cannot fail at runtime or materialize instances that do not obey data integrity constraints.
  • A reusable warehouse design pattern: a wizard walks users through the process of creating a data warehouse using AQL.
  • 100% pure Java
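
To make the "rich data integrity constraints" feature above more concrete, the following is a minimal sketch in plain Java of what an equation between a foreign key and attributes expresses. It is not AQL syntax and does not use AQL's API: the entities (Department, Employee), the foreign key (worksIn), the attributes, and the class name are all hypothetical, chosen only for illustration. The equation modeled is e.deptName = e.worksIn.name, meaning that a redundant, denormalized copy of the department name on each employee must agree with the department it points to. Note also that the sketch checks the equation on a concrete instance at runtime, whereas the text describes AQL as establishing such guarantees at compile time.

```java
import java.util.HashMap;
import java.util.Map;

public class DenormalizationCheck {

    // Hypothetical entity: a department with a single attribute, its name.
    static final class Department {
        final String name;
        Department(String name) { this.name = name; }
    }

    // Hypothetical entity: an employee with a foreign key (worksIn, a department id)
    // and a redundant, denormalized copy of that department's name.
    static final class Employee {
        final String worksIn;
        final String deptName;
        Employee(String worksIn, String deptName) {
            this.worksIn = worksIn;
            this.deptName = deptName;
        }
    }

    // Returns true when every employee e satisfies e.deptName = e.worksIn.name.
    static boolean satisfiesEquation(Map<String, Department> departments,
                                     Map<String, Employee> employees) {
        for (Employee e : employees.values()) {
            Department d = departments.get(e.worksIn);
            if (d == null || !d.name.equals(e.deptName)) {
                return false; // dangling foreign key or inconsistent redundant copy
            }
        }
        return true;
    }

    // Builds a one-department, one-employee instance in which the employee
    // stores the given copy of the department name, then checks the equation.
    static boolean exampleWithCopy(String storedDeptName) {
        Map<String, Department> departments = new HashMap<>();
        departments.put("d1", new Department("Math"));
        Map<String, Employee> employees = new HashMap<>();
        employees.put("e1", new Employee("d1", storedDeptName));
        return satisfiesEquation(departments, employees);
    }

    public static void main(String[] args) {
        System.out.println(exampleWithCopy("Math"));    // copy agrees with the department
        System.out.println(exampleWithCopy("Physics")); // copy is stale
    }
}
```

Running main prints true for the consistent instance and false for the stale one. In a hand-maintained denormalized schema, a check like this has to be written and run by the user; the point of placing equations in the schema is that the system, not the user, is responsible for the redundant data staying consistent.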

The open-source community website hosts screenshots, lets users try AQL in the browser, and contains many papers, slides, and recorded lectures, as well as a Google discussion group.


Categorical Informatics was spun out of the MIT Mathematics Department in the fall of 2015 and is supported by SBIR grants from the National Institute of Standards and Technology (NIST) and I-Corps grants from the National Science Foundation (NSF).