Compile-time Foreign Key Checking

Many data integration tasks require creating multiple tables connected by foreign keys. Typically this process requires disabling foreign key constraints, loading data, and then enabling foreign key constraints, which triggers an expensive runtime check that the foriegn key constraints hold and defers detection of foreign key related bugs until runtime. In AQL, queries that target schemas containing foreign keys are guaranteed, at compile time, and without accessing any source data, to always materialize target instances with correctly populated foreign keys. As a result, foreign key related errors are detected much earlier in the ETL process and without costly runtime-checking.

This example (built in to the IDE with name Foreign Keys) defines a source schema about departments, professors, and students, and a query to find all possible same-department professor-student advising matches. The target schema relates matches, professors, students, and departments through four foreign keys and an equation enforcing that in every match, the department the professor works in is the same as the department the student is majoring in. A query that incorrectly populates the target schema is shown to be rejected at compile time.

We start by defining a source schema for departments, professors, and students, with foreign keys specifying the department which each professor works in and which each student is majoring in. For brevity we use an empty typeside.

typeside Ty = empty schema University = literal : Ty { entities Professor Student Department foreign_keys worksIn : Professor -> Department majoringIn : Student -> Department }


Here is some sample data:

instance UniversityOfX = literal : University { generators Gauss Church Euler : Professor Riemann Turing Kleene : Student math cs : Department multi_equations worksIn -> {Gauss math, Church cs, Euler math} majoringIn -> {Riemann math, Turing cs, Kleene cs} }

fk1

Our goal is to find all professor-student pairs where the professor and student share the same department. The target schema contains an additional entitiy for these matches, along with foreign keys to the professor and student involved, and an equation stating the that professor and student share the same department:

schema AdvisorMatches = literal : Ty { imports University entities Match foreign_keys studentOf : Match -> Student professorOf : Match -> Professor path_equations studentOf.majoringIn = professorOf.worksIn }


To populate the target schema we must give one sub query per target entity. The source tables are copied over directly. To populate the advisor matches, we iterate over all professors and students, dereferencing foreign keys (using the dot operator) instead of performing joins with Department. The where clause selects the professors and students which share the same department. Finally, for each foreign key in the target schema we must specify how it is to be populated. For example, the majoringIn foreign key takes students to departments; therefore, in the foreign keys clause for majoringIn, we must provide a department (dept), namely (stu.majoringIn), for each student (stu).

query findMatches = literal : University -> AdvisorMatches { entities Department -> {from dept:Department} Professor -> {from prof:Professor} Student -> {from stu:Student} Match -> {from p:Professor s:Student where p.worksIn = s.majoringIn} foreign_keys majoringIn -> {dept -> stu.majoringIn} worksIn -> {dept -> prof.worksIn} professorOf -> {prof -> p} studentOf -> {stu -> s} } instance MatchesForUnivX = eval findMatches UniversityOfX


The result is displayed in the IDE. Note that the generated IDs for the matches contain the provenance, or lineage, of how the match was formed.

fk2

If the sub query for Match does not correctly populate the target table, say because of a typo:

Match -> {from p:Professor s:Student where p.worksIn = p.worksIn}


The IDE rejects the query at compile time:

Error in query findMatches: Target equation v.studentOf.majoringIn = v.professorOf.worksIn not respected: transforms to s.majoringIn = p.worksIn which is not provable in the sub-query for Match.


A screen shot of the entire development is shown below:

fk3