Skip to main content

Formalisation of Tables in a Dependent Language




I've had an idea kicking about in my head for a while of making query plans explicit in SQL in such a way that one can be assured that the query plan corresponds to the SQL statement desired. The idea is something like a Curry-Howard in a relational setting. One could infer the plan from the SQL, the SQL from the plan, or do a sort of "type-checking" to make sure that the plan corresponds to the SQL.

The devil is always in the details however. When I started looking at the primitives that I would need, it turns out that the low level table joining operations are actually not that far from primitive SQL statement themselves. I decided to go ahead and formalise some of what would be necessary in Agda in order get a better feel for the types of objects I would need and the laws which would be required to demonstrate that a plan corresponded with a statement.

Dependent types are very powerful and give you plenty of rope to hang yourself. It's always something of a struggle for me to figure out what I want to live in the type world as opposed to what I want to live in the term world.

First things first. We're going to need rows if we want tables. In dynamically typed languages (a.k.a. languages with a [potentially open] universal type) you can easily make a heterogeneous collection of objects (numbers, strings etc). Modelling a row in lisp or python is very straight forward. However, in a strongly typed setting heterogeneous lists are more complicated. The solution I have written in Agda below uses a vector (really just a list with a guaranteed size) of types which lives in the type universe, and which is constructed as we push things into our row. This is basically just a heterogeneous list where the type advertises what kind of row it is.

module RowList where

open import Level renaming (suc to lsuc; zero to lzero)
open import Data.Nat hiding (__)
open import Data.Vec renaming (__ to __; _++_ to _++_; [] to []; [_] to [_])
open import Data.List 
open import Data.Fin renaming (_+_ to _+f_)
open import Data.Nat.Properties
open import Relation.Binary.PropositionalEquality as PropEq
  using (__; refl; sym; subst; trans; cong; cong₂; inspect)
open import Data.Bool

infixr 4 _,_
infixr 5 __

data Row {α : Level} : (n : )  Vec (Set α) n  Set (lsuc α) where 
  · : Row zero []
  _,_ :  {j} {A : Set α} {Aⱼ : Vec (Set α) j}  A  Row j Aⱼ -> Row (suc j) (A  Aⱼ)

In the data definition above we use (·) to terminate a row, and return a row which has an empty-vector in the rows type signature. In the
(,) case, we extend the row by giving an element of some type A and stick its type in our row signature vector. This will allow us to ensure that tables are constructed of homogeneous row types, and will allows us to present a lookup function for rows which knows what its return type will be.

You may notice all the bits about (Set α) etc. The α is just book keeping to avoid Russell-type paradoxes creeping in. They are otherwise not particularly enlightening and the above table definition might be more easily read if you ignore them.

example : Row 2 (  (  []))
example = (1 , 2 , ·)

pick :  {α i j A}  (v : Vec (Fin j) i)  Vec {α} A j  Vec A i
pick [] ys = []
pick (x  xs) ys = lookup x ys  pick xs ys 

__ :  {α j Aⱼ}  (r : Row {α} j Aⱼ)  (i : Fin j)  lookup i Aⱼ 
·  ()
(y , ys)  zero = y
(y , ys)  suc i = ys  i 

We call our row selection operation (r i) for a row (r) and index i. Notice that the return type is dependent on the type signature vector which is carried around in the type of the row. This selector will be useful for writing predicates over columns which will have to know what their input type is. We use a finite type (Fin j) to describe our index, since we need to make sure we don't index outside of the row. The finite type ensures us that whatever the index is, it's no greater than the size of the vector. These operations are now guaranteed to be type safe and can not lead to run-time out-of-range errors.

_++r_ :  {α i j Aᵢ Bⱼ}  Row {α} i Aᵢ  Row j Bⱼ  Row (i + j) (Aᵢ ++ Bⱼ)
· ++r r = r
(x , xs) ++r r = x , (xs ++r r)

We of course need to be able to combine rows from different tables in joins, and so we'll need a row concatenation operator, which we call (++r).

pickr :  {α i j Aⱼ}  (v : Vec (Fin j) i)  Row {α} j Aⱼ  Row i (pick v Aⱼ)
pickr [] r = · 
pickr (x  xs) r = r  x , pickr xs r

Additionally, it's useful if we can specify a list of indices we want to pick out of a row to construct a new row. This will allow us to make a "select" like statement where we can duplicate or ignore indices in order to create a new row, which when done over a list of rows will become more like the familiar table select.

Table :  {α} (m : ) (Aⱼ : Vec (Set α) m)  Set (lsuc α)
Table {α} m Aⱼ = List (Row m Aⱼ)

We define tables as simply a list of rows, and keep the indices and types as parameters to the type. The table header is now correct by construction and exists only as a type index.

{-
Example table: The header is specified as the type
-}
exampleTable : Table 3 (  (  (  [])))
exampleTable = (1 , 2 , 3 , ·)
             ((3 , 4 , 5 , ·)
             ((5 , 6 , 7 , ·)
             []))


__ :  {α i j Aᵢ Bⱼ}  Table {α} i Aᵢ  Table j Bⱼ  Table (i + j) (Aᵢ ++ Bⱼ)
[]  [] = []
[]  _  = []
_   [] = []
(x  xs)  (x'  xs') = x ++r x'  [ x ]  xs' ++ xs  (x'  xs') 
-- concatenate row, union row with the rest of the table

The familiar join operation. As you can see here, our primitive join operation does not take a restriction. In practice the primitive join would have to use the restriction in order to reduce the amount of row consing. In order to have a pragmatic optimiser, I'll have to change this later.

{-
Example join
-}
result : Table 6 (  (  (  (  (  (  []))))))
result = exampleTable  exampleTable
 

-- Restriction 
__ :  {α j Aⱼ}  Table {α} j Aⱼ   (Row j Aⱼ  Bool)  Table j Aⱼ
[]  p = []
(x  xs)  p = if p x then x  xs  p else xs  p 

Restrictions are just row predicates. They simply take a row and specify which rows to keep and which to throw away. This allows for most of the usual SQL WHERE clause shenanigans (minus aggregation of course) and gives us a fairly complete method of doing joins subject to restrictions [we don't deal with nulls at all].

{- 
Example restriction
-}
_b_ :  (n m : )  Bool
zero b m = true
suc n b zero = false
suc n b suc n' = n b n'

restrict : Table 6 (  (  (  (  (  (  []))))))
restrict = result  (λ v  (v  (# 1)) b 1)
Finally we have restriction, which allows us to pick and choose columns by specifying their index. It would of course be more natural to have names for columns and allow that as a means of addressing the columns of interest and I may in fact add that to rows in order to simplify the debugging, but it's not of much interest logically speaking.
-- Selection 
__ :  {α i j Aⱼ}  (v : Vec (Fin j) i)  Table {α} j Aⱼ  Table i (pick v Aⱼ)
xs  [] = []
xs  (y  ys) = (pickr xs y)  xs  ys

{- 
Example selection 
-}
-- Syntax of a query: (selector ∥ t1 ∪ t2 ∪ t3 ▹ restriction)


selection : Table 2 (  (  []))
selection = ((# 1)  ((# 3)  []))  result  (λ v  (v  (# 1)) b 4)

From this point we will need some form of relation which allows us to establish query equivalence. We will then want a way to compute whether two tables are equivalent up to this relation and we can continue on with the more pragmatic concerns of pushing the restrictions into the actual table joins.

Comments

Popular posts from this blog

Decidable Equality in Agda

So I've been playing with typing various things in System-F which previously I had left with auxiliary well-formedness conditions. This includes substitutions and contexts, both of which are interesting to have well typed versions of. Since I've been learning Agda, it seemed sensible to carry out this work in that language, as there is nothing like a problem to help you learn a language. In the course of proving properties, I ran into the age old problem of showing that equivalence is decidable between two objects. In this particular case, I need to be able to show the decidability of equality over types in System F in order to have formation rules for variable contexts. We'd like a context Γ to have (x:A) only if (x:B) does not occur in Γ when (A ≠ B). For us to have statements about whether two types are equal or not, we're going to need to be able to decide if that's true using a terminating procedure. And so we arrive at our story. In Coq, equality is ...

Teagrey

I was ironing my shirt today, which I almost never do. Because of this I don't have an ironing board so I tried to make a make-shift ironing board on my floor using a towel and some books. I grabbed the heaviest books on the nearest shelf, which happened to be Organic Chemistry, Stalingrad and an annotated study bible containing the old and new testament. As I pulled out the bible, a flower fell out which had been there for over 17 years now. I know that because it was put there by my first wife, Daniel, who killed herself in April about 17 years ago. It fell from Thessalonians to which it had been opened partially. Highlighted was the passage: "Ye are all sons of the light and sons of the day." I guess the passage gave her solace. Daniel was a complicated woman. She had serious mental health issues which plagued her for her entire life. None of them were her fault. She was dealt an absolutely awful hand in life, some truly nasty cards. She had some considerable c...

Total Functional Programming

Recently on Lambda the Ultimate there was a post called "Total Functional Programming".  The title didn't catch my eye particularly, but I tend to read the abstract of any paper that is posted there that doesn't sound terribly boring.  I've found that this is a fairly good strategy since I tend to get very few false positives this way and I'm too busy for false negatives. The paper touches directly and indirectly on subjects I've posted about here before.  The idea is basically to eschew the current dogma that programming languages should be Turing-complete, and run with the alternative to the end of supplying "Total Functional Programming" .  At first glance this might seem to be a paper aimed at "hair-shirt" wearing functional programming weenies.  My reading was somewhat different. Most hobbiest mathematicians have some passing familiarity with "Turing's Halting Problem" and the related "Goedel's Inco...