### Judgemental Equality, Homotopy Type Theory and Supercompilation

Ok, time for some incredibly preliminary thoughts on judgemental equality and what it means for Supercompilation and what supercompilation in turn means for judgemental equality.

The more that I explore equality the more incredibly strange the role it plays in type theory seems to be.  There seems intuitively to be nothing more simple than imagining two things to be identical, and yet it seems to be an endless source of complexity.

This story starts with my dissertation, but I'm going to start at the end and work backwards towards the beginning.  Currently I've been reading Homotopy Type Theory, now of some niche fame.  I put off reading about it for a long time because I generally don't like getting too bogged down in foundational principles or very abstract connections with other branches of mathematics (except perhaps on Sundays).  Being more of an engineer than a mathematician, I like my types to work for me, and dislike it when I do too much work for them.

However, the book is trendy, and friends were asking me questions about it, so I picked it up and started working through it.  I haven't gotten very far but I've already had a greater clarity about equality and what it means in reference to computation.

In type theory we start with a couple of notions of equality.  I'm going to stick with judgemental equality first.  Judgmental equality plays a crucial role in type theory as it is really what drives our notion of computation.  If we want to know whether two judgements are judgementally equal, we want to know if they reduce to the same thing.  Generally this is done with a number of reduction rules which give us normalisation to the same term.  The reduction rules are the key element of computation.  The simplest example is cut-elimination for the type A → B.

(λ x : S. t) s ─β→ t [x ↦ s]  (replace all x in t with s)

corresponds with removing implication elimination (or modus ponens):

(J1)        (J2)
⋮          ⋮                                    (J1 using J2 as leaves)
S → T      S                                           ⋮
──────────────     ─β→         T
T

Or in a more explicit type theoretic fashion we can write:

(J1)                                                         (J2)
⋮                                                           ⋮
Γ , x : S ⊢ t : T                                           ⋮
─────────────────                    ⋮
Γ ⊢ (λ x : S. t) : S → T                          Γ ⊢ s : S
────────────────────────────
Γ ⊢ (λ x : S. t) s : T

(J1) [x ↦ s]
⋮
─β→    ────────────────────
Γ ⊢ t [x ↦ s] : T

In somewhat operational terms we have taken a proof, and we have removed some of the branching structure, and replaced our right branch with any mention of a variable at the leaves of J1.  We can say that in some sense these two proofs are the same, and this is essentially where the computation happens in type theory.  We are engaging in a form of proof normalisation - when you run a programme, you're really just trying to find the simplest (cut free?) proof of the proposition given the data.

Now, things get interesting when you don't have all the data.  Then proofs which are intuitively very similar, end up being fairly distinct just because you can't drive the computation effectively using the reduction rules given for cut elimination.

A straight-forward example of this can be given by the associative property of addition.  First, let's give a program so we can constrain our problem to the bureaucracy of our syntax, such that we can later try to unshackle ourselves from it.

data ℕ : Set where
zero : ℕ
suc : ℕ → ℕ

_+_ :  ℕ → ℕ → ℕ
zero + y = y
suc x + y = suc (x + y)

This program in Agda gives us natural numbers formalised as being either 0, or some number +1 (suc), from which we can build all the finite numbers, along with a way to perform addition.

Given this definition we might wonder whether  (x + y + z = x + (y + z)) is true?  Well, if = is judgemental equality, we're in for a rough ride, because it definitely will not be.  Now, things aren't so bad, given a notion of propositional equality we can go ahead and make a similar proof work.  In Agda we might write:

infix 4 _≡_
data _≡_ {A : Set} : A → A → Set where
refl : (x : A) → x ≡ x

cong : ∀ {A B : Set} {a b : A} → (f : A → B) → a ≡ b → f a ≡ f b
cong f (refl x) = refl (f x)

associative : ∀ (n m o : ℕ) →  n + (m + o) ≡ (n + m) + o
associative zero m o = refl (m + o)
associative (suc x) m o = cong suc (associative x m o)

This proof makes reference to a second kind of equality, a propositional equality _≡_ which is "inside" the proof system.  We have simply represented it as a data type.   By showing two things are equal in this world, we've shown that there is a way to decompose arguments until we have a judgemental equality.

However, if we had some way to make it a judgemental equality, then we would hardly have had to have two cases!  Instead we could immediately just give the proof refl, since they would both reduce to the same thing.  But is this type of reduction possible?   Certainly not in the general case.  However, there is a large lurking universe in which it could be done.

Enter supercompilation!  The sorts of reductions that supercompilers give you will in fact reduce the associative addition statement to identical proofs.  This means that the normal form will be identical on both sides, and we have a judgemental equality.

But what is going on?  First, I think we need to think about cyclic proofs for a minute.   When we use the function _+_, we are actually introducing a cyclic proof.   We don't know how big the  proof is going to be until we get the data.  Instead we have a stand in cycle that represents the way that we will unfold the proof as more data comes in.

The reason we were blocked before is because recursive functions are uncooperative.  What they actually mean to say is that they are going to unfold themselves again to produce new proof leaves as long as you keep feeding them data.   The unfolding leads to new possible cuts to eliminate; more reductions using our rewriting rules.  However they will never cooperate with their neighbour recursive function in trying to produce a new cycle which produces the same infinitely unfolded proof.  They essentially limit the kinds of cycles that can exist in the proof tree to the ones that existed when you wrote down the definitions.

A Supercompiler's fundamental trick is in allowing us to abstract away the function names and just look at the proof as it unfolds creating cycles as we see fit (subject to similar rules that total functions must abide you can't have cycles unless something is always decreasing).  This means that we can create a cycle in the proof, (and perhaps a new named function) which has the same infinite unfolding of the proof tree.

This expansion of judgemental equality can make more proofs easier.  However, it can also make more proofs possible - since it is sometimes the case that we can take a program whose cycles look dubious, and turn them into cycles which look perfectly fine, since they now respect some simple decreasing or increasing order.  This was the subject of my dissertation, but unfortunately I don't think I knew enough type theory at the time to make the story convincing.

There are a lot of ideas that follow on from this.  The ideas of homotopy type theory think a lot about what kinds of proofs can identify two elements in propositional equality.  With a broader notion of judgemental equality, its interesting to wonder how the structure of identifications would change...

### Decidable Equality in Agda

So I've been playing with typing various things in System-F which previously I had left with auxiliary well-formedness conditions. This includes substitutions and contexts, both of which are interesting to have well typed versions of. Since I've been learning Agda, it seemed sensible to carry out this work in that language, as there is nothing like a problem to help you learn a language.

In the course of proving properties, I ran into the age old problem of showing that equivalence is decidable between two objects. In this particular case, I need to be able to show the decidability of equality over types in System F in order to have formation rules for variable contexts. We'd like a context Γ to have (x:A) only if (x:B) does not occur in Γ when (A ≠ B). For us to have statements about whether two types are equal or not, we're going to need to be able to decide if that's true using a terminating procedure.

And so we arrive at our story. In Coq, equality is som…

### Formalisation of Tables in a Dependent Language

I've had an idea kicking about in my head for a while of making query plans explicit in SQL in such a way that one can be assured that the query plan corresponds to the SQL statement desired. The idea is something like a Curry-Howard in a relational setting. One could infer the plan from the SQL, the SQL from the plan, or do a sort of "type-checking" to make sure that the plan corresponds to the SQL.

The devil is always in the details however. When I started looking at the primitives that I would need, it turns out that the low level table joining operations are actually not that far from primitive SQL statement themselves. I decided to go ahead and formalise some of what would be necessary in Agda in order get a better feel for the types of objects I would need and the laws which would be required to demonstrate that a plan corresponded with a statement.

Dependent types are very powerful and give you plenty of rope to hang yourself. It's always something of…

### Plotkin, the LGG and the MGU

Legend has it that a million years ago Plotkin was talking to his professor Popplestone, who said that unification (finding the most general unifier or the MGU) might have an interesting dual, and that Plotkin should find it. It turns out that the dual *is* interesting and it is known as the Least General Generalisation (LGG). Plotkin apparently described both the LGG for terms, and for clauses. I say apparently because I can't find his paper on-line.

The LGG for clauses is more complicated so we'll get back to it after we look at the LGG of terms. We can see how the MGU is related to the LGG by looking at a couple of examples and the above image. We use the prolog convention that function symbols start with lower case, and variables start with uppercase. The image above is organised as a DAG (Directed Acyclic Graph). DAGs are a very important structure in mathematics since DAGs are lattices.

Essentially what we have done is drawn an (incomplete) Hasse diagram f…