Tuesday, 10 March 2015

Judgemental Equality, Homotopy Type Theory and Supercompilation

Ok, time for some incredibly preliminary thoughts on judgemental equality and what it means for Supercompilation and what supercompilation in turn means for judgemental equality.

The more that I explore equality the more incredibly strange the role it plays in type theory seems to be.  There seems intuitively to be nothing more simple than imagining two things to be identical, and yet it seems to be an endless source of complexity.

This story starts with my dissertation, but I'm going to start at the end and work backwards towards the beginning.  Currently I've been reading Homotopy Type Theory, now of some niche fame.  I put off reading about it for a long time because I generally don't like getting too bogged down in foundational principles or very abstract connections with other branches of mathematics (except perhaps on Sundays).  Being more of an engineer than a mathematician, I like my types to work for me, and dislike it when I do too much work for them.

However, the book is trendy, and friends were asking me questions about it, so I picked it up and started working through it.  I haven't gotten very far but I've already had a greater clarity about equality and what it means in reference to computation.

In type theory we start with a couple of notions of equality.  I'm going to stick with judgemental equality first.  Judgmental equality plays a crucial role in type theory as it is really what drives our notion of computation.  If we want to know whether two judgements are judgementally equal, we want to know if they reduce to the same thing.  Generally this is done with a number of reduction rules which give us normalisation to the same term.  The reduction rules are the key element of computation.  The simplest example is cut-elimination for the type A → B.

(λ x : S. t) s ─β→ t [x ↦ s]  (replace all x in t with s)

corresponds with removing implication elimination (or modus ponens):

(J1)        (J2)
 ⋮          ⋮                                    (J1 using J2 as leaves)
S → T      S                                           ⋮
──────────────     ─β→         T
      T


Or in a more explicit type theoretic fashion we can write:


   (J1)                                                         (J2)
     ⋮                                                           ⋮
Γ , x : S ⊢ t : T                                           ⋮
─────────────────                    ⋮                 
Γ ⊢ (λ x : S. t) : S → T                          Γ ⊢ s : S  
────────────────────────────    
        Γ ⊢ (λ x : S. t) s : T                              


                               (J1) [x ↦ s]
                                 ⋮
─β→    ────────────────────
                      Γ ⊢ t [x ↦ s] : T


In somewhat operational terms we have taken a proof, and we have removed some of the branching structure, and replaced our right branch with any mention of a variable at the leaves of J1.  We can say that in some sense these two proofs are the same, and this is essentially where the computation happens in type theory.  We are engaging in a form of proof normalisation - when you run a programme, you're really just trying to find the simplest (cut free?) proof of the proposition given the data.

Now, things get interesting when you don't have all the data.  Then proofs which are intuitively very similar, end up being fairly distinct just because you can't drive the computation effectively using the reduction rules given for cut elimination.

A straight-forward example of this can be given by the associative property of addition.  First, let's give a program so we can constrain our problem to the bureaucracy of our syntax, such that we can later try to unshackle ourselves from it.

data ℕ : Set where
  zero : ℕ
  suc : ℕ → ℕ

_+_ :  ℕ → ℕ → ℕ
zero + y = y
suc x + y = suc (x + y)

This program in Agda gives us natural numbers formalised as being either 0, or some number +1 (suc), from which we can build all the finite numbers, along with a way to perform addition.

Given this definition we might wonder whether  (x + y + z = x + (y + z)) is true?  Well, if = is judgemental equality, we're in for a rough ride, because it definitely will not be.  Now, things aren't so bad, given a notion of propositional equality we can go ahead and make a similar proof work.  In Agda we might write:

infix 4 _≡_
data _≡_ {A : Set} : A → A → Set where
  refl : (x : A) → x ≡ x

cong : ∀ {A B : Set} {a b : A} → (f : A → B) → a ≡ b → f a ≡ f b
cong f (refl x) = refl (f x)

associative : ∀ (n m o : ℕ) →  n + (m + o) ≡ (n + m) + o
associative zero m o = refl (m + o)
associative (suc x) m o = cong suc (associative x m o)

This proof makes reference to a second kind of equality, a propositional equality _≡_ which is "inside" the proof system.  We have simply represented it as a data type.   By showing two things are equal in this world, we've shown that there is a way to decompose arguments until we have a judgemental equality.

However, if we had some way to make it a judgemental equality, then we would hardly have had to have two cases!  Instead we could immediately just give the proof refl, since they would both reduce to the same thing.  But is this type of reduction possible?   Certainly not in the general case.  However, there is a large lurking universe in which it could be done.

Enter supercompilation!  The sorts of reductions that supercompilers give you will in fact reduce the associative addition statement to identical proofs.  This means that the normal form will be identical on both sides, and we have a judgemental equality.

But what is going on?  First, I think we need to think about cyclic proofs for a minute.   When we use the function _+_, we are actually introducing a cyclic proof.   We don't know how big the  proof is going to be until we get the data.  Instead we have a stand in cycle that represents the way that we will unfold the proof as more data comes in.

The reason we were blocked before is because recursive functions are uncooperative.  What they actually mean to say is that they are going to unfold themselves again to produce new proof leaves as long as you keep feeding them data.   The unfolding leads to new possible cuts to eliminate; more reductions using our rewriting rules.  However they will never cooperate with their neighbour recursive function in trying to produce a new cycle which produces the same infinitely unfolded proof.  They essentially limit the kinds of cycles that can exist in the proof tree to the ones that existed when you wrote down the definitions.

A Supercompiler's fundamental trick is in allowing us to abstract away the function names and just look at the proof as it unfolds creating cycles as we see fit (subject to similar rules that total functions must abide you can't have cycles unless something is always decreasing).  This means that we can create a cycle in the proof, (and perhaps a new named function) which has the same infinite unfolding of the proof tree.

This expansion of judgemental equality can make more proofs easier.  However, it can also make more proofs possible - since it is sometimes the case that we can take a program whose cycles look dubious, and turn them into cycles which look perfectly fine, since they now respect some simple decreasing or increasing order.  This was the subject of my dissertation, but unfortunately I don't think I knew enough type theory at the time to make the story convincing.

There are a lot of ideas that follow on from this.  The ideas of homotopy type theory think a lot about what kinds of proofs can identify two elements in propositional equality.  With a broader notion of judgemental equality, its interesting to wonder how the structure of identifications would change...


No comments:

Post a Comment