Saturday, 2 July 2016

Some thoughts on the meaning of equality and cyclic proof - can CTT help where ITT hinders?

The CTT (Computational Type Theory) approach to type theory, as represented in NuPRL and described in [1][2][3], provides a contrasting methodology to what appears to be in vogue at the moment - an approach known as ITT (Intensional Type Theory) as represented in modern type-theory based proof assistants such as Coq and Agda.

There is a very good exposition by Jon Sterling which describes the more fundamental philosophical differences here [4]. It's somewhat difficult to convey the subtleties without such a complete exposition, but since it's precisely this that I want to talk about, I'll try to recapitulate the essential differences.

When we talk about the elements of a type in ITT, we give the process of their construction in such a way that the elements which inhabit the type are made entirely explicit in the definition of the type.  So, for instance, in defining the data type for the natural numbers we give zero and succ explicitly as the elements of this type.  The type relation (n : ℕ) is then not an abstract description of what would fit into this type, but a description of the syntax which allows valid terms of the type to be formed.

data ℕ : Set where
  zero : ℕ
  succ : ℕ → ℕ

However, there is more to the story as well. As I mentioned in a previous blogpost here, the meaning of computation and equality in ITT is both intertwined and somewhat hidden from view. We say in ITT something along the lines of: "we mean for two things to be equal when they are the same thing, and the same thing is understood to mean identification of proof-trees under the reduction relation"

But why do we allow identification of proofs according to this reduction and not others? How do we know which reductions to admit, and under what circumstances? And where exactly do we define the way that the natural numbers are treated in terms of their behaviour? It turns out that the elimination rules are derived from the above datatype automatically, and then wrapped up nicely for us behind the scenes.
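
Concretely, for the ℕ above, what gets derived and hidden behind pattern matching amounts to the familiar induction principle, which we could also write out by hand. A small sketch (the name natElim is mine, purely for illustration):

natElim : (P : ℕ → Set)
        → P zero
        → ((n : ℕ) → P n → P (succ n))
        → (n : ℕ) → P n
natElim P pz ps zero     = pz
natElim P pz ps (succ n) = ps n (natElim P pz ps n)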

The approach in CTT does these two things very differently. When defining the type ℕ, we instead describe what would constitute evidence that we were a natural number, and what would constitute evidence of equality between two natural numbers. Here, we have to explicitly describe the big-step evaluation judgement which acts on our elements so we know what constitutes a canonical element of the type.
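
To make "big-step evaluation judgement" a little more concrete, here is a toy sketch in Agda of the kind of relation that is meant, for a small term language of naturals. The lazy treatment of s (a successor applied to any term already counts as canonical) is one common choice in these presentations; the names and details here are mine, not CTT's actual definitions:

data Tm : Set where
  z   : Tm
  s   : Tm → Tm
  add : Tm → Tm → Tm

-- e ⇓ v : the term e evaluates (big-step) to the canonical form v
data _⇓_ : Tm → Tm → Set where
  z⇓    : z ⇓ z
  s⇓    : ∀ {e} → s e ⇓ s e
  add⇓z : ∀ {e₁ e₂ v} → e₁ ⇓ z → e₂ ⇓ v → add e₁ e₂ ⇓ v
  add⇓s : ∀ {e₁ e₂ e₁'} → e₁ ⇓ s e₁' → add e₁ e₂ ⇓ s (add e₁' e₂)

Membership in the type of naturals would then be phrased in terms of evaluating to one of the canonical forms z or s e, and equality in terms of evaluating to matching canonical forms.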

There are a number of ramifications of this approach. One outcome which is of particular interest to me (due to my current work with RDF, which I'll describe more later) is that subtyping can be much more direct, as data can be a canonical element of more than one type. In ITT this is impossible, as the syntax itself describes what constitutes a type.

Circularity and Identity

Which leads me to something which has made me uncomfortable for quite a long time regarding what constitutes a proof. Syntax in ITT is designed (especially after elaboration) to be the proof. There is virtually no difference between the proof-tree and a fully elaborated proof term, and if there is, it's only up to some decidable procedure which fills in gaps that the designer of the syntax didn't think were worth bothering with, as they could be elaborated just-in-time. Identification of proofs is done under the reduction relation. Unlike in CTT, two proofs can have precisely the same abstract shape and yet be utterly distinct due to naming (of, for instance, constructors) alone.

This might not seem like it constitutes much of a problem, but when it comes to cyclic proof, it turns out to be fairly irritating. The reason is that we have decided a priori that there is no possible way to identify two proofs, as they will never have the same syntax, because induction itself comes equipped with names. Since each proof step is in fact a syntactic step, terms in our term language act as descriptions of how to build the rest of the proof tree. These names get all messed up when we start manipulating cyclic proofs, and our terms stop corresponding to the naming of the rest of the proof.

To see this we can look at the dpu (double-plus-unbad) function, its proof tree, and another potential cyclic proof which we will call dpg (double-plus-good) - which is semantically equivalent. By semantically equivalent we mean that they cannot be distinguished within the theory.

In order to define dpu and dpg we need to write down a simple addition function. We'll show these functions as they would be presented in Agda.
_+_ : ℕ → ℕ → ℕ
zero + m = m
succ n + m = succ (n + m)

double-plus-unbad : ℕ → ℕ → ℕ → ℕ
double-plus-unbad x y z = (x + y) + z

double-plus-good : ℕ → ℕ → ℕ → ℕ
double-plus-good zero y z = y + z
double-plus-good (succ x) y z = succ (double-plus-good x y z)
We are going to use an imaginary programming language to express our cyclic proofs whose syntax should hopefully be clear enough to the reader. In particular, we steal Agda's pattern lambda syntax to describe case distinction on inductively defined types (λ {zero → M ; succ n → P}).

                                                                  n : ℕ , m : ℕ ⊢ n + m : ℕ  
                                       ──────────────var      ───────────────────────────────succ
                                      m : ℕ ⊢ m : ℕ            n : ℕ , m : ℕ ⊢ succ (n + m) : ℕ  
─────────────var ─────────────var  ──────────────────────────────────────────────────────────case(†)
n : ℕ ⊢ n : ℕ     m : ℕ ⊢ m : ℕ        n : ℕ , m : ℕ ⊢ λ { zero → m ; succ n → succ (n + m)} n
        n : ℕ , m : ℕ ⊢ (λ n m → λ { zero → m ; succ n → succ (n + m)} n) n m : ℕ 
                             n : ℕ , m : ℕ ⊢ n + m : ℕ

We here demonstrate a cyclic proof presentation [5]. Under usual conditions we would instead have made appeal to some least-fixed-point operator, or have put _+_ in a function-variable context, or the like. Additionally, in order to show that this construction is a well defined (total) function we would have cause to make an appeal to some termination argument. Here we have structural recursion on our side. We can reuse this argument from the usual setting without alteration.

In addition, the circularity is intentionally made more explicit in this first example by virtue of an over-clever naming strategy and the shadowing behaviour of lexical binding, but in real practice of cyclic proof (and in the examples below) there would need to be a substitution applied to obtain the original expression. We will leave this substitution implicit as it's not important here (but it would be very important in a more formal description).

You may also have noticed that I leave out a proof rule for the scrutinee of the "case"-rule in these proofs, as it should be apparent from context (elaboration here should be performed by the reader).

But why should we want to display this proof as a cyclic proof instead of the more usual least fixed-point approach which is less obscure? Hopefully this will become more apparent with the inclusion of our dpu / dpg functions. We give the dpu function's cyclic proof here:

────────────────────────(*) ─────────────var  ────────────────────────────────────────────────────(†)
x : ℕ , y : ℕ ⊢ x + y : ℕ     z : ℕ ⊢ z : ℕ     n : ℕ , m : ℕ ⊢ λ { zero → m ; succ n → succ (n + m)} n
     x : ℕ , y : ℕ , z : ℕ ⊢ (λ n m → λ { zero → m ; succ n → succ (n + m)} n) (x + y) z : ℕ 
                               x : ℕ , y : ℕ , z : ℕ ⊢ (x + y) + z : ℕ
                             x : ℕ , y : ℕ , z : ℕ ⊢ dpu x y z : ℕ 

In this proof we make reference to two different entry points into the proof of plus. These correspond with the two different uses of plus which are syntactically apparent in the term, so it is not surprising that there should be two different cyclic references. However, in dpg, we only refer to _+_ once.
                                                     x : ℕ , y : ℕ , z : ℕ ⊢ dpg x y z : ℕ 
                    ───────────────────────(*) ──────────────────────────────────────────────────succ
                    y : ℕ , z ∶ ℕ ⊢ y + z : ℕ   x : ℕ , y : ℕ , z : ℕ ⊢ succ x → succ (dpg x y z) : ℕ 
───────────────────var   x : ℕ , y : ℕ , z : ℕ ⊢ λ { zero → y + z                                        
x,y,z : ℕ ⊢ x,y,z : ℕ                                 ; succ x → succ (dpg x y z)} x : ℕ 
  x : ℕ , y : ℕ , z : ℕ ⊢ (λ x y z → λ { zero → y + z
                                         ; succ x → succ (dpg x y z)} x) x y z : ℕ
   ──────────────────────────────────────────────────────────────────────unfold (§)
                           x : ℕ , y : ℕ , z : ℕ ⊢ dpg x y z : ℕ

This dpg, I claim, is semantically equivalent to dpu. That is, it always gives the same result, but does so in one pass instead of two. However, not only is it semantically equivalent, it is also derivable by a number of manipulations of the cyclic proof tree. The term dpg is the term which would be arrived at by carrying out a process of supercompilation [6].
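
The claim of semantic equivalence is easy to check within Agda itself: the two functions agree on all arguments, by exactly the induction one would expect. A small sketch, using the standard library's propositional equality (the name dpu≗dpg is mine):

open import Relation.Binary.PropositionalEquality using (_≡_; refl; cong)

dpu≗dpg : ∀ x y z → double-plus-unbad x y z ≡ double-plus-good x y z
dpu≗dpg zero     y z = refl
dpu≗dpg (succ x) y z = cong succ (dpu≗dpg x y z)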

To see this, let's watch how dpu would "unfold". In order to do this properly we are going to need an extended reduction relation. Let us extend reduction with "case-distribution" which is given in the following rule.
(λ { zero → M ; succ x → N }) ((λ {zero → P ; succ x' → Q}) R)
  ↝
((λ {zero → (λ { zero → M ; succ x → N }) P ; succ x' → (λ { zero → M ; succ x → N }) Q}) R)

This yields semantically equivalent terms for any closed terms, even in the case of non-termination.
─────────────────────────────────────(†)            n : ℕ , y : ℕ , z : ℕ ⊢ (n + y) + z : ℕ 
n : ℕ , m : ℕ ⊢ λ { zero → m                      ───────────────────────────────────────────succ
                     ; succ n → succ (n + m)} n    n : ℕ , y : ℕ , z : ℕ ⊢ succ ((n + y) + z) : ℕ
x : ℕ , y : ℕ , z : ℕ ⊢ λ { zero → (λ { zero → z
                                        ; succ n → succ (n + z)} y)
                            ; succ n → succ ((n + y) + z)} x : ℕ 
x : ℕ , y : ℕ , z : ℕ ⊢ λ { zero → (λ { zero → z
                                        ; succ n → succ (n + z)} y)
                            ; succ n → (λ { zero → z
                                          ; succ n → succ (n + z)} (succ (n + y)))} x : ℕ 
x : ℕ , y : ℕ , z : ℕ ⊢ λ { zero → z
                            ; succ n → succ (n + z)} (λ { zero → y
                                                         ; succ n → succ (n + y)} x) : ℕ 
                x : ℕ , y : ℕ , z : ℕ ⊢ λ { zero → z
                                            ; succ n → succ (n + z)} (x + y) : ℕ 
    x : ℕ , y : ℕ , z : ℕ ⊢ (λ n m → λ { zero → m ; succ n → succ (n + m)} n) (x + y) z : ℕ 
                           x : ℕ , y : ℕ , z : ℕ ⊢ (x + y) + z : ℕ
                             x : ℕ , y : ℕ , z : ℕ ⊢ dpu x y z : ℕ

Ok, so this isn't precisely the same: we return to (†) instead of (*), and there is an intermediate lambda application which is reduced (and could have been reduced in the original cyclic proof). However, modulo these few unobservable actions, they are the same, and they can be made precisely the same computationally if we squint our eyes and ignore the inconsequential (↝)-rules.

The reductions that take place in our proof (the (↝)-rules) are "unobservable". This is because finite computations which take place without emitting constructors are not observable in type theory. What we have done is introduce a more flexible reduction relation, which always has the same semantics on closed terms, and then utilise it to fold the proof in a new way. Our folds are performed with simple substitutions of variables. It turns out that if you fold the proof using variable substitution in such a way that the final term terminates (or co-terminates) [7], you have absolute freedom in how you fold your proof and you can't go wrong. Here we've created a term which is structurally recursive, and so we are safe from evil.

But now we have two strange things going on. The first is that the terms, which in ITT are supposed to tell us what the rest of the proof tree looks like, now say very little about what the rest of the proof tree is going to look like. The second is that we have cycles with totally different "names" at each step. That is, the terms at our cycles look pretty difficult to inter-relate, and even when we have the same observable "edges" (such as succ) we have lots of intermediate cruft with apparently different terms. In fact, the only way to know whether one term name is like another is to see how it unfolds and refolds. But this is in general something which can't possibly have a decision procedure.

Should the fact that it doesn't have a general decision procedure stop us, though? Perhaps there is a reasonable decision procedure that would recognise lots of "names" as being the same: some process by which we could unfold a lot, but not so much that we fail to terminate, and then fold in a safe way. Since supercompilation terminates, this could give us our cycles, allow us to compare proofs and admit a larger class of equivalences. In fact, the interested reader might note that the above process will reduce x + (y + z) to essentially the same proof, giving us a proof of associativity for free.
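
Indeed, sticking with ordinary Agda for a moment and reusing the propositional-equality imports from the sketch above, x + (y + z) relates to double-plus-good by precisely the same two-case proof as (x + y) + z did, which is the sense in which associativity comes along for free (again, the name is mine):

assoc-via-dpg : ∀ x y z → x + (y + z) ≡ double-plus-good x y z
assoc-via-dpg zero     y z = refl
assoc-via-dpg (succ x) y z = cong succ (assoc-via-dpg x y z)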

But this leads us to further questions. When we started with the reduction relation in ITT we identified different proofs (such as (λ x. f x) y ≈ f y), but we consider this simple enough that we hope not to do anything dodgy. The rest of the proof we make really explicit, with all of the steps written out for us. When we get to supercompilation we have something which is very complex indeed. How are we to trust the terms that come out the other side?

Well, it is possible to prove that case-↝ is a conservative extension of ↝. Similarly, it is conceivable that one could show that folding preserves semantic equivalence, provided that we can supply a termination argument.

In CTT we talk about canonical elements of a type by talking about their reduction behaviour under big-step evaluation. The above arguments feel morally similar, and this syntactic cruft is noted in [7]. I'm not yet familiar enough with CTT to know precisely how to formulate things (or to what extent the same problems arise) in this setting. What is clear to me, however, is that this argument is a very uncomfortable one to make in ITT without breaking the fundamental design principle which insists on the bureaucracy of syntax everywhere except in the original unalterable reduction scheme.

Is there a trick with which we can do this comfortably in ITT? Is there a cleaner way to explain these things in CTT?

EDIT: In general, we don't want proof irrelevance. We'd be quite unhappy in a computational setting to see our proofs of ℕ → ℕ → ℕ → ℕ being replaced arbitrarily (for instance, with one that returns zero in all cases). However, it seems to me that our notion of proof relevance is sometimes too strong. A more flexible notion of proof relevance would be one of observable equivalence, in which case we are at least assured that we compute to the same canonical forms in all closed contexts.
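
At the specific type ℕ → ℕ → ℕ → ℕ, one approximation of this weaker notion is simply pointwise agreement on all closed arguments. A sketch, reusing the hypothetical dpu≗dpg proof from earlier:

ObsEq : (f g : ℕ → ℕ → ℕ → ℕ) → Set
ObsEq f g = ∀ x y z → f x y z ≡ g x y z

dpu-obs-dpg : ObsEq double-plus-unbad double-plus-good
dpu-obs-dpg = dpu≗dpg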

[1] Naïve Computational Type Theory, Robert L. Constable 2003
[2] The NuPRL Book, The NuPRL Group
[3] Constructing Type Systems over an Operational Semantics, Robert Harper, 1992
[4] Type Theory and its Meaning Explanations, Jon Sterling, 2015
[5] An Introduction to Cyclic Proof, James Brotherston
[6] Introduction to Supercompilation, Morten Heine B. Sørensen and Robert Glück
[7] Development of the Productive Forces
[8] Proof Term Assignments and Intuitionistic Truth, Jon Sterling

Friday, 13 March 2015

How to implement transactions for a triple store in Prolog

For the current research project I'm working on, we're using an RDF triple-store and need to be able to store a large number of triples, and to ensure that after any update a number of constraints on the structure of the triples are satisfied, according to our ontology.

In order to implement inserts, updates and deletes, they need to be packaged up inside of a transaction that only commits if the constraints are satisfied in the final image of the triple store.  This means we need a way to do a post-condition driven transaction.

One easy way to implement this is to simply copy the entire database, run the insert/update/delete statements and see if the constraints are satisfied.  If we are changing enormous numbers of triples, this might indeed be the fastest and most reasonable approach.  If we are only changing one triple, however, this seems a totally mad procedure.

There is another approach and I'll describe it as follows.

Let's assume that our constraint predicate 'constraint/0' is already defined.  What we want to do is keep two alternative layers over the actual triple store, one negative, and one positive.  We then carry out something along the lines of the painter's algorithm.  How this works will become more obvious as we write our predicates.  If the constraint is satisfied on the final version, we then "commit" by deleting everything recorded in the negative graph and inserting everything from the positive graph.

 :- module(transactionGraph,[rdf1/4, insert/4, update/5, delete/4]).  
 :- use_module(library(semweb/rdf_db)).  
 % How to implement post-condition transactions  
 neg(A1,A2) :- atom_concat('neg-', A1, A2).   
 pos(A1,A2) :- atom_concat('pos-', A1, A2).  

What we are going to do is name two new graphs that are derived from our current graphs.  To do this properly we'd want to use a gensym/uuid (which most prologs provide).

:- rdf_meta rdf1(r,r,t,?).
rdf1(X,Y,Z,G) :-
    (   pos(G,GP), rdf_db:rdf(X,Y,Z,GP)   % if in the positive graph, return results
    *-> true
    ;   (   neg(G,GN), rdf_db:rdf(X,Y,Z,GN)
        *-> false                         % if it *is* in the negative graph, fail
        ;   rdf_db:rdf(X,Y,Z,G)           % otherwise bind the rdf result from the underlying graph
        )
    ).

This is the meat of the idea.  Our RDF triple store keeps triples in the predicate rdf/4, along with the name of the graph.  We are going to mimic this predicate with some additional logic.  You can safely ignore the 'rdf_meta' declaration as it has nothing to do with the logic.

What we want is to check whether our triple is in the positive graph.  If not, we check to see if it is in the negative graph - negation will be seen as failure.  Notice that if it *is* in the negative graph we fail, committing to that branch with the soft-cut (*->).  This means that we are not going to try the remaining alternative in rdf1/4, and this keeps us from using the underlying database.

 % you can safely ignore rdf_meta for understanding this programme  
 % it only affects namespace prefix handling.
:- rdf_meta insert(r,r,t,?).
insert(X,Y,Z,G) :-
    % assert into the positive graph
    pos(G,G2),
    rdf_assert(X,Y,Z,G2),
    % retract from the negative graph, if it exists.
    (   neg(G,G3), rdf_db:rdf(X,Y,Z,G3), rdf_retractall(X,Y,Z,G3)
    ;   true
    ).

Here we have our insert statement.  What we do is assert ourselves into the positive graph and make sure we retract any deletes which may have gone into the negative graph.

:- rdf_meta delete(r,r,t,?).
delete(X,Y,Z,G) :-
    % delete from the pos graph, if present
    pos(G,G2),
    rdf_db:rdf(X,Y,Z,G2),
    rdf_retractall(X,Y,Z,G2),
    fail.                      % fail through to the next clause
delete(X,Y,Z,G) :- % assert negative
    neg(G, G2),
    % write('Asserting negative fact: '),
    % write([X,Y,Z,G2]),nl,
    (   rdf_db:rdf(X,Y,Z,G)    % only if the triple is visible in the underlying graph
    ->  rdf_assert(X,Y,Z,G2)
    ;   true
    ).

Here we have the delete statement, which is a little unusual.  First, we delete from the positive graph if there is anything there.  We then fail, after side-effecting the database, because we have to do something further, and we want to run another clause.

We then check to see if we are in the underlying graph, and if we are, we need to insert a negative statement to ensure that the triple from the underlying database will not be visible.

:- rdf_meta update(r,r,o,o,?).
update(X,Y,Z,G,_Action) :-
    % mask the previous value in the underlying graph, if it exists, then fail through
    rdf_db:rdf(X,Y,Z,G),
    neg(G,G3),
    rdf_assert(X,Y,Z,G3),
    fail.
update(X,Y,Z,G,Action) :-
    pos(G,G2),
    (   rdf_db:rdf(X,Y,Z,G2)
    ->  rdf_update(X,Y,Z,G2,Action)          % exists in pos graph
    ;   new_triple(X,Y,Z,Action,X2,Y2,Z2),
        rdf_assert(X2,Y2,Z2,G2)              % doesn't yet exist in pos graph (insert)
    ).

Finally, we get to the update.  Updates take an action to define which part of the triple they want to impact; you can see the action's function symbol being interpreted by new_triple/7.  This predicate has to remove the old value from the underlying graph by asserting a negative - it then 'fails', similarly to the delete statement, in order to see whether we need to update the triple in the positive graph.  If it is not in the positive graph, we need to insert it there, as the old value has been negated from the underlying graph.

Now we can query rdf1/4 and we will get behaviour that mimics the actual inserts and updates happening on the underlying store.  We can run our constraints on this rdf1 instead of rdf.  If it succeeds we bulk delete from the negative graph and bulk insert from the positive graph.

Now, if you're clever you might see how this trick could be used to implement nested transactions with post conditions!

Tuesday, 10 March 2015

Judgemental Equality, Homotopy Type Theory and Supercompilation

Ok, time for some incredibly preliminary thoughts on judgemental equality and what it means for Supercompilation and what supercompilation in turn means for judgemental equality.

The more that I explore equality the more incredibly strange the role it plays in type theory seems to be.  There seems intuitively to be nothing more simple than imagining two things to be identical, and yet it seems to be an endless source of complexity.

This story starts with my dissertation, but I'm going to start at the end and work backwards towards the beginning.  Currently I've been reading Homotopy Type Theory, now of some niche fame.  I put off reading about it for a long time because I generally don't like getting too bogged down in foundational principles or very abstract connections with other branches of mathematics (except perhaps on Sundays).  Being more of an engineer than a mathematician, I like my types to work for me, and dislike it when I do too much work for them.

However, the book is trendy, and friends were asking me questions about it, so I picked it up and started working through it.  I haven't gotten very far but I've already had a greater clarity about equality and what it means in reference to computation.

In type theory we start with a couple of notions of equality.  I'm going to stick with judgemental equality first.  Judgemental equality plays a crucial role in type theory as it is really what drives our notion of computation.  If we want to know whether two terms are judgementally equal, we want to know if they reduce to the same thing.  Generally this is done with a number of reduction rules which give us normalisation to the same term.  The reduction rules are the key element of computation.  The simplest example is cut-elimination for the type A → B.

(λ x : S. t) s ─β→ t [x ↦ s]  (replace all x in t with s)

corresponds with removing implication elimination (or modus ponens):

(J1)       (J2)
 ⋮          ⋮                     (J1 using J2 as leaves)
S → T       S                              ⋮
─────────────────      ─β→                 T
        T

Or in a more explicit type theoretic fashion we can write:

      (J1)
        ⋮
Γ , x : S ⊢ t : T                           (J2)
─────────────────                            ⋮
Γ ⊢ (λ x : S. t) : S → T                 Γ ⊢ s : S
──────────────────────────────────────────────────
            Γ ⊢ (λ x : S. t) s : T

         (J1) [x ↦ s]
─β→    ─────────────────
       Γ ⊢ t [x ↦ s] : T

In somewhat operational terms we have taken a proof, we have removed some of the branching structure, and we have replaced any mention of the variable at the leaves of J1 with our right branch.  We can say that in some sense these two proofs are the same, and this is essentially where the computation happens in type theory.  We are engaging in a form of proof normalisation - when you run a programme, you're really just trying to find the simplest (cut free?) proof of the proposition given the data.

Now, things get interesting when you don't have all the data.  Then proofs which are intuitively very similar end up being fairly distinct, just because you can't drive the computation effectively using the reduction rules given for cut elimination.

A straight-forward example of this can be given by the associative property of addition.  First, let's give a program so we can constrain our problem to the bureaucracy of our syntax, such that we can later try to unshackle ourselves from it.

data ℕ : Set where
  zero : ℕ
  suc : ℕ → ℕ

_+_ :  ℕ → ℕ → ℕ
zero + y = y
suc x + y = suc (x + y)

This program in Agda gives us natural numbers formalised as being either 0, or some number +1 (suc), from which we can build all the finite numbers, along with a way to perform addition.

Given this definition we might wonder whether ((x + y) + z = x + (y + z)) is true.  Well, if = is judgemental equality, we're in for a rough ride, because it definitely will not be.  Now, things aren't so bad: given a notion of propositional equality we can go ahead and make a similar proof work.  In Agda we might write:

infix 4 _≡_
data _≡_ {A : Set} : A → A → Set where
  refl : (x : A) → x ≡ x

cong : ∀ {A B : Set} {a b : A} → (f : A → B) → a ≡ b → f a ≡ f b
cong f (refl x) = refl (f x)

associative : ∀ (n m o : ℕ) →  n + (m + o) ≡ (n + m) + o
associative zero m o = refl (m + o)
associative (suc x) m o = cong suc (associative x m o)

This proof makes reference to a second kind of equality, a propositional equality _≡_ which is "inside" the proof system.  We have simply represented it as a data type.   By showing two things are equal in this world, we've shown that there is a way to decompose arguments until we have a judgemental equality.

However, if we had some way to make it a judgemental equality, then we would hardly have needed two cases!  Instead we could immediately just give the proof refl, since both sides would reduce to the same thing.  But is this type of reduction possible?  Certainly not in the general case.  However, there is a large lurking universe in which it could be done.
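
The same point can be seen with the two unit laws for _+_: the one that lines up with the recursion pattern already holds judgementally, so a single refl does the job, while the other needs induction.  A small sketch using the _≡_ and cong defined above (the names are mine):

left-unit : ∀ (x : ℕ) → zero + x ≡ x
left-unit x = refl x

right-unit : ∀ (x : ℕ) → x + zero ≡ x
right-unit zero    = refl zero
right-unit (suc x) = cong suc (right-unit x)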

Enter supercompilation!  The sorts of reductions that supercompilers give you will in fact reduce the associative addition statement to identical proofs.  This means that the normal form will be identical on both sides, and we have a judgemental equality.

But what is going on?  First, I think we need to think about cyclic proofs for a minute.  When we use the function _+_, we are actually introducing a cyclic proof.  We don't know how big the proof is going to be until we get the data.  Instead we have a stand-in cycle that represents the way that we will unfold the proof as more data comes in.

The reason we were blocked before is that recursive functions are uncooperative.  What they actually mean to say is that they are going to unfold themselves again to produce new proof leaves as long as you keep feeding them data.  The unfolding leads to new possible cuts to eliminate; more reductions using our rewriting rules.  However, they will never cooperate with their neighbouring recursive function in trying to produce a new cycle which produces the same infinitely unfolded proof.  They essentially limit the kinds of cycles that can exist in the proof tree to the ones that existed when you wrote down the definitions.

A supercompiler's fundamental trick is in allowing us to abstract away the function names and just look at the proof as it unfolds, creating cycles as we see fit (subject to rules similar to those that total functions must abide by: you can't have cycles unless something is always decreasing).  This means that we can create a cycle in the proof (and perhaps a new named function) which has the same infinite unfolding of the proof tree.

This expansion of judgemental equality can make more proofs easier.  However, it can also make more proofs possible - since it is sometimes the case that we can take a program whose cycles look dubious, and turn them into cycles which look perfectly fine, since they now respect some simple decreasing or increasing order.  This was the subject of my dissertation, but unfortunately I don't think I knew enough type theory at the time to make the story convincing.

There are a lot of ideas that follow on from this.  Homotopy type theory thinks a lot about what kinds of proofs can identify two elements in propositional equality.  With a broader notion of judgemental equality, it's interesting to wonder how the structure of identifications would change...

Wednesday, 8 May 2013

Monads and Asynchronous Ajax with Javascript

Monads are often seen as a very basic part of the language for haskellers and something completely obscure for programmers who use other, especially non-functional, languages. Even lispers and schemers tend not to use them consciously to any great extent, even though they can help with a number of common problems that occur - perhaps most obviously with functions returning nil in a chain of compositions.

Monads, or Kleisli triples, have a formal mathematical meaning which requires satisfaction of a few basic axioms. However, we can understand them fairly simply as a pattern which involves a bind operator, which allows us to compose our monad objects, and a unit operator, which allows us to lift a value up into one of these monad objects.

MonadPlus extends this pattern and allows us a new type of composition, +, and a new unit, called mzero, which is somewhat like zero in the sense that + doesn't do anything when composed with mzero.

MonadPlus turns out to be an excellent setting for conveniently dealing with concurrency. So it was that after playing with ajax for a while I found myself irritated with the way in which I had to wrap up continuations in order to get the sequencing behaviour I wanted. It occurred to me that what I really wanted was a MonadPlus.

I started writing one myself and realised that, since my knowledge of javascript arcana was somewhat thin (it's not enough simply to implement them - they also have to be convenient to use!), I'd be better off finding out if someone else had already done it, and hence came to Douglas Crockford's video on javascript monads.

In this video Crockford demonstrates how to roll up a MonadPlus object which he calls a vow. In his framework, he has a single composition function, when, which acts as a combination of bind and +. In order to lift a value into the monad, we use vow.keep(data). Since he doesn't use a strongly typed language, he's able to be a bit more flexible in what actually gets put in his when. In the example functions below, I always return a promise, which can be used by the next function in the chain - this is the standard way that bind works, composing functions (f : a → M b). However, the plumbing written by Crockford will also automatically lift a function (f : a → b) into the monad, in the event that you have an immediate value and not a promise. Similarly, instead of composition with alternatives, we have our failure function in the right-hand side.

This conveniently expresses a combination of "and/or" type asynchronous operators in a single convenient framework. We don't need to worry about when the asynchronous calls return - we are guaranteed that if they do, the next positive function in the chain will be called. If they don't return, we will fall over to our first error handler.

I wrote a quick application that demonstrates how you might use Crockford's vows:

var iphost = "";
var countryhost = "";

function getIP(){
    var vow = VOW.make();
    var ajs = {
        type : 'GET',
        url : iphost,
        dataType: 'json',
        success : function(data){
            vow.keep(data);      // lift the successful result into the promise
        },
        error: function(x,e){
            vow.break(e);        // break the promise on failure
        }
    };
    $.ajax(ajs);
    return vow.promise;
}

function countryOf(ip){
    var vow = VOW.make();
    var ajs = {
        type : 'GET',
        url : countryhost + ip,
        dataType: 'json',
        success : function(data){
            vow.keep(data);
        },
        error: function(x,e){
            vow.break(e);
        }
    };
    $.ajax(ajs);
    return vow.promise;
}

function display(data){ console.log(data); }
function showError(data){ console.log("Error " + data); }

$(document).ready(function (){
    getIP()
        .when(countryOf, showError)
        .when(display, showError);
});

UPDATE: Much to my surprise, all of this behaviour is already implemented with deferred ajax calls, using "then". It even manages to lift functions which don't return a promise into a promise automatically.

Friday, 19 October 2012

Formalisation of Tables in a Dependent Language

I've had an idea kicking about in my head for a while of making query plans explicit in SQL in such a way that one can be assured that the query plan corresponds to the SQL statement desired. The idea is something like a Curry-Howard in a relational setting. One could infer the plan from the SQL, the SQL from the plan, or do a sort of "type-checking" to make sure that the plan corresponds to the SQL.

The devil is always in the details however. When I started looking at the primitives that I would need, it turns out that the low level table joining operations are actually not that far from primitive SQL statements themselves. I decided to go ahead and formalise some of what would be necessary in Agda in order to get a better feel for the types of objects I would need and the laws which would be required to demonstrate that a plan corresponded with a statement.

Dependent types are very powerful and give you plenty of rope to hang yourself. It's always something of a struggle for me to figure out what I want to live in the type world as opposed to what I want to live in the term world.

First things first. We're going to need rows if we want tables. In dynamically typed languages (a.k.a. languages with a [potentially open] universal type) you can easily make a heterogeneous collection of objects (numbers, strings etc). Modelling a row in lisp or python is very straight forward. However, in a strongly typed setting heterogeneous lists are more complicated. The solution I have written in Agda below uses a vector (really just a list with a guaranteed size) of types which lives in the type universe, and which is constructed as we push things into our row. This is basically just a heterogeneous list where the type advertises what kind of row it is.

module RowList where

open import Level renaming (suc to lsuc; zero to lzero)
open import Data.Nat hiding (__)
open import Data.Vec renaming (__ to __; _++_ to _++_; [] to []; [_] to [_])
open import Data.List 
open import Data.Fin renaming (_+_ to _+f_)
open import Data.Nat.Properties
open import Relation.Binary.PropositionalEquality as PropEq
  using (__; refl; sym; subst; trans; cong; cong₂; inspect)
open import Data.Bool

infixr 4 _,_
infixr 5 __

data Row {α : Level} : (n : ℕ) → Vec (Set α) n → Set (lsuc α) where 
  · : Row zero []
  _,_ : ∀ {j} {A : Set α} {Aⱼ : Vec (Set α) j} → A → Row j Aⱼ → Row (suc j) (A ∷ Aⱼ)

In the data definition above we use (·) to terminate a row, and return a row which has an empty vector in the row's type signature.  In the (,) case, we extend the row by giving an element of some type A and sticking its type in our row signature vector. This will allow us to ensure that tables are constructed of homogeneous row types, and will allow us to present a lookup function for rows which knows what its return type will be.

You may notice all the bits about (Set α) etc. The α is just book keeping to avoid Russell-type paradoxes creeping in. They are otherwise not particularly enlightening and the above table definition might be more easily read if you ignore them.

example : Row 2 (ℕ ∷ (ℕ ∷ []))
example = (1 , 2 , ·)

pick : ∀ {α i j A} → (v : Vec (Fin j) i) → Vec {α} A j → Vec A i
pick [] ys = []
pick (x ∷ xs) ys = lookup x ys ∷ pick xs ys 

__ :  {α j Aⱼ}  (r : Row {α} j Aⱼ)  (i : Fin j)  lookup i Aⱼ 
·  ()
(y , ys)  zero = y
(y , ys)  suc i = ys  i 

We call our row selection operation (r i) for a row (r) and index i. Notice that the return type is dependent on the type signature vector which is carried around in the type of the row. This selector will be useful for writing predicates over columns which will have to know what their input type is. We use a finite type (Fin j) to describe our index, since we need to make sure we don't index outside of the row. The finite type ensures us that whatever the index is, it's no greater than the size of the vector. These operations are now guaranteed to be type safe and can not lead to run-time out-of-range errors.

_++r_ : ∀ {α i j Aᵢ Bⱼ} → Row {α} i Aᵢ → Row j Bⱼ → Row (i + j) (Aᵢ ++ Bⱼ)
· ++r r = r
(x , xs) ++r r = x , (xs ++r r)

We of course need to be able to combine rows from different tables in joins, and so we'll need a row concatenation operator, which we call (++r).

pickr :  {α i j Aⱼ}  (v : Vec (Fin j) i)  Row {α} j Aⱼ  Row i (pick v Aⱼ)
pickr [] r = · 
pickr (x  xs) r = r  x , pickr xs r

Additionally, it's useful if we can specify a list of indices we want to pick out of a row to construct a new row. This will allow us to make a "select" like statement where we can duplicate or ignore indices in order to create a new row, which when done over a list of rows will become more like the familiar table select.

Table : ∀ {α} (m : ℕ) (Aⱼ : Vec (Set α) m) → Set (lsuc α)
Table {α} m Aⱼ = List (Row m Aⱼ)

We define tables as simply a list of rows, and keep the indices and types as parameters to the type. The table header is now correct by construction and exists only as a type index.

Example table: The header is specified as the type
exampleTable : Table 3 (ℕ ∷ (ℕ ∷ (ℕ ∷ [])))
exampleTable = (1 , 2 , 3 , ·)
             ∷ ((3 , 4 , 5 , ·)
             ∷ ((5 , 6 , 7 , ·)
             ∷ []))

__ :  {α i j Aᵢ Bⱼ}  Table {α} i Aᵢ  Table j Bⱼ  Table (i + j) (Aᵢ ++ Bⱼ)
[]  [] = []
[]  _  = []
_   [] = []
(x  xs)  (x'  xs') = x ++r x'  [ x ]  xs' ++ xs  (x'  xs') 
-- concatenate row, union row with the rest of the table

The familiar join operation. As you can see here, our primitive join operation does not take a restriction. In practice the primitive join would have to use the restriction in order to reduce the amount of row consing. In order to have a pragmatic optimiser, I'll have to change this later.

Example join
result : Table 6 (  (  (  (  (  (  []))))))
result = exampleTable  exampleTable

-- Restriction 
__ :  {α j Aⱼ}  Table {α} j Aⱼ   (Row j Aⱼ  Bool)  Table j Aⱼ
[]  p = []
(x  xs)  p = if p x then x  xs  p else xs  p 

Restrictions are just row predicates. They simply take a row and specify which rows to keep and which to throw away. This allows for most of the usual SQL WHERE clause shenanigans (minus aggregation of course) and gives us a fairly complete method of doing joins subject to restrictions [we don't deal with nulls at all].

Example restriction
_b_ :  (n m : )  Bool
zero b m = true
suc n b zero = false
suc n b suc n' = n b n'

restrict : Table 6 (  (  (  (  (  (  []))))))
restrict = result  (λ v  (v  (# 1)) b 1)

Finally we have selection, which allows us to pick and choose columns by specifying their index.  It would of course be more natural to have names for columns and allow that as a means of addressing the columns of interest, and I may in fact add that to rows in order to simplify debugging, but it's not of much interest logically speaking.
-- Selection 
__ :  {α i j Aⱼ}  (v : Vec (Fin j) i)  Table {α} j Aⱼ  Table i (pick v Aⱼ)
xs  [] = []
xs  (y  ys) = (pickr xs y)  xs  ys

Example selection 
-- Syntax of a query: (selector ∥ t1 ∪ t2 ∪ t3 ▹ restriction)

selection : Table 2 (  (  []))
selection = ((# 1)  ((# 3)  []))  result  (λ v  (v  (# 1)) b 4)

From this point we will need some form of relation which allows us to establish query equivalence. We will then want a way to compute whether two tables are equivalent up to this relation and we can continue on with the more pragmatic concerns of pushing the restrictions into the actual table joins.

Friday, 2 March 2012

Decidable Equality in Agda

So I've been playing with typing various things in System-F which previously I had left with auxiliary well-formedness conditions. This includes substitutions and contexts, both of which are interesting to have well typed versions of. Since I've been learning Agda, it seemed sensible to carry out this work in that language, as there is nothing like a problem to help you learn a language.

In the course of proving properties, I ran into the age old problem of showing that equivalence is decidable between two objects. In this particular case, I need to be able to show the decidability of equality over types in System F in order to have formation rules for variable contexts. We'd like a context Γ to have (x:A) only if (x:B) does not occur in Γ when (A ≠ B). For us to have statements about whether two types are equal or not, we're going to need to be able to decide if that's true using a terminating procedure.
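
To make the requirement concrete, here is a small sketch (every name in it is mine, not code from this post) of how a decidable equality on types is exactly what lets that side condition on context formation be computed rather than merely asserted:

open import Data.List using (List; []; _∷_)
open import Data.Bool using (Bool; true; false)
open import Data.Product using (_×_; _,_)
open import Relation.Nullary using (Dec; yes; no)
open import Relation.Binary.PropositionalEquality using (_≡_)

module ContextSketch (Var Ty : Set)
                     (_≟v_  : (x y : Var) → Dec (x ≡ y))
                     (_≟ty_ : (A B : Ty) → Dec (A ≡ B)) where

  -- a context is a list of (variable, type) pairs
  Ctx : Set
  Ctx = List (Var × Ty)

  -- (x , A) may be added to Γ when no existing entry gives x a different type
  admissible : Var → Ty → Ctx → Bool
  admissible x A [] = true
  admissible x A ((y , B) ∷ Γ) with x ≟v y | A ≟ty B
  ... | yes _ | no _ = false
  ... | _     | _    = admissible x A Γ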

And so we arrive at our story. In Coq, equality is something of a nightmare. In Agda, by contrast, this nightmare is often avoided with clever structuring of the rules for elimination in pattern matching. However, Agda's lack of a system of tactics brings the nightmare home to roost in a different nest. This proof was one line in Coq, and amounts to an invocation of the tactic decide equality.

(Thanks to Saizan for showing me how to do this: Eq.agda)

First of course, we need to set up our modules etc...

module SystemFPlus where 

open import Function
open import Data.Fin using (Fin)
open import Data.Nat hiding (_*_) 
open import Data.Bool renaming (__ to __Bool) 
open import Data.List 
open import Data.Unit using ()
open import Relation.Nullary.Core
open import Data.Sum 
open import Data.Product 
--open import Relation.Binary.EqReasoning
open import Relation.Binary hiding (__)
open import Relation.Binary.PropositionalEquality as PropEq
  using (__; refl)

Next we write down an inductive definition for System-F style types.

data Ty : Set where
   : Ty                -- unit type
  ι :   Ty           -- universal abstraction vars
  η :   Ty           -- μ vars 
  __ : Ty  Ty  Ty  -- implication
  __ : Ty  Ty  Ty  -- conjunction
  __ : Ty  Ty  Ty  -- disjunction
  μ : Ty  Ty          -- inductive types
  Π : Ty  Ty          -- universal quantification

Next we have to do the business which is relatively usual in Agda, when proving negative goals under a constructor, that is, define explicit inversion principles. It turns out that you don't generally have to do this if you're not under a constructor and inside a λ but I haven't found a way to avoid it when you are, and you almost always are when you're proving false for something decidable.

invι :  {n m}  (ι n  ι m)  n  m
invι refl = refl 

invη :  {n m}  (η n  η m)  n  m
invη refl = refl 

invμ :  {α β}  (μ α  μ β)  α  β
invμ refl = refl 

invΠ :  {α β}  (Π α  Π β)  α  β
invΠ refl = refl 

invL :  {α β δ γ}  (α  β  δ  γ)  α  δ
invL {.α} {β} {α} {.β} refl = refl

invR :  {α β δ γ}  (α  β  δ  γ)  β  γ
invR {.α} {β} {α} {.β} refl = refl

invL :  {α β δ γ}  (α  β  δ  γ)  α  δ
invL {.α} {β} {α} {.β} refl = refl

invR :  {α β δ γ}  (α  β  δ  γ)  β  γ
invR {.α} {β} {α} {.β} refl = refl

invL :  {α β δ γ}  (α  β  δ  γ)  α  δ
invL {.α} {β} {α} {.β} refl = refl

invR :  {α β δ γ}  (α  β  δ  γ)  β  γ
invR {.α} {β} {α} {.β} refl = refl

Now comes the epic annoyance. This function could have been written by a program. It's absolutely vanilla, and yet I'm forced to write out the full cross-product of possibilities almost all of which lead to the conclusion: (λ ()), which is essentially, "no, and you should know why". Since it knows why, it seems unnecessarily arduous for me to have to write out every case!

__ty : Decidable {A = Ty} __
         ty      = yes refl
        (ι m) ty   = no (λ ())
        (η m) ty   = no (λ ())
        (δ  γ) ty = no (λ ())  
        (δ  γ) ty = no (λ ()) 
        (δ  γ) ty = no (λ ()) 
        (μ β) ty   = no (λ ()) 
        (Π β) ty   = no (λ ()) 
(ι n)     ty       = no (λ ())
(ι n)    (ι m) ty   with n  m 
(ι .n)   (ι n) ty   | yes refl = yes refl
(ι n)    (ι m) ty   | no p  = no (p  invι)
(ι n)    (η m) ty   = no (λ ())
(ι n)    (δ  γ) ty = no (λ ())  
(ι n)    (δ  γ) ty = no (λ ()) 
(ι n)    (δ  γ) ty = no (λ ()) 
(ι n)    (μ β) ty   = no (λ ()) 
(ι n)    (Π β) ty   = no (λ ()) 
(η n)     ty       = no (λ ())
(η n)    (ι m) ty   = no (λ ())
(η n)    (η m) ty   with n  m 
(η .n)   (η n) ty   | yes refl = yes refl
(η n)    (η m) ty   | no p  = no (p  invη)
(η n)    (δ  γ) ty = no (λ ())  
(η n)    (δ  γ) ty = no (λ ()) 
(η n)    (δ  γ) ty = no (λ ()) 
(η n)    (μ β) ty   = no (λ ()) 
(η n)    (Π β) ty   = no (λ ()) 
(μ α)     ty       = no (λ ())
(μ α)    (ι m) ty   = no (λ ())
(μ α)    (η m) ty   = no (λ ())
(μ α)    (δ  γ) ty = no (λ ())  
(μ α)    (δ  γ) ty = no (λ ()) 
(μ α)    (δ  γ) ty = no (λ ()) 
(μ α)    (μ β) ty   with α  β ty 
(μ .β)   (μ β) ty   | yes refl = yes refl 
(μ α)    (μ β) ty   | no p  = no (p  invμ)
(μ α)    (Π β) ty   = no (λ ())
(Π α)     ty       = no (λ ())
(Π α)    (ι m) ty   = no (λ ()) 
(Π α)    (η m) ty   = no (λ ())
(Π α)    (δ  γ) ty = no (λ ())
(Π α)    (δ  γ) ty = no (λ ())
(Π α)    (δ  γ) ty = no (λ ()) 
(Π α)    (Π β) ty   with α  β ty 
(Π .α)   (Π α) ty   | yes refl = yes refl 
(Π α)    (Π β) ty   | no p  = no (p  invΠ)
(Π α)    (μ β) ty   = no (λ ()) 
(α  β)   ty      = no (λ ())
(α  β)  (ι m) ty   = no (λ ())
(α  β)  (η m) ty   = no (λ ())
(α  β)  (δ  γ) ty   with α  δ ty  
(.α  β)  (α  γ) ty  | yes refl with β  γ ty 
(.α  .β)  (α  β) ty | yes refl | yes refl = yes refl
(.α  δ)  (α  γ) ty  | yes refl | no p = no (p  invR)
(α  β)  (δ  γ) ty   | no p = no (p  invL)
(α  β)  (δ  γ) ty = no (λ ()) 
(α  β)  (δ  γ) ty = no (λ ()) 
(α  β)  (μ δ) ty   = no (λ ()) 
(α  β)  (Π δ) ty   = no (λ ()) 
(α  β)   ty      = no (λ ())
(α  β)  (ι m) ty   = no (λ ())
(α  β)  (η m) ty   = no (λ ())
(α  β)  (δ  γ) ty   with α  δ ty  
(.α  β)  (α  γ) ty  | yes refl with β  γ ty 
(.α  .β)  (α  β) ty | yes refl | yes refl = yes refl
(.α  δ)  (α  γ) ty  | yes refl | no p = no (p  invR)
(α  β)  (δ  γ) ty   | no p = no (p  invL)
(α  β)  (δ  γ) ty = no (λ ()) 
(α  β)  (δ  γ) ty = no (λ ()) 
(α  β)  (μ δ) ty   = no (λ ()) 
(α  β)  (Π δ) ty   = no (λ ()) 
(α  β)   ty      = no (λ ())
(α  β)  (ι m) ty   = no (λ ())
(α  β)  (η m) ty   = no (λ ())
(α  β)  (δ  γ) ty   with α  δ ty  
(.α  β)  (α  γ) ty  | yes refl with β  γ ty 
(.α  .β)  (α  β) ty | yes refl | yes refl = yes refl
(.α  δ)  (α  γ) ty  | yes refl | no p = no (p  invR)
(α  β)  (δ  γ) ty   | no p = no (p  invL)
(α  β)  (δ  γ) ty = no (λ ()) 
(α  β)  (δ  γ) ty = no (λ ()) 
(α  β)  (μ δ) ty   = no (λ ()) 
(α  β)  (Π δ) ty   = no (λ ()) 

Well, there has to be a better way, right? So I thought I'd try to cook one up. It turns out that you can solve a decidable equality problem by injection into a domain with a decidable equality. So I went about thinking about what kind of decidable equality might have a domain with a similar enough shape that we could do the encoding without too much difficulty. I came up with the following program. It hardly saves any time at all unfortunately, but it does make me wonder if there isn't a more efficient way of doing it, perhaps using vectors, modules with decidable equalities for the carried data and with a single data constructor. Perhaps containers are the solution? I'm not sure. Anyhow, if you have a big inductive data type (such as a rich programming term language) this approach might save you trouble.

The trick is essentially to create an isomorphism with something like a Gödel coding. Since numbers are a hassle, and we need to have an isomorphism, it's more natural to do it into a space of trees where the trees have the same basic shape. We encode each operator as a number up to a finite size, whose limited nature is dictated by Fin n. This ensures that we don't have to worry about what to do with numbers bigger than the number of operators we have. The rest should be fairly self explanatory. Hopefully this gives someone ideas of a better way...

module TyDec where 

open import Function using (__)
open import Data.Nat hiding (_*_) 
open import Data.Fin renaming (zero to fzero ; suc to fsuc)
open import Data.Fin.Props renaming (__ to __fin)
open import Data.Product using (Σ; _,_; proj₁; proj₂)
open import Relation.Nullary.Core
open import Relation.Binary hiding (__)
open import Relation.Binary.PropositionalEquality as PropEq
     using (__; refl; trans; sym; cong)

data Ty : Set where
   : Ty
  ι :   Ty
  η :   Ty
  μ : Ty  Ty
  Π : Ty  Ty
  __ : Ty  Ty  Ty
  __ : Ty  Ty  Ty 
  __ : Ty  Ty  Ty

data Tree : Set where 
  Nullary : Tree
  NullaryNat : Fin 2    Tree
  Unary : Fin 2  Tree  Tree
  Binary : Fin 3  Tree  Tree  Tree

invNullaryNat1 :  {op1 op2 n1 n2}  NullaryNat op1 n1  NullaryNat op2 n2  op1  op2
invNullaryNat1 refl = refl

invNullaryNat2 :  {op1 op2 n1 n2}  NullaryNat op1 n1  NullaryNat op2 n2  n1  n2
invNullaryNat2 refl = refl

invUnary1 :  {op1 op2 t1 t2}  Unary op1 t1  Unary op2 t2  op1  op2
invUnary1 refl = refl

invUnary2 :  {op1 op2 t1 t2}  Unary op1 t1  Unary op2 t2  t1  t2
invUnary2 refl = refl

invBinary1 :  {op1 op2 l1 l2 r1 r2}  Binary op1 l1 r1  Binary op2 l2 r2  op1  op2
invBinary1 refl = refl

invBinary2 :  {op1 op2 l1 l2 r1 r2}  Binary op1 l1 r1  Binary op2 l2 r2  l1  l2
invBinary2 refl = refl

invBinary3 :  {op1 op2 l1 l2 r1 r2}  Binary op1 l1 r1  Binary op2 l2 r2  r1  r2
invBinary3 refl = refl

__tree : Decidable {A = Tree} __
Nullary                Nullary               tree = yes refl  
Nullary                (NullaryNat op n)     tree = no (λ())
Nullary                (Unary op t)          tree = no (λ())
Nullary                (Binary op l r)       tree = no (λ())
(NullaryNat op n)      Nullary               tree = no (λ ())
(NullaryNat op1 n1)    (NullaryNat op2 n2)   tree with op1  op2 fin
(NullaryNat .op1 n1)   (NullaryNat op1 n2)   tree | yes refl with n1  n2
(NullaryNat .op1 .n1)  (NullaryNat op1 n1)   tree | yes refl | yes refl = yes refl
(NullaryNat .op1 n1)   (NullaryNat op1 n2)   tree | yes refl | no p = no (p  invNullaryNat2)
(NullaryNat op1 n1)    (NullaryNat op2 n2)   tree | no p  = no (p  invNullaryNat1) 
(NullaryNat op1 n1)    (Unary op t)          tree = no (λ())
(NullaryNat op1 n1)    (Binary op l r)       tree = no (λ())
(Unary op1 t1)         Nullary               tree = no (λ())
(Unary op1 t1)         (NullaryNat op n)     tree = no (λ())
(Unary op1 t1)         (Unary op2 t2)        tree with op1  op2 fin
(Unary .op1 t1)        (Unary op1 t2)        tree | yes refl with t1  t2 tree
(Unary .op1 .t1)       (Unary op1 t1)        tree | yes refl | yes refl = yes refl
(Unary .op1 t1)        (Unary op1 t2)        tree | yes refl | no p = no (p  invUnary2) 
(Unary op1 t1)         (Unary op2 t2)        tree | no p = no (p  invUnary1) 
(Unary op1 t1)         (Binary op2 l r)      tree = no (λ())
(Binary op l r)        Nullary               tree = no (λ())
(Binary op1 l r)       (NullaryNat op2 n)    tree = no (λ())
(Binary op1 l r)       (Unary op2 t)         tree = no (λ())
(Binary op1 l1 r1)     (Binary op2 l2 r2)    tree with op1  op2 fin
(Binary .op1 l1 r1)    (Binary op1 l2 r2)    tree | yes refl with l1  l2 tree
(Binary .op1 .l1 r1)   (Binary op1 l1 r2)    tree | yes refl | yes refl with r1  r2 tree
(Binary .op1 .l1 .r1)  (Binary op1 l1 r1)    tree | yes refl | yes refl | yes refl = yes refl
(Binary .op1 .l1 r1)   (Binary op1 l1 r2)    tree | yes refl | yes refl | no p = no (p  invBinary3)
(Binary .op1 l1 r1)    (Binary op1 l2 r2)    tree | yes refl | no p = no (p  invBinary2)
(Binary op1 l1 r1)     (Binary op2 l2 r2)    tree | no p = no (p  invBinary1)

Op codes 

ι = 0
η = 1

μ = 0 
Π = 1

⇒ = 0
⊗ = 1
⊕ = 2

jni : Tree  Ty 
jni Nullary                              =  
jni (NullaryNat fzero n)                 = ι n
jni (NullaryNat (fsuc fzero) n)          = η n
jni (NullaryNat (fsuc (fsuc ())) n)   
jni (Unary fzero t)                      = μ (jni t)
jni (Unary (fsuc fzero) t)               = Π (jni t)
jni (Unary (fsuc (fsuc ())) t)
jni (Binary fzero l r)                   = (jni l)  (jni r)
jni (Binary (fsuc fzero) l r)            = (jni l)  (jni r)
jni (Binary (fsuc (fsuc fzero)) l r)     = (jni l)  (jni r)
jni (Binary (fsuc (fsuc (fsuc ()))) l r) 

injEx : (x : Ty) -> Σ Tree \ t -> jni t  x
injEx                          = Nullary , refl 
injEx (ι n)                     = NullaryNat fzero n , refl
injEx (η n)                     = NullaryNat (fsuc fzero) n , refl
injEx (μ t)                     with injEx t
injEx (μ .(jni tree))           | tree , refl = (Unary fzero tree) , refl
injEx (Π t)                     with injEx t
injEx (Π .(jni tree))           | tree , refl = (Unary (fsuc fzero) tree) , refl
injEx (t  s)                   with injEx t
injEx (.(jni tr1)  s)          | tr1 , refl with injEx s
injEx (.(jni tr1)  .(jni tr2)) | tr1 , refl | tr2 , refl = (Binary fzero tr1 tr2) , refl 
injEx (t  s)                   with injEx t
injEx (.(jni tr1)  s)          | tr1 , refl with injEx s
injEx (.(jni tr1)  .(jni tr2)) | tr1 , refl | tr2 , refl = (Binary (fsuc fzero) tr1 tr2) , refl 
injEx (t  s)                   with injEx t
injEx (.(jni tr1)  s)          | tr1 , refl with injEx s
injEx (.(jni tr1)  .(jni tr2)) | tr1 , refl | tr2 , refl = (Binary (fsuc (fsuc fzero)) tr1 tr2) , refl 

inj : Ty  Tree
inj = proj₁  injEx -- strip off the existential proof

injeq :  {x y} -> inj x  inj y -> x  y
injeq {x} {y} p = trans (sym (proj₂ (injEx _))) (trans (cong jni p) (proj₂ (injEx _)))

__ty : (x y : Ty) -> Dec (x  y)
x  y ty with (inj x)  (inj y) tree
x  y ty | yes p = yes (injeq p)
x  y ty | no p = no (p  (cong inj))