Sunday, 26 November 2006

Call by Name (CBN) is dual to Call By Value (CBV)

Probably one of the best papers I've read on the relationship between CBN, CBV and the Curry-Howard correspondence is Call-by-value is dual to call-by-name by Wadler. The calculus he develops for describing the relationship shows an obvious schematic duality that is very visually appealing. After reading the paper I mentioned earlier on Socially Responsive, Environmentally Friendly Logic (which shall henceforth be called SREF Logic), it struck me that it would be interesting to see what a CPS (continuation-passing style) like construction looks like in SREF logic, so I went back to the Wadler paper to see if I could figure out how to mimic the technique for multi-player logic. It looks like Wadler's formulation comes out directly by thinking about logic as a two-player game! I'm excited to see what happens with n-player logic. This has been a big diversion from what I'm actually supposed to be working on, but I didn't want to forget about it :).
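
To make the "CPS-like construction" concrete, here is a minimal sketch (in OCaml) of the standard Plotkin-style call-by-value CPS transform for the untyped lambda calculus. It is not Wadler's dual calculus, and the names (cps, cps_value, the generated f/a/k variables) are just my own choices.

```ocaml
(* A minimal call-by-value CPS transform for the untyped lambda calculus.
   Only a sketch of the standard (Plotkin-style) construction, not
   Wadler's dual calculus. *)

type term =
  | Var of string
  | Lam of string * term
  | App of term * term

(* Fresh-name generation for continuation and intermediate variables. *)
let counter = ref 0
let fresh base =
  incr counter;
  Printf.sprintf "%s%d" base !counter

(* [cps m k] : the CPS transform of m, handing its value to continuation k. *)
let rec cps (m : term) (k : term) : term =
  match m with
  | Var _ | Lam _ -> App (k, cps_value m)
  | App (f, a) ->
      let vf = fresh "f" and va = fresh "a" in
      (* Evaluate the function, then the argument, then apply with k. *)
      cps f (Lam (vf,
        cps a (Lam (va,
          App (App (Var vf, Var va), k)))))

and cps_value (v : term) : term =
  match v with
  | Var x -> Var x
  | Lam (x, body) ->
      (* A CPS'd function takes its argument and a continuation. *)
      let kv = fresh "k" in
      Lam (x, Lam (kv, cps body (Var kv)))
  | App _ -> assert false  (* applications are not values *)

(* Example: transform (\x. x) y with the identity continuation. *)
let () =
  let id_k = Lam ("v", Var "v") in
  ignore (cps (App (Lam ("x", Var "x"), Var "y")) id_k)
```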

Saturday, 25 November 2006

Internet to the rescue

Once again the internet comes to the rescue with a Systematic Search for Lambda Expressions. This answers yesterday's question of whether we can iterate over isomorphic proofs exhaustively in order to extract all programs of a specification, which in this case is realised as a "type". Hooray for computer science!
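
As a toy illustration of what exhaustive enumeration looks like (this is not the algorithm from that paper), here is an OCaml sketch that generates every closed lambda term of a given size in de Bruijn form; filtering the output through a type checker against a goal type would then recover all programs of that "specification".

```ocaml
(* A sketch of exhaustive term enumeration: generate every closed lambda
   term of a given size, in de Bruijn notation.  Filtering by type
   inference against a goal type would then extract all programs of that
   "specification". *)

type term =
  | Var of int           (* de Bruijn index *)
  | Lam of term
  | App of term * term

(* All terms built from exactly [size] constructors, with [depth] binders in scope. *)
let rec terms ~size ~depth =
  if size <= 0 then []
  else if size = 1 then List.init depth (fun i -> Var i)
  else
    let lams =
      List.map (fun b -> Lam b) (terms ~size:(size - 1) ~depth:(depth + 1))
    in
    let apps =
      List.concat_map
        (fun k ->
           List.concat_map
             (fun f -> List.map (fun a -> App (f, a)) (terms ~size:(size - 1 - k) ~depth))
             (terms ~size:k ~depth))
        (List.init (size - 2) (fun i -> i + 1))
    in
    lams @ apps

(* Closed terms of small sizes: \x.x, \x.\y.x, \x.\y.y, \x.x x, ... *)
let () =
  List.iter
    (fun n -> Printf.printf "size %d: %d closed terms\n" n (List.length (terms ~size:n ~depth:0)))
    [1; 2; 3; 4; 5]
```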

Friday, 24 November 2006

Brain Dump

I have a bunch of ideas that I don't have time to follow up on but I'd like to retain in some fashion so here goes.

1.

A long time ago I had a question about the shortest assembly program (parameterised by the instruction set, of course!) that could encode the generators of the quaternions under multiplication. It is easy enough to program this, but as I was doing it I found myself highly doubtful that I was choosing the "right" encoding. In fact I almost certainly wasn't. Would there not be some very tight encoding in terms of "normal" assembly language operators? This has been a persistent question in my mind that comes up frequently in different and often more general forms. If you want your compiler to produce fast code, how do you know which instructions to output? If you have a high-level specification, how can you avoid the ridiculously high overhead usually associated with extremely high-level languages?

Today I found a fabulous paper (from 1987!) that deals with some of the basic issues involved in finding the shortest program for a given problem. It's called "Superoptimizer: A Look at the Smallest Program". I'd link to it, but I got it off the ACM, so if you are one of the few people with a subscription you'll know where to find it, and otherwise I don't want to give the money grubbers the link-love.
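
The core idea is brute force: enumerate instruction sequences in order of length and keep the first one that matches the target behaviour. Here is a toy OCaml sketch of that search over a made-up single-register instruction set; it is only the idea, not Massalin's tool, and agreement on a handful of test inputs is of course a heuristic rather than a proof of equivalence.

```ocaml
(* The brute-force idea behind superoptimisation: enumerate all instruction
   sequences in length order over a tiny single-register instruction set and
   return the first one that agrees with a target function on test inputs.
   The instruction set and target here are made-up toys. *)

type instr = Neg | Not | Dec | Inc | Shl

let step (x : int) = function
  | Neg -> -x
  | Not -> lnot x
  | Dec -> x - 1
  | Inc -> x + 1
  | Shl -> x lsl 1

let run prog x = List.fold_left step x prog

(* All programs of exactly length n. *)
let rec programs n =
  if n = 0 then [ [] ]
  else
    List.concat_map
      (fun rest -> List.map (fun i -> i :: rest) [ Neg; Not; Dec; Inc; Shl ])
      (programs (n - 1))

(* Shortest program agreeing with [target] on [tests], searching up to [max_len].
   Agreement on tests is only a filter, not a proof of equivalence. *)
let superopt target tests max_len =
  let agrees prog = List.for_all (fun x -> run prog x = target x) tests in
  let rec search n =
    if n > max_len then None
    else match List.find_opt agrees (programs n) with
      | Some p -> Some p
      | None -> search (n + 1)
  in
  search 0

let name = function
  | Neg -> "neg" | Not -> "not" | Dec -> "dec" | Inc -> "inc" | Shl -> "shl"

(* Example: find the shortest sequence computing 2*(x+1); it should be inc; shl. *)
let () =
  match superopt (fun x -> 2 * (x + 1)) [ 0; 1; 2; -5; 17 ] 3 with
  | Some p -> Printf.printf "found: %s\n" (String.concat " " (List.map name p))
  | None -> print_endline "nothing found"
```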

Some other thoughts that are in this region of my thought-space. Can you take a given specification with a proof and systematically transform it to enumerate the space of satisfying programs? If not, why not? Even if you can't, might it not be interesting just to perform large searches of this space? Are there ways of applying the transformation operators such that program size is only decreasing, or minimally increasing? If so then perhaps we can consider the search finished at some size threshold, since very large programs are unlikely to be good because of locality issues. Automatic Design of Algorithms Through Evolution is in a very similar vein.

2.

Concurrency is a nasty problem. It doesn't have a nice formalism that everyone in the CS world can agree on. There must be something like 500 different formalisms. All of them are better (easier, though not necessarily faster) than the ones we actually use (locking, threads, condition variables), but none of them stands out as "the right thing".

I recently found a paper called Socially Responsive and Environmentally Friendly Logic. I love the title :) But aside from that, the formalism is very nice. It is something that I've contemplated a bit but never had the drive to actually try to work out formally. The basic idea comes from the knowledge that Classical Logic and Intuitionistic Logic can be viewed as two-player games. The game is pretty simple. If I have a formula φ then to win I have to prove it. If I have ¬φ, then to win my opponent has to fail to prove φ. If I have φ ∧ ψ then my opponent gets to pick a conjunct and I have to prove it. If I have ∀x then my opponent gets to pick any stand-in for x that he would like. You can probably guess the rest (or look it up). The SREF formalism breaks this essentially two-player nature of the logic. One interesting practical feature of negation in the traditional logics is that it gives rise to continuations in the Curry-Howard correspondence. So what does negation give rise to in the n-player games defined by Abramsky? I'm not sure, but I suspect it might give process migration! Something worth looking into.
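
As a toy version of the two-player reading (for propositional formulas over a fixed valuation, i.e. a Hintikka-style evaluation game, not the SREF formalism itself), here is an OCaml sketch in which the Verifier picks at ∨, the Falsifier picks at ∧, and ¬ swaps the two roles; asking whether the Verifier has a winning strategy recovers ordinary truth, which is exactly the point of the game reading.

```ocaml
(* A toy two-player evaluation game for propositional formulas over a fixed
   valuation.  At ∨ the current Verifier picks a disjunct, at ∧ the current
   Falsifier picks a conjunct, and ¬ swaps the two roles.  This is the
   classical two-player game that SREF logic generalises, not SREF itself. *)

type formula =
  | Atom of string
  | Not of formula
  | And of formula * formula
  | Or of formula * formula

(* Does the current Verifier have a winning strategy for [f] under valuation [v]? *)
let rec verifier_wins (v : string -> bool) (f : formula) : bool =
  match f with
  | Atom a -> v a                              (* Verifier wins on true atoms *)
  | Not g -> not (verifier_wins v g)           (* roles swap: the old Verifier
                                                  wins iff the new Verifier loses *)
  | And (g, h) -> verifier_wins v g && verifier_wins v h  (* Falsifier chooses *)
  | Or (g, h) -> verifier_wins v g || verifier_wins v h   (* Verifier chooses *)

(* Example: the Verifier wins p ∨ ¬p whatever the valuation says about p. *)
let () =
  let f = Or (Atom "p", Not (Atom "p")) in
  assert (verifier_wins (fun _ -> true) f);
  assert (verifier_wins (fun _ -> false) f)
```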

Sunday, 19 November 2006

On partial evaluation (and other meta-compilers)

So after a bit of research it turns out that the big reason we don't have a meta-compiler compiler-generator project is that sophisticated partial evaluators are pretty much never self-applicable. As far as I can tell from the literature, the Futamura projections in the context of logic programming have only ever been applied in practice with "off-line" partial evaluators. Off-line partial evaluators tend to be much less sophisticated, since they do most of their reasoning "off-line", that is, ahead of specialisation and without attempting to unfold.

Apparently part of the reason for this is the use of extra-logical features in the sophisticated partial evaluators, in order to make them fast enough that they can reasonably be applied. It is hard to write performant Prolog without using cut. Once cut is used, however, all bets are off, since it is nearly impossible to reason about effectively.

I started writing my meta-compiler without using cut, restricting myself to soft-cut instead, because of the purely declarative reading that can be given to soft-cut. If I'm careful, perhaps this will allow me to make my meta-compiler self-applicable. Since I don't really understand all of the details of why on-line partial evaluators are not generally self-applicable, we'll have to see.

Sunday, 12 November 2006

Plotkin, the LGG and the MGU

Legend has it that a million years ago Plotkin was talking to his professor Popplestone, who said that unification (finding the most general unifier or the MGU) might have an interesting dual, and that Plotkin should find it. It turns out that the dual *is* interesting and it is known as the Least General Generalisation (LGG). Plotkin apparently described both the LGG for terms, and for clauses. I say apparently because I can't find his paper on-line.

The LGG for clauses is more complicated, so we'll come back to it after we look at the LGG of terms. We can see how the MGU is related to the LGG by looking at a couple of examples and the above image. We use the Prolog convention that function symbols start with lower case and variables start with upper case. The image above is organised as a DAG (Directed Acyclic Graph). This is the natural structure here, because terms ordered by generality form a lattice, with unification (the MGU) as the meet and generalisation (the LGG) as the join.

Essentially what we have done is draw an (incomplete) Hasse diagram for this lattice. Each of the arrows in our diagram represents a possible substitution. For instance, 'p(X)' can have X instantiated to either 's' or 'g(r,Y)'. Any term that can be reached by a series of downward traversals is an instance of the term we started from, and the two unify; the unifying substitution is the composition of the substitutions encountered along the path. So, for instance, 'p(X)' will unify with 'p(g(r,s))' under the substitution X=g(r,Y) composed with Y=s. Two terms with no common descendant in the diagram are not unifiable.

The least general generalisation is also apparent in our picture. Any two terms have a common ancestor in the Hasse diagram (if nothing else, a lone variable sits at the top). The lowest such common ancestor, that is, the most specific common generalisation, is the LGG. So the connection between the two is actually fairly straightforward (using 20/20 hindsight).
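
To make the two operations concrete, here is an OCaml sketch of both halves of the picture for first-order terms: unification (the MGU, the meet in the generality ordering) and anti-unification (the LGG, the join). The representation and names are mine, not Plotkin's.

```ocaml
(* Unification (MGU) and anti-unification (LGG) for first-order terms. *)

type term =
  | Var of string
  | Fn of string * term list   (* constants are Fn (name, []) *)

type subst = (string * term) list

(* Apply a substitution everywhere in a term. *)
let rec apply (s : subst) (t : term) : term =
  match t with
  | Var x -> (try apply s (List.assoc x s) with Not_found -> Var x)
  | Fn (f, args) -> Fn (f, List.map (apply s) args)

let rec occurs x = function
  | Var y -> x = y
  | Fn (_, args) -> List.exists (occurs x) args

(* MGU: most general unifier, or None if the terms don't unify. *)
let rec unify (s : subst) (t1 : term) (t2 : term) : subst option =
  match apply s t1, apply s t2 with
  | Var x, Var y when x = y -> Some s
  | Var x, t | t, Var x -> if occurs x t then None else Some ((x, t) :: s)
  | Fn (f, fargs), Fn (g, gargs) ->
      if f <> g || List.length fargs <> List.length gargs then None
      else
        List.fold_left2
          (fun acc a b -> match acc with None -> None | Some s -> unify s a b)
          (Some s) fargs gargs

(* LGG: anti-unification.  Where the terms disagree we invent a shared
   variable, remembering disagreement pairs so the same pair always maps
   to the same variable. *)
let lgg (t1 : term) (t2 : term) : term =
  let table : ((term * term) * string) list ref = ref [] in
  let counter = ref 0 in
  let var_for pair =
    try List.assoc pair !table
    with Not_found ->
      incr counter;
      let v = Printf.sprintf "G%d" !counter in
      table := (pair, v) :: !table;
      v
  in
  let rec go a b =
    match a, b with
    | Fn (f, fargs), Fn (g, gargs)
      when f = g && List.length fargs = List.length gargs ->
        Fn (f, List.map2 go fargs gargs)
    | _ -> Var (var_for (a, b))
  in
  go t1 t2

(* Examples from the post: p(g(r,s)) and p(s) generalise to p(G1), while
   p(X) and p(g(r,Y)) unify under X = g(r,Y). *)
let () =
  let p x = Fn ("p", [ x ]) and s = Fn ("s", []) and r = Fn ("r", []) in
  let g a b = Fn ("g", [ a; b ]) in
  ignore (lgg (p (g r s)) (p s));
  ignore (unify [] (p (Var "X")) (p (g r (Var "Y"))))
```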

More Questions than Answers

Code transformation, or meta-compilation as it is sometimes called (the general notion covering techniques like partial evaluation, supercompilation, deforestation, and my advisor's Distillation), is a powerful technique in computer programming. The benefits (and drawbacks) are almost certainly not sufficiently studied.

I was just conversing with my room-mate Tom about meta-compilation and I made the supposition that meta-compilers are somewhat like the technology of the lathe. There are a huge number of technologies that require a lathe in order to be produced efficiently. A lathe can be viewed as a major nexus in the dependency graph of machining technology. A lathe is practically a fixed precondition for the mill, and the mill is the crux of modern machining: it allows you to construct almost any currently available machined part. Without the mill we really wouldn't have the industrial age at all. Do such things exist in computer programming?

Metacompiler technology is incredibly powerful. A metacompiler is usually considered to be a superset of a partial evaluator. It is a compiler technique that starts in the source language and ends in the source language, rather than in some target language as a standard compiler does. While this might at first sound trivial or irrelevant, a few examples can convince one that it is actually a very useful tool. (2*2) can be coded in most languages, but really it is just the literal 4; partial evaluation will reduce this computation at compile time, eliminating the cost from the final executable. The power doesn't stop there though. One particularly convincing example that I found was the partial evaluation of a fairly simple grammar recogniser (parser), which reduced the problem directly from an NDFA to a DFA, which is basically the compilation process used for regexps.
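
Here is a toy partial evaluator in OCaml for a tiny expression language, just to make the (2*2) example concrete; it is a sketch of the general idea rather than any particular system. Statically known subexpressions are folded now, and everything else is left behind as residual code.

```ocaml
(* A toy partial evaluator for a tiny expression language.  Static
   subexpressions are reduced at "compile time"; dynamic ones are left as
   residual code. *)

type expr =
  | Lit of int
  | Var of string
  | Add of expr * expr
  | Mul of expr * expr

(* [env] gives the statically known variables; everything else is dynamic. *)
let rec pe (env : (string * int) list) (e : expr) : expr =
  match e with
  | Lit _ -> e
  | Var x -> (match List.assoc_opt x env with Some n -> Lit n | None -> e)
  | Add (a, b) ->
      (match pe env a, pe env b with
       | Lit m, Lit n -> Lit (m + n)          (* both static: fold now *)
       | a', b' -> Add (a', b'))              (* otherwise: residual code *)
  | Mul (a, b) ->
      (match pe env a, pe env b with
       | Lit m, Lit n -> Lit (m * n)
       | Lit 1, d | d, Lit 1 -> d             (* a little algebraic cleanup *)
       | a', b' -> Mul (a', b'))

(* 2*2 residualises to 4; 2*2 + x residualises to 4 + x. *)
let () =
  assert (pe [] (Mul (Lit 2, Lit 2)) = Lit 4);
  assert (pe [] (Add (Mul (Lit 2, Lit 2), Var "x")) = Add (Lit 4, Var "x"))
```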

The Futamura projections give us some idea of just how powerful the technique is. If we have a metacompiler, we can metacompile an interpreter with respect to a program written in the source language of the interpreter to arrive at an executable in the language of the metacompiler. In fact, if we metacompile the metacompiler with respect to the interpreter, we can generate a compiler!
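
The shapes of the projections can be written down as OCaml types, just to see them; all of the names here are illustrative, not any real system's API.

```ocaml
(* The Futamura projections as type shapes.  [mix] is the specialiser
   (the metacompiler); everything here is illustrative. *)

(* A specialiser takes a two-argument program and its static first argument,
   and returns a residual one-argument program. *)
type ('stat, 'dyn, 'out) mix = ('stat -> 'dyn -> 'out) -> 'stat -> ('dyn -> 'out)

(* 1st projection: specialising an interpreter to a source program gives a
   target program. *)
let first_projection
    (mix : ('src, 'inp, 'out) mix)
    (interp : 'src -> 'inp -> 'out)
    (source : 'src) : 'inp -> 'out =
  mix interp source

(* 2nd projection: mix mix interp is a compiler (source -> target program).
   3rd projection: mix mix mix is a compiler generator.  Writing those two as
   typed OCaml needs mix at several instantiations at once, a small echo of
   why self-application of a specialiser is the delicate part in practice. *)
```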

So I have a *lot* of questions about metacompilation. It sounds almost too good to be true (but there are good reasons to believe that it isn't). Some of them are very technical, and I will probably save those for tomorrow's post. The following question, though, is more philosophical, and practical (can those two happen at the same time?).

Why aren't supercompilers/partial evaluators used as general compilation systems? If you can write a supercompiler in some nice high-level language like OCaml, and then all you have to do is write an interpreter for your language of choice in order to produce a compiler, then why isn't this done?

This seems like the holy grail of leverage, or code re-use. You could write one really good compiler for a good language for specifying languages (which ML was originally designed for, and of which OCaml is a descendant), and one really good metacompiler. At that point every other language (every front end, in the terminology of GCC) is simply the act of writing an interpreter. Writing an interpreter is *radically* simpler than writing a sophisticated compiler; it is basically equivalent to a specification of the language. The process of language design can hardly be made easier than this, since an interpreter is pretty much the minimal requirement for specifying the operational semantics of a language!

My question is: why isn't this general procedure actually carried out in practice? Are metacompilers not good enough in practice to produce high-quality, performant programs? Has it just not been tried? If not, I'd like to see some effort expended on this, since it seems like a crucial technology that could be leveraged far more than any of the "shared VM" projects like C#, with minimal cost to language implementors.

Sunday, 16 July 2006

Strategies for automatic translation

I've been hacking away at my program to test a theory I have about machine translation. I wrote a bit about it in a previous post but I was fairly vague. I thought I'd describe in more detail exactly how the technique would work (I'm still in phase 1).

The idea is simple. The first phase is to take a corpus in a single language. Take each sentence of the source (or some other sized chunk; currently I'm limited by computational tractability to a single sentence) and recombine the words of the sentence into every possible sequence of n-grams. If you play with it a bit you'll realise that there are 2^(N-1) of these for a sentence of N words. One way to think about it is that there are N-1 indexes into the gaps between words, and each recombination corresponds to the set of gaps at which we glue words together. These sets are exactly the subsets of {1,2,3,...,N-1}, and hence there are 2^(N-1) of them. It turns out, however, that it is nice to have a special word meaning "beginning of sentence" and another for "end of sentence", so we end up starting with N+2 words and getting 2^(N+1). That can be a big number!
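
Here is an OCaml sketch of the recombination step, under the assumption that a sentence is just a list of word tokens with "<s>" and "</s>" as the sentinel markers; every segmentation corresponds to a choice of gaps at which to glue, so a sentence of M tokens yields 2^(M-1) of them.

```ocaml
(* All ways of splitting a token list into contiguous chunks (n-grams).
   A sentence of M tokens has M-1 gaps, so there are 2^(M-1) segmentations. *)
let rec segmentations = function
  | [] -> [ [] ]
  | [ w ] -> [ [ [ w ] ] ]
  | w :: rest ->
      List.concat_map
        (fun seg ->
           match seg with
           | chunk :: chunks ->
               [ [ w ] :: seg;            (* open a new chunk at this gap *)
                 (w :: chunk) :: chunks ] (* or glue w onto the next chunk *)
           | [] -> [ [ [ w ] ] ])
        (segmentations rest)

let () =
  let sentence = [ "<s>"; "the"; "cat"; "sat"; "</s>" ] in
  let segs = segmentations sentence in
  (* 5 tokens, 4 gaps, so 2^4 = 16 segmentations. *)
  Printf.printf "%d segmentations\n" (List.length segs);
  List.iter
    (fun seg ->
       print_endline (String.concat " | " (List.map (String.concat "_") seg)))
    segs
```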

So now that we have our n-gram sentences, we want to look at transition probabilities between n-grams. The reason for this is that the various parts of a sentence have unpredictable size. In the absence of a full NL parsing system there is no way to figure out what a syntactic unit (a noun phrase, for instance) will be. This process completely obviates the need for an NL parser. That in itself is a huge win, since NL parsing is at least difficult, and probably impossible to do correctly because of idioms and variations in dialect. With the n-grams in hand we can now look at transition frequencies among the various n-grams in each of the different patterns in which they were combined. At this point we enter the information into a database which stores the transition frequency between every two n-grams. Let us assume that we ignore sentences larger than 12 words. This means we have up to 2^13 = 8192 recombined sentences for a large sentence, which gives us on the order of 67,000,000 entries in our transition frequency matrix. O.K., so this is looking fairly intractable. If we decide that we will only look at correlations between neighbours and next neighbours, however, we are back in the realm of possibility. This limitation has a certain justification beyond making things computationally feasible, in that every element of the sentence will be a next-nearest neighbour of an element in one of the n-gram sentences, thereby relating every possible syntactic unit. It should even be possible, given this information, to "guess" a parse based on our frequencies, given a large enough corpus.
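
And here is a self-contained sketch of the counting step, using a hash table keyed on pairs of n-grams to stand in for the transition-frequency matrix; the function takes a single segmentation (a list of n-grams, each an underscore-joined word list) and records nearest and next-nearest neighbour transitions.

```ocaml
(* Count transition frequencies between adjacent n-grams (and next-nearest
   neighbours) within one segmentation of a sentence.  A Hashtbl keyed on
   pairs of n-grams stands in for the big transition matrix. *)

let freqs : (string * string, int) Hashtbl.t = Hashtbl.create 1024

let bump key =
  Hashtbl.replace freqs key (1 + (try Hashtbl.find freqs key with Not_found -> 0))

(* A segmentation is a list of n-grams; an n-gram is a word list. *)
let record_transitions (seg : string list list) =
  let grams = List.map (String.concat "_") seg in
  let rec go = function
    | a :: (b :: rest as tail) ->
        bump (a, b);                                          (* nearest neighbour *)
        (match rest with c :: _ -> bump (a, c) | [] -> ());   (* next nearest *)
        go tail
    | _ -> ()
  in
  go grams

let () =
  (* one of the 16 segmentations of "<s> the cat sat </s>" *)
  record_transitions [ [ "<s>" ]; [ "the"; "cat" ]; [ "sat"; "</s>" ] ];
  Hashtbl.iter (fun (a, b) n -> Printf.printf "%s -> %s : %d\n" a b n) freqs
```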

Stage 2 revolves around extracting information from a parallel corpus. We simply perform a nearly identical procedure across the two sides of the parallel corpus.

When stage 1 and stage 2 are completed, we can use the probabilities of co-occurrence from the parallel corpus in conjunction with the intra-language transition frequencies to generate "most probable" sentences.

We'll see how it goes.

Thursday, 30 March 2006

Cut Elimination

I was thinking about cut elimination on my way back from work today due in large part to a post on the cut rule made by sigfpe on A Neighborhood of Infinity.

It occurred to me that it is a bit strange that cut-free proofs can be so tremendously much larger than cut-full ones, while directly constructing cut-full proofs is so difficult. It seems somehow unfair.
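
For reference, here is the cut rule in one standard sequent-calculus presentation (the lemma A is proved once on the left and assumed on the right, and cut elimination shows it can always be inlined away, at the cost of potentially enormous growth):

Γ ⇒ Δ, A      A, Γ' ⇒ Δ'
──────────────────────────── (Cut)
Γ, Γ' ⇒ Δ, Δ'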

As I was thinking about this I realised that one could find cut-free proofs automatically and then reduce them with reverse cut-elimination to produce extremely cut-full proofs. These proofs should be very economical computationally. If one keeps the terms that correspond to the proofs alongside, one should be able to obtain a source-to-source translation that might have performance benefits.

Another totally far-out idea came to me as well. Mathematicians often use lemmas repeatedly. Perhaps the process of finding cuts is generally useful. Specifically, if a lemma is useful in simplifying one proof, maybe it is more likely to be useful in simplifying other proofs. There should be statistical measures over random syntactically valid sentences that one could come up with to see if such lemmas exist.

Sunday, 22 January 2006

A Theory of Science

As you might know, I like logics. Notice that "logics" is plural. This might seem a bit strange to you. It certainly seems a bit strange to me. How can it be that there is more than one?

There are quite a number of logics at this point. We pretty much started out with Aristotelian logic, i.e. the logic of syllogisms which you were probably forced to study in school. Aristotelian logic was formalised and generalised during the early part of the last century, and this formalism has come to be called Classical Logic, or often just CL.

This formalisation caused some to view the outcomes of various formal arguments with suspicion. It gave rise to a more conservative 'constructive' logic which we will call Intuitionistic logic (or IL), whose informal interpretation is known as the Brouwer-Heyting-Kolmogorov (BHK) interpretation.

Basically, in Classical logic you can make proofs about things for which you cannot provide examples. This also happens in Intuitionistic logic for arguments that use the ∀ quantifier, but it doesn't seem so onerous in those cases, as you can see by playing with it a bit.

We can make things even more restrictive and get Minimal Logic (or ML). Minimal logic rejects the rule that lets you prove arbitrary things from falsum, the rule often called 'ex falso sequitur quodlibet', or just 'ex falso'.

Since then there has been a real explosion in the types of logic. There are substructural logics, Quantum Logic (QL), Linear Logic (LL, a pretty big fish that can even swallow CL) and a host of others.

From this the question naturally arises in my mind: which is the right logic? As someone who writes programming languages I have a natural sympathy for IL, as it leads naturally to a term calculus, meaning that terms which exemplify proofs can be given back to the user. It is a natural logic to look at for the purposes of a database query language. There are, however, problems with it in this regard. It is not "resource sensitive". Things change in data stores, and none of the above-mentioned logics provide the appropriate tools to deal with this. Linear Logic comes closest, but fails to deal with sharing or ordering. Many new resource logics have been invented to deal with this, but I have yet to come across something that looks to me like a suitable answer (which doesn't mean it isn't already out there!).

In science the problem may be even worse. People use some form of quasi-classical reasoning to make arguments. It seems that this might not even be the appropriate tool to use when reasoning about Quantum Mechanics. Quantum Logic has been proposed, in some (fairly fringe) circles, as the appropriate way to deal with quantum quandaries. So far Quantum Logic looks to me to be too anaemic. Something closer to a theory that has a Curry-Howard isomorphism with quantum computation would be more satisfying.

So what is it that makes a good logic? My personal feeling is this. A logic is a constraint framework within which you can show that various programs, the "proofs" of the constraint apparatus, are acceptable. An appropriate constraint framework is one in which the constraints that apply to your system can be expressed simply, with minimal work. I believe that classical propositional logic arose as a sort of logic of the natural sciences because it was in fact a type of physics: a calculus in which we can present common-sense notions about real things in a simple way. When we extended the apparatus to full classical logic we may have gotten something that strays so far from common sense that it is no longer useful (this of course is debatable, and I'm not sure how much I believe it).

Now that we have a quantum world with physics that does not function in ways that our common sense would dictate, it seems perfectly reasonable to reject the notion of classical logic in this regime. In favor of what? I think the jury is still out on this one.

As for how to quantify what a good logic is, I'll make a couple of guesses. You want to be able to express the constraint systems that apply to your realm with parsimony. You want to be able to verify and extract programs from proofs. If those two conditions are met more often for one logic than another for a particular problem, then I would deem it superior.

Of course this doesn't even go into notions of logic in ethics...

Friday, 20 January 2006

Proof Theory

Thanks to my brother I got a really cool book on proof theory called "Basic Proof Theory". It has a bunch of nice features, including a from-the-ground-up presentation of proof theory that should be relatively accessible to anyone with a background in mathematics. It demonstrates some of the connections provided by the Curry-Howard correspondence (which is my favourite part of the book). It also describes second-order logic, which is great because I've had very little formal exposure to it. Second-order logic is really beautiful, since you can define all the connectives in terms of ∀, ∀2 and →. If you pun ∀ and ∀2 you have a really compact notation.
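
For the record, these are the standard (impredicative) second-order definitions, with X a propositional variable bound by the second-order quantifier ∀2; this is the usual presentation rather than anything specific to the book:

⊥      ≡  ∀2X. X
¬A     ≡  A → ⊥
A ∧ B  ≡  ∀2X. (A → B → X) → X
A ∨ B  ≡  ∀2X. (A → X) → (B → X) → X
∃x. A  ≡  ∀2X. (∀x. (A → X)) → X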

The book also forced me to learn some things I hadn't wrapped my head around. One of those was Gentzen-style sequent calculus. This really turns out to be pretty easy when you have a good book describing it. I've even written a little sequent solver (in Lisp), since I found the proofs so much fun. A first-order intuitionistic sequent solver is really not terribly difficult to write. Basically I treat the proofs as goal-directed, starting with a sequent of the form:

⇒ F

And try to arrive at leaves of the tree that all have the form:

A ⇒ A

I have already proven that 'F ⇒ F' for compound formulas F from 'A ⇒ A', so I didn't figure it was necessary to make the solver do it. The solver currently only works with propositional formulas (it solves a type theory where types are not parametric), but I'm interested in limited extensions, though I haven't thought much about that. I imagine I'd quickly get something undecidable if I'm not careful.
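
For comparison, here is an OCaml sketch of the same kind of goal-directed prover, restricted to the implicational fragment and using Dyckhoff's contraction-free rules (so the backward search terminates without any loop checking). It is not the Lisp solver described above, just the same idea in a smaller setting.

```ocaml
(* A goal-directed prover for intuitionistic propositional logic, restricted
   to the implicational fragment, using Dyckhoff's contraction-free rules
   (LJT) so that backward search terminates. *)

type fmla =
  | Atom of string
  | Imp of fmla * fmla

(* Remove one occurrence of an element from a list. *)
let rec remove x = function
  | [] -> []
  | y :: ys -> if y = x then ys else y :: remove x ys

(* provable gamma goal: is the sequent  gamma ⇒ goal  derivable? *)
let rec provable (gamma : fmla list) (goal : fmla) : bool =
  match goal with
  | Imp (a, b) -> provable (a :: gamma) b                       (* R→ *)
  | Atom _ ->
      (* axiom:  Γ, A ⇒ A *)
      List.mem goal gamma
      ||
      (* L→ with an atomic antecedent already in Γ:
           Γ, p, B ⇒ G   gives   Γ, p, p→B ⇒ G *)
      List.exists
        (function
          | Imp (Atom p, b) when List.mem (Atom p) gamma ->
              provable (b :: remove (Imp (Atom p, b)) gamma) goal
          | _ -> false)
        gamma
      ||
      (* L→ with a nested implication antecedent:
           Γ, D→B ⇒ C→D   and   Γ, B ⇒ G   give   Γ, (C→D)→B ⇒ G *)
      List.exists
        (function
          | Imp (Imp (c, d), b) as f ->
              let rest = remove f gamma in
              provable (Imp (d, b) :: rest) (Imp (c, d))
              && provable (b :: rest) goal
          | _ -> false)
        gamma

(* Peirce's law is classically valid but rejected here; the S combinator's
   type is accepted. *)
let () =
  let p = Atom "p" and q = Atom "q" and r = Atom "r" in
  assert (not (provable [] (Imp (Imp (Imp (p, q), p), p))));
  assert (provable [] (Imp (Imp (p, Imp (q, r)), Imp (Imp (p, q), Imp (p, r)))))
```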

Anyhow, working with the sequent calculus got me thinking about →. In the book they present the rule R→ as follows:


Γ, A ⇒ Δ, B
──────────── (R→)
Γ ⇒ A→B, Δ



This is a bit weird since there is nothing that goes the other direction; i.e. in none of Minimal, Intuitionistic or Classical logic do you find a rule in which you introduce a connective on the left from formulas on the right. I started looking around for something that does this and I ran into Basic Logic. I haven't read the paper yet so I can't really comment on it. I'll let you know after I'm done with it.