Rubrication

Strategies for automatic translation

I've been hacking away at my program to test a theory I have about machine translation. I wrote a bit about it in a previous post, but I was fairly vague, so I thought I'd describe in more detail exactly how the technique would work (I'm still in phase 1).

The idea is simple. The first phase is to take a corpus in a language. Take each sentence of the source (or a chunk of some other size; for now, computational tractability limits me to a single sentence) and split it in every possible way into a sequence of contiguous n-grams. If you play with it a bit you'll realise that there are 2^(N-1) of these for a sentence of N words. One way to think about it is that there are N-1 indexes into the spaces between words in the sentence. You can then think of each segmentation as a collection of indexes at which we combine adjacent words. The segmentations therefore correspond to the power set of the set of indexes {1, 2, 3, ..., N-1}, and hence there are 2^(N-1) of them. It turns out however that it is ni
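To make the counting argument concrete, here is a minimal sketch in Python. It is not the program described in this post. The function name segmentations and the example sentence are made up for illustration, and the sketch assumes a sentence is just a list of whitespace-separated words. It picks the gaps at which to cut rather than the gaps at which to combine. The two views are complementary, so the count is the same.

```python
from itertools import combinations


def segmentations(words):
    """Yield every way to split `words` into contiguous n-grams.

    Hypothetical helper for illustration: each subset of the N-1 gaps
    between words is used as a set of cut points.
    """
    n = len(words)
    gaps = range(1, n)  # the N-1 positions between adjacent words
    for k in range(n):  # k = how many gaps we cut at (0 .. N-1)
        for cuts in combinations(gaps, k):
            bounds = (0, *cuts, n)
            # Join the words between consecutive boundaries into one n-gram.
            yield [" ".join(words[a:b]) for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    sentence = "the cat sat".split()
    for seg in segmentations(sentence):
        print(seg)
    # A 3-word sentence gives 2^(3-1) = 4 segmentations:
    # ['the cat sat']
    # ['the', 'cat sat']
    # ['the cat', 'sat']
    # ['the', 'cat', 'sat']
```

The exponential growth is easy to see here: a 20-word sentence already has 2^19 = 524288 segmentations, which is presumably why working on anything larger than a single sentence quickly becomes intractable.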
