Language Log Twelve Years Later #5: Machine translation methods

My main linguistics essay for the 2015/16 fall term was about assorted methods of machine translation, and why they all suck (and why this is, sadly, pretty much an unavoidable problem, and I’m not sure how even humans manage).

Modern translation methods used, e.g., by Google are essentially statistical (figure out how this phrase was translated previously, then use that); this seems to be basically an early stage of this very thing. (Did Google Translate even exist by July 2003? I know its adoption of statistical methods was much later.)
[Apparently not; they started out in 2006, and adopted statistical methods in 2007. The big online translator as of July 30, 2003 was AltaVista BabelFish – which was actually already owned by Yahoo by then but not officially renamed until several years later.]

I wanted to post the modern Google translation of this Arabic text, but I could not find the original Arabic anywhere. (It seems to be a news article, apparently from the late 1990s, but I couldn’t find the original news article either.)

There shouldn’t be a comma in “will take off Wednesday morning”, incidentally.


Language Log Twelve Years Later #4: Four

As has been mentioned in the introductory post, classic Language Log article #4 is now a 404 page. I have no idea whether it had ever been anything else (I suppose I could check Wayback Machine, but I’m not sure it would work; most likely, it had been yet another test entry of some kind, and had been deleted within a few days of being posted).

So instead, I’m devoting this post to Piotr Gąsiorowski’s magnificent etymology of the Indo-European word for the number “four”.

To summarize, the theory is that the Proto-Indo-European root for “four” (which is otherwise unusually long for PIE) is morphologically derived from a root meaning “pair”; I’m not sufficiently familiar with PIE morphology to reasonably consider or dispute the specific details, but from a Russian-based perspective, he is saying that четыре “four” is related to чета “pair” and чётный “even” (something mentioned as a possibility in several Russian etymological dictionaries – but without actual details).

Most of the etymological dictionaries available online link the latter two words to South Slavic words meaning “troop, crowd”, then to считать “count” (compare чётки “prayer beads”, which does appear to derive from считать), then to читать “read”, and perhaps even farther on. Russian Wiktionary, in particular, doesn’t even try to decipher the mess, and gives several dozen assorted words (including all of the above except четыре “four”) in the “related words” tab.

However, one word that does – at least as far as I can tell – appear to derive from the root for “pair” as Gąsiorowski sees it (but does not appear to be mentioned in his posts on the topic) is сочетание “combination”. (And сочетать “(transitive) to combine”, and сочетаться “(intransitive) to combine, to fit, to marry”, which are trivially related to it.)
[Though the word *четать “to form pairs”, listed by Gąsiorowski as supposedly Russian, might be a confused reference to some version of the above; it is not otherwise a word I recognize.]

To be honest, I thought that it was a transparent calque from Latin combinatio (or, perhaps, from Greek syndyasmos). However, apparently there is a Bulgarian word съчетавам “to combine” (indeed, apparently, the Bulgarian for “combination” is съчетание), which is clearly related, but South Slavic.
This means that either it derives from the South Slavic root meaning “troop” – which doesn’t seem likely, given the Latin and Greek development, which is at worst parallel – or the root meaning “pair” had at some point existed in South Slavic (in which case the respective word might well have been, in fact, a calque – perhaps at the Old Church Slavonic stage).

I’m actually surprised that this word is not mentioned by Gąsiorowski (at least, not anywhere that I could find); there are otherwise no South Slavic cognates, which makes it easier to introduce spurious ones. (Though this doesn’t help Vasmer, whose entry for чета mentions both.)

In any case, even if it is unfalsifiable (or perhaps even unverifiable) at this point – great etymology.

(Yes, the link to Gąsiorowski’s blog was on purpose. I fully intend Gąsiorowski, or indeed his commenters, to see this post; even if they just end up picking out my linguistic mistakes. I’m hardly an expert; I’m just a random linguistically interested guy.)
(Also, sorry for such an extended digression. But I wanted an article #4 in the Language Log Twelve Years Later series, I really wanted to comment on this etymology, and there wasn’t really anything worthy of commenting on at the actual Language Log #4 page anyway.)

Language Log Twelve Years Later #3: Divorce by SMS

I suppose I might as well write a few more of these reviews. After all, there’s about five thousand to go 🙂

Post number 3 is a lucky case where a twelve-year-old link had not rotted. Which is a good thing, because the post in question is extremely brief, so it’s hard to figure out what’s going on otherwise unless you know a lot of relevant cultural background.

TL;DR: Islamic law is infamously unfair to women, so a man can divorce his wife very easily (but the opposite is nearly impossible). Technically, it is legal to divorce just by spoken declaration, but apparently modern courts have required a written message (perhaps due to the difficulty of proving that a spoken declaration took place).
And a recent (as of when the article appeared on Language Log, so twelve years old now) Malaysian court case decided that, for this purpose, SMS messages count as written.

Which makes sense, I suppose, if you ignore that whole “unfair to women” part (I wonder what modern Malaysian law – as of 2015 or so – says about the matter).

Incidentally, post 3 is one of the fairly few (mostly very early) Language Log posts that do not appear on the main menu (i.e. are not reachable by next/previous post links). The next post that appears on the main menu is number 5, the next one that does not is number 7; I’ll explain what is going on with 4 in Language Log Twelve Years Later #4, which should follow shortly.

Language Log Twelve Years Later #2: Own tongues

Not much to comment on here. I can’t even figure out why this was discussed. To be honest, it almost looks like a follow-up to an earlier post on some other blog that perhaps did not survive to this day.
(Realistically, more likely, it is a follow-up to a private discussion, perhaps by email.)

Though I admit, out of context – and to an extent even in context – the phrase sounds funny.

As is unfortunately common for classic Language Log, the link has since rotted away. I’ll check whether I can find a Wayback Machine version; on the computer (and browser) I’m writing this from right now, the Wayback Machine does not work anywhere near reliably (apparently due to a particularly unfortunate law interaction, which I’ll try to explain properly another time), so I’d have to search for it on a different computer (and/or in a different browser).
(I might also check if the information being linked to is actually still available, just at a subtly different address. It happens occasionally.)
However, for the most part, it’s fairly obvious what was going on from the quoted section, anyway.

(It’s officially the first Language Log post, incidentally; more details in the previous entry.)

Language Log Twelve Years Later #1: Testing

Testing, testing. Two, five, six, eight. Is that thing on?

Wait, did I say “two, five, six, eight”? I meant “one, two, three, four”.

The first four articles on Language Log that are reachable from the top menu are listed in the database under numbers 2, 5, 6, and 8, respectively. This early, most article numbers on Language Log do seem to correspond to actual articles (if quite brief ones), even if not in the top menu (later on, 99% or so of the numbers are reachable by the top menu, and the rest give 404 pages); I think one in the 30s is a draft of an article posted later. (I’ll try to mention which one if I get that far.)

Number 4 is not available (at least, not right now); number 7 seems to be a test entry.

As is number 1, which I’m commenting on today, twelve and a bit years later. Um, the system is working, right?

(Incidentally, thanks to Language Hat for actually sending some people my way. I think up to then my only visitors had all been me from a different computer.
I didn’t actually realize that the link would produce a referral and be visible on the linked blog; I’m happy enough this time, but I’ll try to be more careful later.)

My version of the Three Laws of Fanfiction

(aka, why I dislike Methods of Rationality)

Actually, this ended up many times longer than the original; much of the important stuff is in the Corollaries. But I think that, apart from perhaps missing a few, I’ve managed to get pretty close to the laws of fanfiction that I go by.

(Though, since I wanted to keep things thematic to the original, my version of rule 3 is fairly silly.)


Rule One: If you want to increase the antagonist’s power, you should also amplify the protagonist so that they could still matter. You can’t give Sauron stormtroopers* unless you, at least, give Frodo a lightsaber. Either that, or say at the outset that the bad guys win, so that the readers know it’s that kind of story; few people want to read a story where the main character is ludicrously weak compared to the evil guy but wins by a string of freak coincidences and/or sheer ass-pullery. (Unless it’s comedy. You pretty much get the same sort of freak coincidences and sheer ass-pullery, but it’s slightly more permissible in explicit comedy.) The Deus Ex Machina is defined not by the direct strength, but by how much it makes the main character’s situation easier.

Corollary 1 to Rule One: If your villains are powerful and competent, make the heroes at least as powerful (or slightly less powerful but far more competent, if you’re sure you can write very smart characters). Otherwise, you would need to resort to ass-pullery to explain why the villains did not win; and, as already mentioned, few want to read a story that mostly consists of ass-pullery.

Corollary 2 to Rule One: Actually, Sauron is at the upper edge of permissible hero/villain strength difference (perhaps beyond it by modern standards), so you might as well give Frodo a lightsaber directly. (Just don’t forget to include a decent explanation why.)

Corollary 3 to Rule One: This does not mean that you should increase the protagonist’s power endlessly. If your protagonist is too powerful, they end up beating all the existing challenges very easily, and you have to make up some new challenges for them to overcome. (Which is easier than it sounds when it comes to fanfiction – just find a good powerful setting to cross over.) I’ve read a few fanfics that had that problem (in particular, many of the Worm CYOAs end up that way); it can quite easily happen if you overdo escalation (and/or if the villains in the base setting are weak enough, but that is rare nowadays). However, it certainly requires a lot more than making Frodo a Jedi, or giving him a lightsaber.
(And even then, you can still have a good story if you do it slice-of-life style; just look at Heromaker’s Legacy. Or at Commander [by Drich]. Or at the MLP Loops, for that matter.)

Corollary 4 to Rule One: Just in case, yes, you can make the antagonists stronger; just do it carefully, and don’t forget to keep the good guys strong enough to be a challenge. In fact – compare corollary 3 – if your protagonists are strengthened enough, you might actually need to make the bad guys also a bit stronger; this is why most strong!Harry stories also have manipulative!Dumbledore (who then works as a sufficiently major antagonist to Harry – often still fairly easily beaten, mind you, but it helps the authors avoid ending the story too quickly).
And this didn’t really fit in any other corollary, and isn’t really worth its own, so I’m putting it here: the problem with Mary Sues is bad writing. A well-written Mary Sue might not even be recognizable as one, despite being just as powerful; and many of the “common Mary Sue tropes” depend on bad writing directly. I’m pretty sure Mary Poppins or Pippi Longstocking would get an awful lot of points on a Mary Sue test if considered fairly.


Rule Two: Originality is possible, if you’re very careful, but it wouldn’t be very well liked by your readers. Do not attempt to be fully original; pretty much everything has been done before. Instead, go with what is popular, but add your own little twist on top of it, or go from the common start in your own way. (The surprisingly different answers for some quite detailed fanfiction challenges – Severitus comes to mind – are a good case study.)
And certainly do not try to change canon events without explanation (though you can do it if you explain why – and the “why” might well be story balance). Your readers want to read a fanfic, not original fiction that happens to have the same character names. If nothing happened yet in your story that would necessarily change the Triwizard challenges, use the same ones – but have your version of Harry (or whatever Triwizard participant your story has) solve them differently.

Corollary 1 to Rule Two: Do not assume that your crossover idea had not been done before; and, as far as the pair of settings goes, don’t necessarily try to find one that wasn’t. Chances are, even if you do find one, either one or both of the settings involved are far too obscure to your intended audience, or it really doesn’t work well as a crossover. (And very possibly it had been done anyway, especially if one of the settings is very popular.) Though it’s possible to do well in the former case (one of the settings is obscure) if you explain the situation carefully.

Corollary 2 to Rule Two: If you actually try to be fully original, you risk intruding into well-travelled absurdist territory. There are only so many separable varieties of “dyr bul schyl, ubeschur”. (And no one but absurdism fans will want to read that sort of stuff anyway.)

Corollary 3 to Rule Two: In less explored fandoms (not Harry Potter), you can occasionally find a nice idea that really hasn’t been done before. (For especially unexplored fandoms, you might find out that there are almost no fanfics in that setting at all. Ever seen a Gargantua and Pantagruel fanfic? Me neither.) If so, by all means, go ahead. Just don’t be angry if someone else happens to have the same idea later – at least, as long as they do the further details differently, and do not plagiarize you directly.

Corollary 4 to Rule Two: This does not mean you should hold to the stations of canon. It makes little sense for your Harry to go to Diagon Alley by himself and stumble on Draco anyway**. But if something is not specifically tied to a precise location and/or timing, and there is no immediate reason for it to have changed in your story, keep it the same, at least as far as the non-protagonist part goes; and by all means, think up your own stuff for periods where hardly anything happened in the original story.

Corollary 5 to Rule Two: Backgrounds. Unless it is very important to your plot (or unless you started your story without being aware of some of these details, in which case perhaps the best choice is to explain and continue on), keep your characters’ backgrounds (and history and such) reasonably canon. (With some variation for the more recent and/or less popular bits – I don’t have a problem with stories where Charlus Potter is Harry’s grandfather.) Writing AU for the sake of AU is silly.


Rule Three: The premise of a story is just that – what happens in your story. (Or, at least, how your story starts out.) A story always needs a premise, and preferably a coherent one; then you need to develop what happens from that, or how it occurs. Think of a premise as a fanfic challenge you’re writing for; except in most cases you’re setting this challenge yourself.
“What happens if the Terminator is sent back in time to kill Voldemort” is a nice example of a premise, and by itself almost works well enough for a fanfic challenge (though an actual challenge would probably snip the “what happens if” part); to turn it into a story, however, you need to figure out what does, in fact, happen, as well as some other details. (Why is there a Terminator involved in the first place? How does a Terminator react to a Cutting Curse? When in the Potterverse timeline does this happen? Which kind of Terminator? There are probably dozens of other similar questions one might add to make this premise into a story. I’d like to read such a story, actually.)

Corollary 1 to Rule Three: Yes, this does very much darn mean that a story isn’t necessarily about a conflict. Especially not about a singular conflict, which the original seems to imply. Most of the good stories are about the heroes, and the conflict only comes up as the occasional problem on the way. (And often there’s no real defined goal either – just lots of slice-of-life goodness).

Corollary 2 to Rule Three: No, you do not need an ending planned out, at least not in detail; in fact, even if you do, prepare to have it all slashed and redone as readers pick out plot holes in your story – or even as your own characters get in the way. Some of the best stories were written by the authors thinking: “Now, what would this or that character do? Now, what would logically happen after this or that thing?” Once you’re done with the starting premise, let the plot flow naturally (with perhaps the occasional careful adjustment if your bad guys get too much of an upper hand), and you’re much more likely to have a good story.
Actually, this is much of the reason for Rule One: if your villains are too much more powerful than your heroes, your plot needs to be very rigid and constrained for the heroes to end up winning. This sort of story holds up very poorly to plot holes (even minor ones if it’s rigid enough), or especially to characters wanting to go their own way.

Corollary 3 to Rule Three: It’s a good idea to think of your fanfic as an alternate history. Start with the Point of Departure – a small thing that changes compared to canon (use two or three separate small things if you really need to), then follow the knock-on effects from there. If needed, add a bit of random butterfly effect – to whatever side you want – when enough time had passed. You don’t even need to finish with a proper conclusion, as long as you have a good story (though the readers will probably prefer if you pick out at least some ending note, and declare it “End of Part 1”).

Corollary 4 to Rule Three: And not really on-topic (aside from picking at the original), but please, minimize the severe obstacles. I know they’re all the rage in modern “young adult” fiction (what sort of age range is “young adult” anyway? 18-21?), but they mostly just make for a depressing story. Real life is depressing enough, thank you very much. It’s possible to focus on “gritty” without being extremely depressing – Brutal Harry comes to mind – but the “severe obstacles” thing, when not done very well (I can’t even think of a good example that isn’t directly based on real life), just makes a story start out sad and only get darker, with the occasional relatively happy ending slapped on at the end. (This is why I didn’t try to read Worm, incidentally.)


So if you have a lovely mental image of Frodo with a lightsaber…

  1. How will the setting change due to that? Does Saruman’s army now have stormtroopers, or droids, or AT-ATs? (Or, if you don’t want to make the bad guys too strong, how about Faramir’s army?)
  2. How did Frodo end up with a lightsaber in the first place? Did a Jedi ship crash-land on Arda? If so, what other effects could that have? (For that matter, are other Jedi ships on the way?)
  3. Now, what would Frodo do when he ends up with a lightsaber? What would Gandalf, or Sam, or Elrond do when he sees Frodo with a lightsaber? Would Saruman stay on his canon course, escalate on his own, or join Frodo because he is now stronger? If the latter, would all of the above accept having an army of Uruk-Hai on their side? (And loads of other similar questions, depending on what path you choose for the story.)

If written well enough, this can likely become a very nice story (not that I’ve read one with that exact premise) – even without Sauron being advantaged directly.



*) Not a Death Star though; what the triangular heck would Sauron do with a Death Star? He only has the one planet, anyway.

**) My original example was the Lung fight in Worm fanfics, which is both far more common (in relative terms), and far more ridiculous in terms of timing. But I decided to use an example more appropriate for the original – that is to say, from Harry Potter.

Post 210, incidentally. (Also over 2000 words. Which is a lot. The original was about a tenth of that length.)