Rounded Pi Day

To be honest, I kind of completely forgot this was even a thing. Or didn’t know in the first place.

Either way, I had no idea it was going on until a few minutes ago (when the day was nearly ending).

I entirely agree that having 3/14/16 (or 3/14/15) as Pi Day is horribly American-centric, but it so happens that 31/4/16 (or 31/4 anything else) is just barely not a valid calendar date in the European DMY format. April does not have 31 days. Sorry.

However, 31/4/16 (or indeed 31/4/15) is a totally valid date in YMD format, which is what everyone should be using anyway because it’s the most mathematically logical (though to be fully mathematically logical, it should have been 31/04/16, and that doesn’t really work, since the extra zero breaks the digits of pi).

So I’ll set an alarm for April 15, 2031, to do something nice at 9:26:53 AM. (And again at 9:29:20 AM, to celebrate the famous approximation 355/113.) I was planning to write some letters to the early 2030s anyway.
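The digits-to-date reading above can be checked mechanically. Here is a minimal Python sketch, assuming a 20xx year, a one-digit month and hour, and that "the famous approximation" is 355/113 (which is what produces 9:29:20); the function names are made up for illustration.

```python
from datetime import datetime
from fractions import Fraction

# The first digits of pi, typed in by hand: 3.14159265358979...
PI_DIGITS = "314159265358979"

def leading_digits(x: Fraction, count: int) -> str:
    """First `count` digits of x, truncated, decimal point dropped.
    Assumes count is at least the number of digits before the point."""
    integer_digits = len(str(x.numerator // x.denominator))
    scaled = x * 10 ** (count - integer_digits)
    return str(scaled.numerator // scaled.denominator)

def digits_to_ymd_time(digits: str) -> datetime:
    """Read digits as YY/M/DD H:MM:SS in the 2000s (two-digit year, one-digit month and hour)."""
    widths = [2, 1, 2, 1, 2, 2]  # year, month, day, hour, minute, second
    fields, pos = [], 0
    for width in widths:
        fields.append(int(digits[pos:pos + width]))
        pos += width
    year, month, day, hour, minute, second = fields
    # datetime() raises ValueError for impossible dates such as April 31.
    return datetime(2000 + year, month, day, hour, minute, second)

def is_valid_date(year: int, month: int, day: int) -> bool:
    """True if year/month/day is a real calendar date."""
    try:
        datetime(year, month, day)
    except ValueError:
        return False
    return True

print(digits_to_ymd_time(PI_DIGITS))                               # 2031-04-15 09:26:53
print(digits_to_ymd_time(leading_digits(Fraction(355, 113), 10)))  # 2031-04-15 09:29:20
print(is_valid_date(2016, 4, 31))  # False: the DMY reading of 31/4/16
print(is_valid_date(2031, 4, 16))  # True: the YMD reading of 31/4/16
```

Using Fraction for 355/113 keeps the digits exact, so there is no floating-point rounding to worry about in the last place.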

(On April 15, 2031, my younger brother will be several months older than I am today. Which is extremely weird to think about. I wonder if he’ll have a job by then… or a wife, for that matter.)


My opinion on self-pingbacks

Apparently pingbacks (basically, links to your post from other blog posts) count as comments (at least as far as how they appear on the post’s page is concerned) and go into the same comment-approval queue. I haven’t yet seen enough of either to figure out whether they are mixed in with the actual comments or listed separately on the post’s page (I’ve seen blogs that do either), but either way it’s a bit weird (though a separate listing is much less weird than a mixed one).

But then there are self-pingbacks. (In other words, when your post is linked from your own later post.) I can kind of understand why the underlying WordPress software might have problems telling these apart from actual pingbacks originating from other places, but they still don’t really have a place among regular comments (or even regular pingbacks).

(Yes, there are blogs that approve self-pingbacks. It actually seems to be rarer for a blog to approve regular pingbacks but not self-pingbacks; Language Hat is about the only one I’ve seen.)

So far, the best solution I could find is to directly unapprove self-pingbacks while accepting everything else (to be revised when said “everything else” is not an empty set). I’d be happy to find out if there’s something better (that doesn’t require me to pay).
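The rule itself is mechanical enough to write down. Here is a minimal Python sketch, assuming a pingback counts as a "self-pingback" when its source URL is on the same host as the blog. This is not how WordPress itself works internally, and the function names and myblog.example.com URLs are hypothetical.

```python
from urllib.parse import urlparse

def _site_host(url: str) -> str:
    """Lower-cased host of url, with a leading "www." dropped."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host

def is_self_pingback(source_url: str, blog_url: str) -> bool:
    """True if the page that sent the pingback is on the blog's own site."""
    source = _site_host(source_url)
    return bool(source) and source == _site_host(blog_url)

def moderate_pingback(source_url: str, blog_url: str) -> str:
    """The policy described above: unapprove self-pingbacks, approve everything else."""
    return "unapprove" if is_self_pingback(source_url, blog_url) else "approve"

BLOG = "https://myblog.example.com/"
print(moderate_pingback("https://myblog.example.com/2016/03/14/rounded-pi-day/", BLOG))  # unapprove
print(moderate_pingback("https://another.example.org/some-post/", BLOG))                # approve
```

Comparing hosts is the crude part: two different blogs sharing one host under different paths would be wrongly lumped together.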


On a side note, I’ve been told by a reader of my blog (yes, I have readers now! yeah! though to be fair, I told him personally, so that might not really count) that my Language Log commentary posts are unintelligible without the corresponding original.

So they are. It’s a bit unfortunate, but should have been expected, because, to be honest, I never intended anyone to read them without being aware of what Language Log is (though even that might not have helped).

For the record, if you happen to know where the current Language Log is, the classic posts are available via the “posts before 2008 here” link in the right tab; click any post listed in that tab, then change the number in the resulting URL to whatever you’re looking for.

Or just follow the 404 link in my commentary post #4 and change the number from there. That might actually be easier.
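Changing the number by hand is easy enough, but the same trick fits in a few lines of Python. This sketch assumes that the post number is the last run of digits in the archive URL and is zero-padded to a fixed width. The function name and the example.org URL are hypothetical placeholders, not the real archive address.

```python
import re

def archive_url_for(example_url: str, post_number: int) -> str:
    """Swap the last run of digits in example_url for post_number,
    keeping the same zero-padded width (e.g. 000004 -> 000123)."""
    matches = list(re.finditer(r"\d+", example_url))
    if not matches:
        raise ValueError("no post number found in URL")
    last = matches[-1]
    width = len(last.group())
    return example_url[:last.start()] + str(post_number).zfill(width) + example_url[last.end():]

# Hypothetical URL copied from the browser after clicking some classic post.
EXAMPLE = "https://archive.example.org/languagelog/archives/000004.html"
print(archive_url_for(EXAMPLE, 123))  # https://archive.example.org/languagelog/archives/000123.html
```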

YouTube’s most watched videos – a timeline

I’m a bit surprised that nobody so far has managed to produce a timeline – even an approximate one – of the most watched videos on YouTube. (Or nobody that I could find, anyway.)

I mean, there’s plenty of data for “most viewed videos as of 20xx” (at least, in the last few years – it’s less common earlier on), but nothing about what the progression of records was. (There are a few links on “most viewed videos uploaded in 20xx”, but they’re not exactly the right thing here either.) So now the most watched video is Gangnam Style (okay, everyone knows that), and it usurped the record from what, exactly?

I’ve tried to assemble what little I could find (there are sadly many gaps in the data) to figure out the chronology. The specific dates, in particular, may be a day or two off (though not for numbers 1-4, whose dates are attestation limits, straight from the Wayback Machine).
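The bookkeeping behind this is simple: sort dated "most viewed" snapshots and merge consecutive ones with the same leader. Here is a minimal Python sketch, using the attestation dates for numbers 1–4 below as sample input. The function name and tuple layout are my own choices, not any existing tool. It can only report the first and last time a video was attested at the top, not when its reign really started or ended.

```python
from datetime import date

# Attestation dates for entries 1-4 of the list below, taken from the dates given in this post.
SNAPSHOTS = [
    (date(2005, 12, 10), "Cross Bar"),
    (date(2005, 12, 20), "Cross Bar"),
    (date(2006, 2, 14), "Jay Leno Phony Photo Booth"),
    (date(2006, 3, 3), "Myspace – THE MOVIE!"),
    (date(2006, 4, 9), "Pokemon Theme Music Video"),
    (date(2006, 5, 2), "Pokemon Theme Music Video"),
]

def reigns(snapshots):
    """Merge dated snapshots into (title, first_seen, last_seen) runs.

    Snapshots are sorted by date; consecutive snapshots with the same leader
    become one run. The true start and end of each reign fall somewhere in
    the gaps between runs, which this cannot recover.
    """
    runs = []
    for day, title in sorted(snapshots):
        if runs and runs[-1][0] == title:
            runs[-1] = (title, runs[-1][1], day)
        else:
            runs.append((title, day, day))
    return runs

for title, first, last in reigns(SNAPSHOTS):
    print(f"{title}: {first} to {last}")
```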


1. JoeB, Cross Bar (feat. Ronaldinho)

This video is a Nike commercial, and is highly suspected to have been staged; it’s unclear whether the original uploader – back in October 2005 – realized this, though! It also happens to be the first video to reach 1000000 (that’s a million) views on YouTube.

It probably was the most viewed video on YouTube when the site officially launched. Earlier data is unlikely to be available, so let’s just start there.

(Note: there is some confusion with another video, Touch of Gold, with the same commercial. But it was Cross Bar that was the first to a million, as this Wayback Machine link proves.)

Date when most viewed: as of December 10-20, 2005


2. mugenized, “Jay Leno Phony Photo Booth”

No idea what this video could have been. It’s not available today.

Date when most viewed: as of February 14, 2006 (by which point Cross Bar was #14)


3. eggtea, “Myspace – THE MOVIE!”

Not available today either.

Date when most viewed: as of March 3, 2006


4. smosh, “Pokemon Theme Music Video”

Also unavailable (at least this time I know why – for copyright reasons). First to 10 million.

Date when most viewed: as of April 9 – May 2, 2006


5. Judson Laipply, Evolution of Dance (feat. Judson Laipply)

This used to be the classic “most viewed” YouTube video, having held this position for years. Now it’s ancient history.

The numbering from this one onward is essentially meaningless. There are reasons to suspect that many other videos, not listed here, held the top spot between #1 and this one, but apart from those listed here, I was unable to find any definite statement about which ones.

Date when most viewed: May 2006 – October 31, 2009 (but see below)


6. RCARecords, Girlfriend (feat. Avril Lavigne)

I could not figure out when, exactly, this was the most viewed. It definitely was at some point; the period listed is my best guess, and might not have been continuous.

I really only have the Know Your Meme page (for Dance Evolution, that is) to thank for including this video in this list in the first place, actually.

Also a music video (for comparison, today’s YouTube top 10 is all music videos).

Link is not to the original video (the original has since been taken down).

Might or might not have held the top spot only due to view manipulation. (If so, the reason I’m not including the brief top video from March 2008 – which definitely was due to manipulation – is that it had an NSFW title. Search for it yourself.)

Date when most viewed: August 2008 – April 2009


7. HDCYT, Charlie bit my finger (feat. Harry Davies-Carr)

Exactly what it says on the tin – a video of a baby boy named Charlie biting his (slightly) older brother’s finger. (Harry Davies-Carr is the name of said older brother.)

Was the most viewed non-music video for a good deal longer. Still the third (perhaps second, depending on what you count) most viewed in that category today (though only #34 overall… the other 31 are all music videos).

I’m surprised that it actually got that far.

Date when most viewed: October 31, 2009 – April 14, 2010


8. Lady Gaga, Bad Romance (feat. Lady Gaga)

Another example of a video I only stumbled upon when already writing up this list. It is often said that the next one (you’ll see) took the top spot directly from the previous one (Charlie); apparently, this was not the case.

And another music video, naturally. (Not much of a spoiler, but all the following top positions also are music videos.)

Date when most viewed: April 14 – July 15, 2010


9. Justin Bieber, Baby (feat. Ludacris?)

The second to get to a billion views, this video was still second most viewed until less than five months ago (it’s sixth as I write this, on March 10, 2016, and might well be seventh by March 11th).

It also, for a long time, held the YouTube record for “most hated video” (either most negative score, or most downvotes; not sure exactly, and might well have been both).

Which makes sense, I suppose, because, after all, it’s Justin Bieber.

Date when most viewed: July 15, 2010 – November 24, 2012


10. Psy, Gangnam Style (feat. Psy)

Technically a music video, this isn’t exactly in the typical music video tradition, which is probably for the better.

First YouTube video to reach a billion views (compare Cross Bar’s million); still the only one to reach two billion. Recently reached two and a half billion. Might well be the first to reach three billion too – time will tell.

Date when most viewed: November 24, 2012 – present (and still going)


Um, that’s it? I’m sure I didn’t make it a top 10 on purpose. My original list had 7, anyway (I only added the three 2006 videos at the last moment).

Comments and especially additions welcome. Though I doubt there will be any additions unless someone manages to discover an early 2006 article about the most viewed videos (all of those I found are about viral hits, and don’t necessarily call them most viewed).

A further rant about fanfics

This is fairly tangential to my previously posted version of the Three Laws of Fanfiction, and I did not entirely realize it when I was writing them anyway.

It definitely deserves to be stated, however.

Why the triangular heck does (what feels like) pretty much every single fanfic (and a good part of original fiction, for that matter) have several major good-side characters killed by the evil guys a few chapters before the ending?

I mean, I can actually kind of understand why – because a climactic fight needs to be climactic, to get the emotions flowing, stuff like that. And it also usually comes at the point (around 90% of the way in) where it’s a bit late to throw the book away in disgust (even metaphorically), and most readers prefer to just slog through to the end.

But it sure erases pretty much all of the enjoyment gained by the previous 90% of the text (and it is hard to get much of that enjoyment back in the remaining 10%) – because, unless the story was dark from the start (more common with original works than with fanfics), it means that the tone of the story changes significantly.
(If it was dark from the start, one at least expects something like that, and is less surprised by the sudden downer. But stories like that just tend to be thrown away in disgust earlier by anyone who isn’t that kind of reader.)

It’s kinda possible to deal with that by making it gradual – things very slowly get darker and darker. Harry Potter (the original, not the fanfics) is a good example. But, again, it requires a particular kind of reader, and it makes it even harder to make a non-obviously-sad ending if things have been sad for so long already (at least, other than actively reverting stuff to the good side – often including the above-mentioned killed characters coming back to life – which just feels like a cop-out, and if not done very well can also give the impression that half the things that happened were all for nothing).

Why is this related to the Three Laws of Fanfiction at all? Because this is one of the biggest problems with having powerful, competent villains. Even if your hero is, in fact, powerful enough to beat them, it just makes sense for them to, say, abduct the hero’s favorite girl (oh, I forgot to mention, it’s usually a girl for some reason – can occasionally be a boy if the hero is female, but even then it’s often a girl) and kill her. (Or, as often happens, corrupt her so much that the hero is forced to kill her, which is, if anything, even worse for the reader.)

This is why Sauron is so acceptable despite the power disparity: he never tries to attack Sam’s girlfriend or anything like that… actually, neither he nor Saruman, for that matter, ever really tries to attack anyone’s friends directly. (To be fair, that is partly because of the power disparity, but still.)

I agree, that’s a bit of a problem. If you write about the villains abducting (or just normally murdering) the hero’s favorite girl, your readers will not enjoy the story; if you do not, you’ll keep wondering why they didn’t do that when it was the obvious and reasonable thing to do (as will many readers, even if they realize that it would’ve made the story much sadder).

So either don’t write about villains competent enough to realize that, or – that’s probably easier, because extremely dumb villains are almost impossible to write about without falling into bad humor – make both your main heroes and the sidekicks powerful enough that even if a villain tries to abduct one, they’d just get a faceful of sidekick fist (or, at worst, a really angry sidekick who managed to escape).

Or maybe just don’t have the hero keep major attachments; make it so that the protagonist’s side consists pretty much of the protagonist only. This is surprisingly easy to do in settings where the main character already has reasons not to trust people; in particular, many Worm fanfics end up going that way.

(On second thought, especially competent villains might realize that hurting the hero’s friends will only anger the hero further. But at this point, it’s hard to make the hero powerful enough to be anything but a slight nuisance to the villains’ brilliant master plan. And on some particular intelligence levels, they might just still do it anyway, to make the hero act irrationally.)


…Sorry for such an extended rant; I probably don’t even seriously believe in half the things I just said. But I really wanted to vent, after yet another fanfic I wanted to enjoy that had the hero’s (female) best friend killed two chapters before the end of the story.

Language Log Twelve Years Later #5: Machine translation methods

My main linguistics essay for the 2015/16 fall term was about assorted methods of machine translation, and why they all suck (and why this is, sadly, pretty much an unavoidable problem, and I’m not sure how even humans manage).

Modern translation methods used, e.g., by Google are essentially statistical (figure out how this phrase was translated previously, then use that); this seems to be basically an early stage of this very thing. (Did Google Translate even exist by July 2003? I know its adoption of statistical methods was much later.)
[Apparently not; they started out in 2006, and adopted statistical methods in 2007. The big online translator as of July 30, 2003 was AltaVista BabelFish – which was actually already owned by Yahoo by then but not officially renamed until several years later.]
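To make "figure out how this phrase was translated previously, then use that" concrete, here is a toy Python sketch of phrase-based lookup. It counts earlier translations of each phrase, then translates greedily, taking the longest known phrase at each position. This is a deliberately simplified stand-in: real statistical systems (Google's included) also use alignment models, language models and reordering, none of which appear here. The tiny French–English phrase table is made up for illustration.

```python
from collections import Counter, defaultdict

def build_phrase_table(aligned_pairs):
    """Count how each source phrase was translated before.

    aligned_pairs: (source_phrase, target_phrase) pairs, assumed to be
    already extracted and aligned from earlier human translations.
    """
    table = defaultdict(Counter)
    for source, target in aligned_pairs:
        table[tuple(source.split())][target] += 1
    return table

def translate(sentence, table, max_len=3):
    """Greedy left-to-right translation: at each position use the longest known
    phrase and its most frequent earlier translation; copy unknown words as-is."""
    words = sentence.split()
    output, i = [], 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = tuple(words[i:i + n])
            if phrase in table:
                output.append(table[phrase].most_common(1)[0][0])
                i += n
                break
        else:
            output.append(words[i])  # no known phrase starts here
            i += 1
    return " ".join(output)

# A made-up miniature "record of earlier translations" (French -> English).
EXAMPLE_PAIRS = [
    ("le chat", "the cat"),
    ("le chat", "the cat"),
    ("le chat", "cat"),
    ("noir", "black"),
    ("mange", "eats"),
    ("le poisson", "the fish"),
]
TABLE = build_phrase_table(EXAMPLE_PAIRS)
print(translate("le chat noir mange le poisson", TABLE))  # the cat black eats the fish
```

The output keeps the French word order ("the cat black"), which is one small example of why pure phrase lookup goes wrong.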

I wanted to post the modern Google translation of this Arabic text, but I could not find the original Arabic anywhere. (It seems to be a news article, apparently from the late 1990s, but I couldn’t find the original news article either.)

There shouldn’t be a comma in “will take off Wednesday morning”, incidentally.

Language Log Twelve Years Later #4: Four

As has been mentioned in the introductory post, classic Language Log article #4 is now a 404 page. I have no idea whether it had ever been anything else (I suppose I could check the Wayback Machine, but I’m not sure it would work; most likely, it had been yet another test entry of some kind, and was deleted within a few days of being posted).

So instead, I’m devoting this post to Piotr Gąsiorowski’s magnificent etymology of the Indo-European word for the number “four”.

To summarize, the theory is that the Proto-Indo-European root for “four” (which is otherwise unusually long for PIE) is morphologically derived from a root meaning “pair”; I’m not sufficiently familiar with PIE morphology to reasonably consider or dispute the specific details, but from a Russian-based perspective, he is saying that четыре “four” is related to чета “pair” and чётный “even” (something mentioned as a possibility in several Russian etymological dictionaries – but without actual details).

Most of the etymological dictionaries available online link the latter two words to South Slavic words meaning “troop, crowd”, then to считать “count” (compare чётки “prayer beads”, which does appear to derive from считать), then to читать “read”, and perhaps even further. Russian Wiktionary, in particular, doesn’t even try to decipher the mess, and gives several dozen assorted words (including all of the above except четыре “four”) in the “related words” tab.

However, one word that does – at least as far as I can tell – appear to derive from the root for “pair” as Gąsiorowski sees it (but does not appear to be mentioned in his posts on the topic) is сочетание “combination”. (And сочетать “(transitive) to combine”, and сочетаться “(intransitive) to combine, to fit, to marry”, which are trivially related to it.)
[Though the word *четать “to form pairs”, listed by Gąsiorowski as supposedly Russian, might be a confused reference to some version of the above; it is not otherwise a word I recognize.]

To be honest, I had thought it was a transparent calque from Latin combinatio (or, perhaps, from Greek syndyasmos). However, there is apparently a Bulgarian word съчетавам “to combine” (indeed, the Bulgarian for “combination” is apparently съчетание), which is clearly related, but South Slavic.
This means that either it derives from the South Slavic root meaning “troop” – which doesn’t seem likely, given the Latin and Greek development, which is at worst parallel – or the root meaning “pair” had at some point existed in South Slavic (in which case the respective word might well have been, in fact, a calque – perhaps at the Old Church Slavonic stage).

I’m actually surprised that this word is not mentioned by Gąsiorowski (at least, not anywhere that I could find); there are otherwise no South Slavic cognates, which makes it easier to introduce spurious ones. (Though this doesn’t help Vasmer, whose entry for чета mentions both.)

In any case, even if it is unfalsifiable (or perhaps even unverifiable) at this point – great etymology.

(Yes, the link to Gąsiorowski’s blog was on purpose. I fully intend Gąsiorowski, or indeed his commenters, to see this post; even if they just end up picking out my linguistic mistakes. I’m hardly an expert; I’m just a random linguistically interested guy.)
(Also, sorry for such an extended digression. But I wanted an article #4 in the Language Log Twelve Years Later series, I really wanted to comment on this etymology, and there wasn’t really anything worthy of commenting on at the actual Language Log #4 page anyway.)

Language Log Twelve Years Later #3: Divorce by SMS

I suppose I might as well write a few more of these reviews. After all, there’s about five thousand to go 🙂

Post number 3 is a lucky case where a twelve-year-old link has not rotted. Which is a good thing, because the post in question is extremely brief, so it’s hard to figure out what’s going on unless you know a lot of relevant cultural background.

TL;DR: Islamic law is infamously unfair to women, so a man can divorce his wife very easily (but the opposite is nearly impossible). Technically, it is legal to divorce just by spoken declaration, but apparently modern courts have required a written message (perhaps due to the difficulty of proving that a spoken declaration took place).
And a recent (as of when the article appeared on Language Log, so twelve years old now) Malaysian court case decided that, for this purpose, SMS messages count as written.

Which makes sense, I suppose, if you ignore that whole “unfair to women” part (I wonder what modern Malaysian law – as of 2015 or so – says about the matter).

Incidentally, post 3 is one of the fairly few (mostly very early) Language Log posts that do not appear on the main menu (i.e. are not reachable by next/previous post links). The next post that appears on the main menu is number 5, the next one that does not is number 7; I’ll explain what is going on with 4 in Language Log Twelve Years Later #4, which should follow shortly.