On names and collation

Computers are wonderfully efficient at simple and mundane tasks, such as sorting lists of things. However, the most basic sorting done is the comparison of the character on a byte-by-byte basis. If we go back to the old days of ascii, this means that you end up with a case sensitive sorting, where capitals come ahead of lower case.

You then go to linux, and you start doing ls, and it ends up showing the list of filenames in a case insensitive manner and you go … that’s nice. You then go to OSX and do an ls, and it ends up showing the list of files in a case sensitive manner and you go – dude, not nice. Turns out that OSX’s libc uses la_LN.US-ASCII for collation on all en_ locales – i.e. plain old ascii.

Finder does not use this sorting, It’s sorting is done by sorting the names using the routine UCCompareTextDefault, with options that allow you to specify case insensitivity, and treating numbers as numbers, as well as some others. It’s pretty fancy.

However I’m talking about names. There are a wide variety of rules related to names, and I’ve had to do some odd stuff in my past.

The first thing I’ve been asked, is to omit the ‘The’ from collation – e.g. titles containing The as the start are instead sorted by the second word – so, for example The Last Supper would be listed under L, instead of T. Very much how you find it in the library under Last Supper, The.

Then there are Irish names. Please sort in library order was the request.

I had no idea how much of a rathole this was

First – accents or fadas, as we call them come after the letter of the same character so a, á, i, í, etc

Then stem the surname, for the most part, so Ó Loingsigh is sorted under L. There are a lot of surname prefixes in Irish – De, Fitz, Ó, Uí, Ní, Nic, Mac, Mc, Mag, Mhig, Nig, Mac Giolla, Ua – oh my! In general, you’re not supposed to collate under the prefix; except when it’s a Mac or Mc – they’re generally considered sorted as if Mc and Mac are the same. so McCarthy, MacLysaght…

I’ve been trying to track down the article that I used as a basis for this, but it appears to be based on an article by O’Deirg, — “Her infinite variety” – on the ordering of Irish surnames with prefixes, especially those of women.’ An Leabharlann 10 (1), 14–16. The best source reference to this I’ve been able to track down is an article by Róisín Nic Cóil, titled Irish prefixes and the alphabetization of personal names, which contains the common practices. They’re a lot lazier than the requested ordering I was asked to use, which is more in line with O’Deirg’s work.

Which brings me back to the original reason for writing this article — someone commented that all the books in their collection that started with the word The were sorted into the pool of books starting with T, rather than with their second name. Rather than um actually’ing the conversation, I decided to share my experience in the area with others.

Sorting/Collation … like date and time handling is complicated if you want to do it ‘right’, and right depends on where you’re asking from.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.