On names and collation

Computers are wonderfully efficient at simple and mundane tasks, such as sorting lists of things. However, the most basic form of sorting compares strings byte by byte. Back in the old days of ASCII, this meant case-sensitive sorting, where capitals come ahead of lower case.
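Here is a minimal Java sketch of the difference. The class name and sample file names are invented for illustration. Java’s `String.compareTo` compares UTF-16 code units rather than bytes, but for plain ASCII text the resulting order is the same as a byte-by-byte comparison. `String.CASE_INSENSITIVE_ORDER` is the JDK’s built-in case-folding alternative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CodeUnitSort {
    // String.compareTo compares character codes one by one, so for ASCII
    // text every capital (A-Z = 65-90) sorts before every lower-case
    // letter (a-z = 97-122).
    static List<String> naiveSort(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(String::compareTo);
        return sorted;
    }

    // CASE_INSENSITIVE_ORDER folds case before comparing characters.
    static List<String> caseInsensitiveSort(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(String.CASE_INSENSITIVE_ORDER);
        return sorted;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("banana", "Apple", "cherry", "Zebra");
        System.out.println(naiveSort(files));           // [Apple, Zebra, banana, cherry]
        System.out.println(caseInsensitiveSort(files)); // [Apple, banana, cherry, Zebra]
    }
}
```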

You then go to Linux and run ls, and it lists the filenames in a case-insensitive order (at least under the usual UTF-8 locales), and you go … that’s nice. You then go to OS X and run ls, and it lists the files in a case-sensitive order, and you go: dude, not nice. It turns out that OS X’s libc uses la_LN.US-ASCII for collation on all en_ locales, i.e. plain old ASCII.

Finder does not use this sorting. It sorts names using the UCCompareTextDefault routine, with options that let you specify case insensitivity and treating numbers as numbers, among others. It’s pretty fancy.
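Finder’s routine is Apple-specific. As a rough, hypothetical approximation of "case insensitive, with numbers treated as numbers", here is a hand-rolled Java comparator. It is not UCCompareTextDefault, and it ignores accents, locales, and the routine’s other options. Runs of ASCII digits are compared by numeric value, and everything else is compared one character at a time with case folded.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class NaturalOrder {
    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Hypothetical comparator: case-insensitive, with runs of ASCII digits
    // compared by numeric value. Illustrates the idea only; it is not
    // Apple's UCCompareTextDefault.
    static final Comparator<String> NATURAL = (a, b) -> {
        int i = 0, j = 0;
        while (i < a.length() && j < b.length()) {
            char ca = a.charAt(i);
            char cb = b.charAt(j);
            if (isDigit(ca) && isDigit(cb)) {
                // Consume the whole run of digits on each side.
                int si = i, sj = j;
                while (i < a.length() && isDigit(a.charAt(i))) i++;
                while (j < b.length() && isDigit(b.charAt(j))) j++;
                // Compare the runs as integers, so "2" sorts before "10".
                BigInteger na = new BigInteger(a.substring(si, i));
                BigInteger nb = new BigInteger(b.substring(sj, j));
                int cmp = na.compareTo(nb);
                if (cmp != 0) return cmp;
            } else {
                // Compare everything else with case folded away.
                int cmp = Character.compare(Character.toLowerCase(ca), Character.toLowerCase(cb));
                if (cmp != 0) return cmp;
                i++;
                j++;
            }
        }
        // If one name runs out first, the one with less left over sorts first.
        return Integer.compare(a.length() - i, b.length() - j);
    };

    static List<String> sorted(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(NATURAL);
        return copy;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("file10.txt", "File2.txt", "file1.txt");
        System.out.println(sorted(files)); // [file1.txt, File2.txt, file10.txt]
    }
}
```

A plain code-unit sort would put File2.txt first (capital F) and file10.txt before file1.txt’s neighbour File2.txt. This comparator keeps the numbers in numeric order.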

However, I’m talking about names. There is a wide variety of rules related to names, and I’ve had to do some odd stuff in the past.

The first thing I was asked to do was to omit ‘The’ from collation. Titles starting with The are instead sorted by their second word, so, for example, The Last Supper is listed under L instead of T. This is very much how you find it in the library, under Last Supper, The.
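Here is a minimal Java sketch of the "ignore a leading The" rule. It assumes English titles, and it assumes that only the exact word ‘The’ followed by a space should be skipped, so Theology Today still files under T. The class and method names are hypothetical. `libraryForm` produces the catalogue-style ‘Last Supper, The’ display.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

public class TitleSort {
    // Sort key: drop a leading "The " (any case) and fold case, so
    // "The Last Supper" files under L.
    static String sortKey(String title) {
        String t = title.trim();
        if (t.regionMatches(true, 0, "The ", 0, 4)) {
            t = t.substring(4).trim();
        }
        return t.toLowerCase(Locale.ROOT);
    }

    // Catalogue display form: "The Last Supper" -> "Last Supper, The".
    static String libraryForm(String title) {
        String t = title.trim();
        if (t.regionMatches(true, 0, "The ", 0, 4)) {
            return t.substring(4).trim() + ", " + t.substring(0, 3);
        }
        return t;
    }

    static List<String> sortTitles(List<String> titles) {
        List<String> sorted = new ArrayList<>(titles);
        sorted.sort(Comparator.comparing(TitleSort::sortKey));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList("The Last Supper", "Moby Dick", "Theology Today", "Hamlet");
        // [Hamlet, The Last Supper, Moby Dick, Theology Today]
        System.out.println(sortTitles(titles));
        System.out.println(libraryForm("The Last Supper")); // Last Supper, The
    }
}
```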

Then there are Irish names. The request was: please sort in library order.

I had no idea how much of a rathole this was.

First, accented letters (fadas, as we call them) come after the unaccented form of the same letter: a, á, i, í, etc.

Then, for the most part, stem the surname, so Ó Loingsigh is sorted under L. There are a lot of surname prefixes in Irish (De, Fitz, Ó, Uí, Ní, Nic, Mac, Mc, Mag, Mhig, Nig, Mac Giolla, Ua), oh my! In general, you’re not supposed to collate under the prefix, except when it’s a Mac or Mc. These are generally sorted as if Mc and Mac were the same, so McCarthy, MacLysaght…
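Here is a heavily simplified Java sketch of these rules. Its assumptions are deliberate and worth stating plainly:

- The prefix list is taken from the paragraph above.
- Only prefixes written as a separate word are skipped, so attached forms like FitzGerald are left alone.
- Anything starting with Mac or Mc is filed under ‘mac’, which also sweeps in Mac Giolla.
- Fada ordering is delegated to `java.text.Collator` at `SECONDARY` strength. This ignores case but sorts an accented letter after its plain form when the names are otherwise equal.

Real library practice has many more cases than this. The class and method names are hypothetical.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class IrishSurnameSort {
    // Hypothetical set of separate-word prefixes to skip (from the text).
    // Accented forms are written as Unicode escapes to keep the source ASCII.
    // Mac/Mc are handled separately in sortKey.
    private static final Set<String> SKIPPED_PREFIXES = Set.of(
            "de", "fitz", "\u00f3" /* O with fada */, "o", "u\u00ed" /* Ui */,
            "n\u00ed" /* Ni */, "nic", "mag", "mhig", "nig", "ua");

    // SECONDARY strength: case is ignored, but accent differences still
    // count, so an unaccented letter sorts before its accented form when
    // the names are otherwise equal.
    private static final Collator COLLATOR = Collator.getInstance(Locale.ENGLISH);

    static {
        COLLATOR.setStrength(Collator.SECONDARY);
    }

    // Build the filing key for a surname under the simplified rules above.
    static String sortKey(String surname) {
        String s = surname.trim();
        String lower = s.toLowerCase(Locale.ROOT);
        // Mac and Mc are treated as the same prefix and kept for filing.
        if (lower.startsWith("mac")) {
            return "mac" + s.substring(3).trim();
        }
        if (lower.startsWith("mc")) {
            return "mac" + s.substring(2).trim();
        }
        // Any other listed prefix is skipped, so "O Loingsigh" (with fada) files under L.
        String[] words = s.split("\\s+", 2);
        if (words.length == 2 && SKIPPED_PREFIXES.contains(words[0].toLowerCase(Locale.ROOT))) {
            return words[1];
        }
        return s;
    }

    static List<String> sorted(List<String> surnames) {
        List<String> copy = new ArrayList<>(surnames);
        Comparator<String> byKey = Comparator.comparing(IrishSurnameSort::sortKey, COLLATOR);
        copy.sort(byKey);
        return copy;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "\u00d3 Loingsigh", "McCarthy", "Breathnach", "MacLysaght", "N\u00ed Bhriain");
        // Expected order: Ni Bhriain, Breathnach, O Loingsigh, McCarthy, MacLysaght
        System.out.println(sorted(names));
    }
}
```

The sample sorts as Ní Bhriain, Breathnach, Ó Loingsigh, McCarthy, MacLysaght. If the ordering you have been asked for treats á as a separate letter between a and b, rather than as a tie-breaker, you would need a custom `RuleBasedCollator` instead of the default English rules.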

I’ve been trying to track down the article I used as a basis for this. It appears to be based on an article by O’Deirg, “‘Her infinite variety’: on the ordering of Irish surnames with prefixes, especially those of women”, An Leabharlann 10 (1), 14–16. The best reference I’ve been able to find is an article by Róisín Nic Cóil, titled “Irish prefixes and the alphabetization of personal names”, which describes common practice. That practice is a lot lazier than the ordering I was asked to use, which is more in line with O’Deirg’s work.

Which brings me back to the original reason for writing this article. Someone commented that all the books in their collection whose titles started with the word The were sorted into the pool of books starting with T, rather than by their second word. Rather than um-actually’ing the conversation, I decided to share my experience in the area.

Sorting and collation, like date and time handling, are complicated if you want to do them ‘right’, and ‘right’ depends on where you’re asking from.

The dangers of upgrading

One of the guys in the office had been having trouble since the upgrade to Python 3.6: a bunch of test code was breaking when run under the debugger.

The issue seemed to be related to the use of pexpect 4.6 in the new environment as opposed to pexpect 3.3 in the old environment.

This was very difficult to debug because the ptyprocess code closes all the file descriptors in the child process before attempting to exec. As a result, exceptions were being silently swallowed, the entire thing was crashing out, and the code was locking up in an error read loop.

I hemmed and hawed about downgrading to the 3.3 version of pexpect, but decided to investigate further, rather than leave the problem as is.

Addressing the debugging problem involved changing the ptyprocess module. I replaced the code that closed all the file descriptors with code that marked them all as close-on-exec instead, so that I could actually see the exception and deal with it. The solution is Linux-only, but TBH, at this time Linux is all I’m concerned with.

Addressing the pexpect problem involved removing the code that re-encoded the command-line arguments when the encoding argument was passed, and leaving them as-is.

The confusion arises because encoding is meant for the I/O, not for the command-line arguments. When the change was made, it reused this argument rather than adding a separate one to deal with the arguments.

This fixes the problem in my case, but it was a complete pain to debug.

Hope in every box

Imagine if life stretched out in a single span from birth to death: one long, unbroken stretch between the start and the end, with no pause in between.

It would be absolute hell on earth. How could you bear to survive in a world like that? A never-ending stretch until the precious final release of death.

However, that’s not the case. Life is instead broken down into little boxes.

Each box is a day. Each box is separate and distinct. Sometimes when you’re in a box, it seems like that’s all there is and there’s no way out. Sometimes you look at another box and think it’s an impossible goal, because it looks so hard to reach when you don’t have the skills or experience to get there.

The trick, though, is that every day you have choices about where you want to go: there are exits from the box going in different directions. Some of the directions are positive, and some are negative. Sometimes you may have slipped so far into the negative that you cannot conceive of getting to the positive.

However, the thing about the individual boxes is that you only need to deal with the situation one box at a time.

Whether the box is getting practice in your writing,

or the box is managing to get to another day without a drink,

or the box is learning more about yourself…

It’s only a small box that you need to get through; you’re not trying to deal with the entirety of your life, you’re just trying to deal with this one small thing called now.

An important thing to remember, though, is that you don’t always have to ‘make progress’. Sometimes the only thing you can do is get through that single box. The best thing about all these individual boxes is that you get the opportunity to try again next time. No matter how bad it seems at this moment, you will have another opportunity, as long as you hold on to the fact that there will be another time. It’s just a single box away.

We need to aim for progress, not perfection; one day at a time.


(Floating point) math is hard

On sites like Stack Overflow, if you post a question about floating-point math, it will very quickly get closed as a duplicate of the ‘is floating point broken’ question. Very often it is a duplicate of that question; however, there are occasions when it is not.

The general question is typically: how come 1.1 + 2.2 != 3.3? There are a lot of resources about this. The long and the short of it is that binary representations of floating-point numbers are not the actual numbers but close approximations to them. As a result, equality sometimes isn’t, and you end up having to do ‘fudge’ math (a == b becomes fabs(a - b) < fudge). It’s great fun (and I mean that in the most sarcastic manner possible).
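Here is the ‘fudge’ comparison in Java, as a minimal sketch. The method names and tolerance values are arbitrary choices for illustration. A fixed absolute tolerance only makes sense when you know the magnitude of your values, so a relative variant is shown as well.

```java
public class FloatCompare {
    // Exact comparison of the stored binary approximations.
    static boolean exactlyEqual(double a, double b) {
        return a == b;
    }

    // The "fudge" comparison: |a - b| < tolerance, with a fixed tolerance.
    static boolean nearlyEqual(double a, double b, double tolerance) {
        return Math.abs(a - b) < tolerance;
    }

    // Relative variant: the allowed error scales with the larger operand.
    static boolean nearlyEqualRelative(double a, double b, double relTol) {
        double scale = Math.max(Math.abs(a), Math.abs(b));
        return Math.abs(a - b) <= relTol * scale;
    }

    public static void main(String[] args) {
        double sum = 1.1 + 2.2;
        System.out.println(sum);                                  // 3.3000000000000003
        System.out.println(exactlyEqual(sum, 3.3));               // false
        System.out.println(nearlyEqual(sum, 3.3, 1e-9));          // true
        System.out.println(nearlyEqualRelative(sum, 3.3, 1e-12)); // true
    }
}
```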

Sometimes, though, the question is about how floating-point numbers are displayed, and it’s really frustrating when it gets closed for the wrong reason. This is especially frustrating for the OP, who may be new to the site and comes away with a poor impression.

The issue in this case was missing precision when displaying Java floating-point values.

In Java, the code:

float f = 2.0f / 3.0f;

will only display 7 decimal places (0.6666667). This is because it’s a 32-bit float, whose 24-bit significand holds only about 7 significant decimal digits.

The code:

double d = 2.0 / 3.0;

will display 16 decimal places (0.6666666666666666). This is because it’s a 64-bit float, whose 53-bit significand holds roughly 15 to 17 significant decimal digits.

So, if you want that extra precision from Java’s built-in floating-point types, you need to use double.
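Here is a small Java sketch that makes the display difference visible. `Float.toString` and `Double.toString` print only as many digits as are needed to uniquely distinguish the value from neighbouring values of the same type. That is why the float stops at 7 decimal places and the double goes further. The helper names are made up.

```java
public class FloatVsDouble {
    // 32-bit float: 24-bit significand, roughly 7 significant digits.
    static String floatText() {
        float f = 2.0f / 3.0f;
        return Float.toString(f);
    }

    // 64-bit double: 53-bit significand, roughly 15-17 significant digits.
    static String doubleText() {
        double d = 2.0 / 3.0;
        return Double.toString(d);
    }

    public static void main(String[] args) {
        System.out.println(floatText());  // 0.6666667
        System.out.println(doubleText()); // 0.6666666666666666
    }
}
```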

This is all moot if you’re dealing with money. I really do hope you’re not using float (or double) to represent money. That’s how you get an audit. In the words of a great philosopher: that’s a German car, the ‘T’ is silent 🙂
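To make the money point concrete, here is a minimal Java sketch comparing `double` with `java.math.BigDecimal` when adding ten cents ten times. The class and method names are hypothetical. Storing amounts as a whole number of cents in a `long` is another common approach.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class MoneyExample {
    // 0.10 has no exact binary representation, so the error accumulates.
    static double sumWithDouble() {
        double total = 0.0;
        for (int i = 0; i < 10; i++) {
            total += 0.10;
        }
        return total;
    }

    // BigDecimal built from a String holds the exact decimal value 0.10.
    static BigDecimal sumWithBigDecimal() {
        BigDecimal total = BigDecimal.ZERO;
        BigDecimal tenCents = new BigDecimal("0.10");
        for (int i = 0; i < 10; i++) {
            total = total.add(tenCents);
        }
        // Fix the result at two decimal places, using banker's rounding.
        return total.setScale(2, RoundingMode.HALF_EVEN);
    }

    public static void main(String[] args) {
        System.out.println(sumWithDouble());     // 0.9999999999999999
        System.out.println(sumWithBigDecimal()); // 1.00
    }
}
```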

On walled gardens

I recently downloaded an application for doing ‘time since’ tracking. Tracking these stats is not something I would associate with cloud-based services. I would expect the data to stay on-device and not leave it unless you explicitly chose to export it.

The first thing I encounter is a login screen. It asks for an email and password, and if I don’t have an account, it asks me to create one, which, obviously, requires an email and password.

Closed app,

Deleted from device.

I have no way of interacting with the app without an account, so I don’t know if it’s good or bad. All I know is that it’s asking for some PII, and I’m not willing to take the chance that it’s trustworthy.

Security – get the facts

This is similar to the slippery slope of drugs and alcohol for addicts, and somewhat humorous (from @thegrugq):

“I used to use 1024bit keys, then my friends switched to 2048, I felt I had to as well. Now I’m using 4096. Everyday”


The pointless dialog

We’ve all seen them: the dialog with the OK button that may as well say ‘I am wasting your time’ for all the good it provides.


These are a class of dialog that deserves to be burned with fire. They provide absolutely no benefit to the user and contribute nothing but noise.

The iOS Human Interface Guidelines are quite simple when it comes to using alert views like this. The one-paragraph guideline for using them says:

Reserve alerts for delivering essential—and ideally actionable—information. An alert interrupts the user’s experience and requires a tap to dismiss, so it’s important for users to feel that the alert’s message warrants the intrusion.

Later in the document, where it describes alerts, it goes into more detail:

An alert gives people important information that affects their use of an app or device… It’s best to minimize the number of alerts that your app creates, and make sure that each one offers critical information and useful choices. (emphasis mine)

They then go on to detail when to use alerts and when to use some other mechanism. For the most part, these dialogs are discouraged because they are very intrusive.

Coders love absurdity

Taken from a Medium post:
Programmers have to worry about things no sane human being ever considers. You drop a “last name” field in your design without thinking about it, but to a coder, there’s a hundred anxieties associated with that:

  • What if the person doesn’t have a last name?
  • What if their last name is expressed as a mathematical equation?
  • What if their last name is longer than 255 characters?
  • What if their last name contains tab characters, multiple paragraphs, non-breaking spaces, emojis, parentheses, commas, single and double quotes?
  • What if their last name changes between the time when they type it in and the time when they submit the form?

To any normal human being, these questions are absurd. To a coder, they are common sense. What this means for you, as a designer, is that you must keep close to your coders, try as much as possible to anticipate the anxieties that will besiege them, and keep them from derailing the experience with utter lunacy.