There is a reason that it gets called an interface.

I’ve been writing commercial software since 1996. Since then the Microsoft WIN32 API has been extended to support the 64-bit platform, but almost everything I learned while writing to that platform is just as applicable now as it was back then.

I’ve written apps for the JIRA SOAP API. They work well; as long as the SOAP interface doesn’t throw a wobbler. It throws a wobbler every effing day now. The SOAP API for Jira has been deprecated in favour of the newer, shinier REST API.

Fuck you.

Fuck you and your spurious deprecation of APIs because it’s not something that fits into your grand design.

This is not how you write an API.

Dave Winer had it right many years ago with the RSS API. It was really fucking simple, and I wrote a parser for it in ass-clown perl that worked well for everything I wanted it to do. Even when people started to put incorrect entities in their feeds I could deal with it. It still runs, and I’ve not changed it in over 7 years.

Oh no, they say, look at the major version, they say; it keeps incrementing, and they only have a limited obligation to support something once the major number increments.

Fuck you.

Version number increments are meaningless. When we have a Firefox 17 in January and a Firefox 23 in August, a Chrome 18 in January and a Chrome 28 in August, I mean seriously, you’re making a fucking argument here? Every one of those goddamned browsers can still read HTML. They may not all *render* it the same way, but a <div> is a <div> is a fucking <div>.

One of the most stable APIs in computing is the concept of time(). Time is counted as the number of seconds since 00:00:00 UTC, Jan 1, 1970. Many have written alternative APIs for time, but all of them provide some call that allows you to translate their special structure into a number representing the number of seconds since that point in time. That is an API.
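
In C that round trip is tiny; a minimal sketch, using nothing beyond the standard library:

#include <stdio.h>
#include <time.h>

int main(void)
{
    time_t now = time(NULL);            /* seconds since the epoch */
    struct tm *parts = localtime(&now); /* the "special structure" */
    time_t again = mktime(parts);       /* and back to seconds */
    printf("%lld == %lld\n", (long long)now, (long long)again);
    return 0;
}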

I’m fully certain that somewhen in the 40th century, some piece of code will depend on a 64-bit version of the time() call, and it will work.

You people have no fucking idea how to write an interface.

There is a reason why they call it an interface: it’s meant to be something you can program against for a reasonable amount of time.

Not like Facebook.

Is that a password in your pocket…

I’ve seen it again and again… a developer wants to access some restricted data over the internet in a client application, but is unwilling to use a per-user login to access the data. As a result, they embed a password into the application. Then they discover that the password is readable via some mechanism once the application is on the client. Developer scratches their head and tries to figure out how to secure the password. Developer gets frustrated as people say ‘that doesn’t work’.

Fundamentally, you are trying to hide a secret in a client application. There is a long history of trying to do this; it forms the basis for pretty much all forms of application protection – and it is fundamentally impossible. If everything you need to run an application is present on a system, then determining the secret just requires a certain amount of effort. The amount of effort varies, but in general it is a continual fight between the developer and the person trying to determine the secret.

Mechanisms have been developed to try to make the secret ‘something you have’. One of the earlier disk-based methods was to have ‘malformed’ sectors on a floppy disk that needed to be read. These sectors were only ‘malformed’ in that they were laid out on the disk in a way that made them difficult to read normally. The sectors that were read became part of the code that was used to execute the application.

The fix for this form of protection was to read the protected content from an original, put this data into a new copy of the program (replacing the invalid content with the good data), and then skip or remove the code that performed the read of the drive data into that location.

An extension to this protection was to actually encrypt the code that is loaded from disk and then decrypt it at execution time – the encryption varied from simple byte-level XOR to fancier XOR-with-rotate schemes. Typically this decryption code butted up against the decrypted code (sometimes even overlaying it), preventing you from setting a breakpoint at the first instruction following the decryption code. Solving this problem involved manually decrypting a few bytes (which at the time was a pen-and-paper operation), and then starting the decryption from the subsequent instructions. Sometimes easy, sometimes more difficult. This would typically be used in conjunction with the ‘special’ media to give a dual layer of protection.
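
For flavour, here’s a minimal sketch of what that style of loader amounted to – a hypothetical reconstruction, not any particular product’s scheme:

#include <stddef.h>

/* Recover the real instructions in place, just before executing them.
   A hypothetical reconstruction of the XOR-with-rotate style. */
static void xor_decrypt(unsigned char *code, size_t len, unsigned char key)
{
    for (size_t i = 0; i < len; i++) {
        code[i] ^= key;
        /* the fancier variant rotated the key one bit per byte */
        key = (unsigned char)((key << 1) | (key >> 7));
    }
}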

Another mechanism was the hardware dongle. An oft-loved feature of expensive software, it typically embedded some data on the dongle that was necessary for the use of the application. Without the dongle, the application was useless. Some even went so far as to corrupt the data created by the application if the dongle was not present – e.g. straight lines would no longer be quite straight following a save-load cycle, making the files deteriorate with each pass (I think AutoCAD used this method).

The issue with hardware-based mechanisms is that they have a high per-unit cost. A quick search revealed a per-unit cost of €25 for low order quantities, which would need to be added to the cost of the application. In other words, this is quite often not appropriate for software which has a low price goal.

For any of these mechanisms, if someone obtained only one part of the solution (the application without the special disk/dongle) then a well-written protection would mean that the application was unusable without the second part. Poorly written protections would perform a simple test against the special item, and not actually make use of any of the underlying data from it. In general, once you have all the items that are needed to run the application, all that matters after that is skill and time.

Special media, encryption, dongles, packers, obfuscation and anti-debugging tricks are among the tools that have been used to secure applications.

What has this got to do with the opening paragraph? Well quite a bit, actually. The developer needs to store some kind of secret in the application. This secret can be anything, but in general it is some form of key to gain access to some form of resource. Nowadays, the application is not going to be shipped with any physical media – after all, this is the 21st century, and the use of physical media is archaic. This tends to rule out special media and dongles from the mix.

This leaves encryption, packers, obfuscation and other anti-debugging tricks. There are some very good solutions out there for packing/encryption/anti-debugging; a quick Google for ‘executable packer anti-debugging’ yielded some interesting results. It’s a fun and interesting area, where the developer is trying to outwit the villainous cracker. Some of the solutions are commercial – adding to the cost of application development, and reducing the ability to debug the application when deployed in the field. These are generally decisions that the developer needs to make when deciding how to protect their application.

You have to do the math on it. If the cost of developing and implementing the protection exceeds the overall value that you have placed on the application then you simply cannot afford to spend time, effort and money on a solution that will cost you more than you will ever make.

The big takeaway from this is that if you have a secret that you want to keep safe, don’t put it in the application – all that will accomplish is to keep it temporarily out of view. The truly wily hacker will be able to get it, and if the secret is a password to one of your important on-line accounts, then you should think of some other way to mediate access to the resource.

Programmatically changing environment variables in Windows

It’s not difficult to set an environment variable in Windows. System-level variables are stored in HKLM\System\CurrentControlSet\Control\Session Manager\Environment. User-level variables are stored in HKCU\Environment. They are either REG_SZ or REG_EXPAND_SZ values. REG_EXPAND_SZ values use other environment variables to get their ultimate value, while REG_SZ values are considered ‘final destination’ variables.
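
Writing one is a couple of registry calls. A minimal sketch for a user-level variable (MYVAR and its value are just placeholders):

#include <windows.h>
#include <string.h>

int main(void)
{
    HKEY key;
    const char *value = "C:\\some\\path";
    /* user-level variables live under HKCU\Environment */
    if (RegOpenKeyEx(HKEY_CURRENT_USER, "Environment", 0,
          KEY_SET_VALUE, &key) != ERROR_SUCCESS)
        return 1;
    /* REG_SZ is a 'final destination' value; use REG_EXPAND_SZ if the
       value references other variables such as %USERPROFILE% */
    RegSetValueEx(key, "MYVAR", 0, REG_SZ,
        (const BYTE *)value, (DWORD)(strlen(value) + 1));
    RegCloseKey(key);
    return 0;
}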

The issue arises when you programmatically change the value and want it reflected in programs launched afterwards. You make your changes in the registry, but none of the newly launched applications notice the change. You need to inform all the running applications that the settings have been changed. To do this you send a WM_SETTINGCHANGE message to all the running applications.

The logic is to issue a SendMessage(HWND_BROADCAST, WM_SETTINGCHANGE, 0, (LPARAM)"Environment"). As the meerkat in the advertisement says, ‘Seemples’. Unfortunately, I have a couple of applications with badly written message processing loops which don’t defer to DefWindowProc when they don’t handle the message, which causes this call to hang.

The more sensible approach is a SendMessageTimeout call, which takes three extra parameters: a flags value, a timeout in milliseconds, and a pointer that receives the result. The timeout is a maximum per window, which means that if there are 10 windows causing timeouts and you’re issuing it with a 1000 millisecond (1 second) timeout, then you will be stalled for 10 seconds. You have been warned. Most applications should respond in < 100 milliseconds, and typically there are only a few badly behaved applications.

This brings us to the code. It’s short, and it’s C and it doesn’t do anything fancy at all. Compiled using MinGW as gcc -mwindows settings.c -o settings.exe

#include <windows.h>

int APIENTRY WinMain(HINSTANCE hInstance,
  HINSTANCE hPrevInstance,
  LPSTR lpCmdLine,
  int nCmdShow)
{
    /* SendMessageTimeout fills in a DWORD_PTR, not a DWORD */
    DWORD_PTR output;
    /* Broadcast the change to every top-level window; SMTO_BLOCK with
       a 100ms per-window timeout stops a hung window stalling us. */
    SendMessageTimeout(HWND_BROADCAST, WM_SETTINGCHANGE, 0,
      (LPARAM)"Environment", SMTO_BLOCK, 100, &output);
    return (0);
}

Set a variable in the registry. Pop up a cmd window, issue a set command, and the change is not reflected in the window. Close the window, run the settings program compiled above, then launch another cmd window and it will now reflect the change to the environment you made in the registry.

The message causes Explorer to re-read the environment, which is why newly launched programs see the changes. For the most part you are launching your applications from Explorer (the Start menu, icons on the desktop, the Run dialog).

Major.Minor.Micro – or we can only do so much

When you release a piece of software into the world, you expect there to be problems. That’s where release number taxonomies come from.
Firstly, let’s define things. There is the Major number. This normally means ‘big things’ have changed: something so fundamental in the system has changed that there is a good chance that stuff which worked previously won’t work now. It is also referred to as a ‘sea change’ – so much has changed that we can’t guarantee anything will work in the new version. This approximates to the differences between IE7 and IE8 – they tried, but the combination of changes made it impossible to guarantee backwards compatibility.
Then there’s the Minor number. This generally means that some things have changed, but the release is compatible with the prior version. You should not need to change things in order to work with the new system. You probably can’t go back, though.
Then there’s the Micro version. This means it’s a code change that doesn’t affect the product in any way except to fix issues: no config changes, no stored data changes. You *should* be able to swap between micro versions without any issue.

bash pipe-isms or right hand side of pipe variables

Unlike my default shell (zsh), bash has a wonderful feature where it doesn’t keep variables that are set at the other end of a pipe – bash runs each command of a pipeline in a subshell, so the assignments die with it. For example:
i=
cat foo | while read bar; do
    i=$bar
done
echo $i

Yields an empty line. I’ve been stung once or twice by this, as I prototype the code initially in an interactive zsh shell, which doesn’t exhibit the issue.
The simplest solution is to use a named pipe.

i=
mkfifo /tmp/foo$$          # a named pipe avoids the subshell entirely
cat foo >/tmp/foo$$ &
pid=$!
while read bar; do
    i=$bar                 # runs in the current shell, so $i survives
done </tmp/foo$$
wait $pid
rm -f /tmp/foo$$           # don't leave the fifo lying around


This gives the last line of the file in the i variable.

Consistency checking a block device

I’ve been testing the resizing of the drives located on a Dell MD3000, and I’ve seen errors when resizing past the 2TB mark. This is on the new firmware which supports > 2TB logical drives. I wrote a script that writes to random locations on a block device; it can then read them back and verify that they’re still the same as what was written. Rather than writing to the entire device I use random sampling, with a few fixed points on the block device. I get pretty consistent failures, and if I feed the failed locations into the next write run they come out again in the subsequent verify. That kind of makes resizing a dangerous operation, even though it is stated that resizing is non-destructive.

I realize that the array is nothing more than a rebrand of another manufacturer’s device, but it would be great if it were tested in a lab before something this bad got out to customers.

#! /usr/bin/perl -w

use strict;
use Getopt::Long;
use Digest::MD5 qw(md5_hex);
use File::Basename;

my $fs;
my $readfile;
my $writefile;

my $numpatterns = 2048;
my $seed = undef;
my $size;
my $real_size;
my $help;

my %vars;
my @def_offsets = (0);

sub usage($) {
        print <<EOM;
Usage: $0 --fs=<filesystem> --read=<file>|--write=<file>
        [--num=<number of blocks>] [--offset=<offset to test>]
        [--seed=<random number seed>]
EOM
        exit ($_[0]);
}

my $result = GetOptions( "fs=s" => \$fs,
        "num=i" => \$numpatterns,
        "seed=i" => \$seed,
        "read=s" => \$readfile,
        "offset=i" => \@def_offsets,
        "write=s" => \$writefile,
        "h|help" => \$help);

usage(0) if defined($help);
warn "Need file system to use" if (!defined($fs));
warn "Need either a read or write file" if (!(defined($readfile) || defined($writefile)));

usage (1) if (!defined($fs) || !(defined($readfile) || defined($writefile)));
my $base = basename($fs);

open (IN, "</proc/partitions") || die "Could not load partition tables";
while (<IN>) {
        chomp();
        my ($major, $minor, $blocks, $name) = m/(\w*)\s+(\w*)\s+(\w*)\s+(\w*)$/;
        next if (!defined($major));
        if ($name eq $base) {
                $real_size = $blocks;
                last;
        }
}
close(IN);

die "Could not get size" if (!defined($real_size));

# Write to the offset in blocks
sub write_to_offset($$) {
        my ($offset, $buffer) = @_;
        sysseek(INFS, $offset * 1024, 0);
        my $write = syswrite(INFS, $buffer, 1024);
        if (!defined($write) || $write != 1024) {
                warn "Failed to write: $offset $!\n";
        } else {
                $vars{$offset} = md5_hex($buffer);
        }
}

sub read_from_offset($) {
        my ($offset) = @_;
        my $buffer;
        sysseek(INFS, $offset * 1024, 0);
        my $read = sysread(INFS, $buffer, 1024);
        if (!defined($read) || $read != 1024) {
                warn "Could not read 1024 bytes at $offset $!";
                return (1);
        }
        if (md5_hex($buffer) ne $vars{$offset}) {
                warn "Data at offset $offset was not the same as expected";
                return (1);
        }
        return (0);
}

sub get_buffer {
        my $i = 0;
        my $buffer = "";
        while ($i++ < 256) {
                my $randval = int(rand(255 * 255 * 255 * 255));
                $buffer .= chr($randval >> 24) . chr(($randval >> 16) & 255) .
                        chr(($randval >> 8) & 255) . chr($randval & 255);
        }
        (length($buffer) == 1024) || die "Buffer was " . length($buffer);
        return $buffer;
}

if (defined($readfile)) {
        # reading from previous file
        open (INPUT, "<$readfile") || die "Could not open previous run log";
        while(<INPUT>) {
                chomp();
                my ($key, $value) = m/(.*)=(.*)/;
                if ($key eq "patterncount") {
                        $numpatterns = $value;
                        next;
                }
                if ($key eq "size") {
                        $size = $value;
                        next;
                }
                if ($key eq "seed") {
                        $seed = $value;
                        next;
                }
                $vars{$key} = $value;
        }
        close(INPUT);
} else {
        $seed = time ^ $$ ^ unpack "%L*", `ls -l /proc/ | gzip -f` if (!defined($seed));
        $size = $real_size if (!defined($size));
        open (OUTPUT, ">$writefile") || die "Could not open new run log";
        print OUTPUT "patterncount=$numpatterns\n" .
                "size=$size\n" .
                "seed=$seed\n";
}

print "Size: $real_size [$size] Seed: $seed\n";
srand($seed);

my $mode = "<";
$mode = "+<" if ($writefile);
open(INFS, "$mode$fs") || die "Could not open raw device";

if ($writefile) {
        map { write_to_offset($_, get_buffer()) } @def_offsets;
        write_to_offset($size - 1, get_buffer());
        while($numpatterns > 0) {
                my $offset = int(rand($size));
                print "Writing pattern: $numpatterns           \r";
                next if defined($vars{$offset});
                write_to_offset($offset, get_buffer());
                $numpatterns--;
        }
        map { print OUTPUT "$_=" . $vars{$_} . "\n" } keys(%vars);
        close(OUTPUT);
} else {
        my $failcount = 0;
        my $tocount = scalar(keys(%vars));
        map { $failcount += read_from_offset($_); printf("To Count: %0.7d\r", $tocount--); } sort(keys(%vars));
        print "Count difference: $failcount\n";
}
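
To make the workflow concrete, a typical run would look something like this (device and log file names are illustrative):

perl consistency.pl --fs=/dev/sdb --write=run1.log
# ... resize the logical drive ...
perl consistency.pl --fs=/dev/sdb --read=run1.log

The write pass records each offset and its MD5 in the log; the read pass replays the log and reports any block that no longer matches.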



signal versus sigaction

The use of the

signal(int signum, void (*handler)(int))

call is a smidgin dangerous on various operating systems. Under Solaris, for example, once the signal has been delivered to the process the signal handler is reset, so a typical piece of code that wants to reuse the signal handler will set it again upon receiving the signal. This leads to a minor race condition: between the receipt of the signal and the re-setting of the handler, the process can receive another copy of the same signal, which gets the default disposition. Some of these signals cause Bad things to happen – such as stopping the process (SIGTSTP, for example). Under Linux the handler is kept in place, so you have no fear of a second delivery triggering an unwanted event.
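
The classic re-arming idiom looks something like this sketch; the comment marks the window where the SysV semantics bite:

#include <signal.h>
#include <unistd.h>

static void on_tstp(int sig)
{
    /* Under SysV semantics the disposition is already back to SIG_DFL
       here; a second SIGTSTP arriving before the next line stops the
       process instead of reaching this handler. */
    signal(SIGTSTP, on_tstp);
    /* ... actual handling ... */
}

int main(void)
{
    signal(SIGTSTP, on_tstp);
    for (;;)
        pause();
}
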
The manual page for signal under Linux makes it clear that the call is deprecated in favour of the much more functional

sigaction(int sig, const struct sigaction *restrict act, struct sigaction *restrict oact)

call, which keeps signal handlers in place as long as you don’t pass the SA_RESETHAND flag in the sa_flags member of the sigaction structure. So accepting a signal once, and then having the system deal with it in the default manner afterwards, becomes something you explicitly choose rather than something that happens by surprise.
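
A minimal sketch of the sigaction version (SIGUSR1 is just an arbitrary signal for demonstration):

#include <signal.h>
#include <string.h>
#include <unistd.h>

static void handler(int sig)
{
    /* only async-signal-safe calls belong in a handler */
    const char msg[] = "caught signal\n";
    write(STDOUT_FILENO, msg, sizeof(msg) - 1);
}

int main(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0; /* no SA_RESETHAND: the handler stays installed */
    sigaction(SIGUSR1, &sa, NULL);
    for (;;)
        pause(); /* every SIGUSR1 delivery runs the handler again */
}
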
Signals are, of course, a real pain in the ass when dealing with sub-processes. For example, the use of ptrace to perform profiling works well until you fork: if another SIGPROF signal arrives before you can install your signal handler, then the child process is terminated, as that’s the default behaviour in that situation.
Under Solaris (and Leopard) you can make use of DTrace to perform profiling on a set of processes without needing to deal with the vagaries of signal handling, making this a non-issue. For those of you stuck in LD_PRELOAD land, probably the only thing that can be done is to set the signal disposition to ignored before execing the new process. You have a small window where the profiling is missing, but the overall stability of the application is improved by preventing it from accidentally being terminated due to a profiling signal being received too soon. I know the accuracy nuts would hate that, but it’s part of the price of dealing with standards.
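
In code the ignore-before-exec trick is tiny. A sketch, with a hypothetical helper that would be called in the child between fork() and exec():

#include <signal.h>
#include <unistd.h>

/* Hypothetical helper: run in the child between fork() and exec().
   An ignored disposition survives exec(); an installed handler does not. */
static void exec_profiled(const char *path, char *const argv[])
{
    signal(SIGPROF, SIG_IGN); /* an early SIGPROF can no longer kill us */
    execv(path, argv);        /* the new image re-arms profiling itself */
}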

Cheap and cheerful pwait for Linux

#!/bin/bash -p
# poll until the process with the given pid goes away
if [ $# -eq 0 ]; then
    echo "Usage: $(basename $0) <pid>" 1>&2
    exit 1
fi
while [ -d /proc/$1 ]; do sleep 0.5; done

If I implemented it using inotify, I presume I could get rid of the sleep, but that entails compiled code.

While beauty may be skin deep

Ugly cuts to the bone…
Over at Coding Horror, young Jeff is complaining about the abuse of the words ‘Beautiful Code’, wherein the best example of an essay in the book is from Yukihiro Matsumoto, who doesn’t even have the benefit of English as his native language, and isn’t really talking about his own language in the essay.
It’s true: the beauty of the code isn’t in the language specifics, it’s in the expression that is being made.
Ugly code, however, is ugly code, regardless of the language. It looks small, brutish and nasty.

Where’s the SDK, huh?

I’m just wondering when Apple will be shipping the SDK for the iPod touch/iPhone. Just being nosey really. I’ve veered away from jailbreaking it simply because I don’t want to end up with an expensive brick come the next firmware update.
They say February, but remembering that the Leopard launch was at the end of the month, it could be anywhere up to the 29th.
Another thing I’d like to see is the UI guidelines. I’m a bit of a nerd when it comes to reading design guidelines, simply because there are a lot of good points in them. Mind you, you should not be slavishly obeying them; after all, they are only guidelines, and not commandments.
On guidelines, I get miffed with applications that require the use of the mouse to accomplish things. Vista’s keyboard-accessible-everywhere approach is a charm to use, even while it’s gobbling up all those CPU and disk resources with the indexer.