Error free since 03:14:07 UTC January 19, 2038.

February 10, 2009

Working on Ultrasaur.

Filed under: programming — Dave @ 8:56 am

As both my readers are probably aware, all my mental energies are going into Ultrasaur Records Management.

November 8, 2008

Wow, Google Chart API

Filed under: misc, programming, thought of the day — Dave @ 4:11 pm

Note to self: the Google Chart API is awesome.

October 31, 2008

Understanding Traffic on the 401, pt 1.

Filed under: misc, travel, data, programming, ideas, long rambling stories — Dave @ 8:45 am

The Government of Ontario runs a fantastic service to monitor the state of traffic jams on the 401: COMPASS Freeway Traffic Management System. So the obvious question becomes, when should I drive home?

Step 1: Get some data

First I set up a cron job on the server hosting ultrasaur.us that records the state of the various stretches of road. It’s been running a few days now, and after 14000 readings, there seem to be the following states for a stretch of road (with counts):

  • Express and collector moving slowly (423)
  • Express and Collector moving well (7055)
  • Express and collector very slow (85)
  • Express moving slowly. Collector moving well (205)
  • Express moving slowly. Collector N/A (49)
  • Express moving slowly. Collector very slow (138)
  • Express moving well.  Collector N/A (1236)
  • Express moving well. Collector moving slowly (435)
  • Express moving well. Collector N/A (271)
  • Express moving well. Collector very slow (48)
  • Express N/A.  Collector moving well. (1241)
  • Express N/A. Collector moving slowly (129)
  • Express N/A. Collector moving well (421)
  • Express N/A. Collector very slow (43)
  • Express very slow. Collector moving slowly (45)
  • Express very slow. Collector moving well (14)
  • Express very slow. Collector N/A (75)
  • Moving slowly (122)
  • Moving well (795)
  • N/A (1198)

Notice that there are some near duplicates with double spaces after a period — I’ll convert multiple spaces into singles.
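
A minimal sketch of that cleanup in Python, assuming the readings are plain strings paired with counts. The post doesn’t show the actual script, so `normalize_status` and `raw_counts` are made-up names, and the three sample rows are copied from the list above.

```python
import re
from collections import Counter


def normalize_status(status):
    """Collapse any run of whitespace into a single space and trim the ends."""
    return re.sub(r"\s+", " ", status).strip()


# A few (status, count) pairs copied from the list above.
raw_counts = [
    ("Express moving well.  Collector N/A", 1236),
    ("Express moving well. Collector N/A", 271),
    ("Express N/A. Collector moving well", 421),
]

# Add up the counts of statuses that become identical once normalized.
merged = Counter()
for status, count in raw_counts:
    merged[normalize_status(status)] += count

for status, count in merged.items():
    print(f"{status} ({count})")
```

Collapsing whitespace alone would not merge the one entry that also ends in a period (“Express N/A.  Collector moving well.”); that one would need the trailing period stripped as well.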

Next I needed to give each of these a value. Based on my back-of-the-envelope calculations, “well” means 80+, “slowly” means 50-80 and “very slow” means 0 to 50.

Caveats and thoughts:

  • the values can’t be exactly calculated, so I’m not going to try,
  • one important thing that I want to do is map each status to a unique value so that I don’t lose any data. The key is that the values be in order
  • you can see that I’m biased towards the expressway

So values represent the proportional time it takes to travel over a stretch of road (ie higher is worse):

  • 100: Moving well
  • 101: Express and Collector moving well
  • 130: Express N/A. Collector moving well
  • 150: Express moving well. Collector moving slowly
  • 160: Express moving well. Collector N/A
  • 170: Express moving slowly. Collector moving well
  • 180: Express moving well. Collector very slow
  • 200: Moving slowly
  • 201: Express and collector moving slowly
  • 210: Express N/A. Collector moving slowly
  • 250: Express moving slowly. Collector N/A
  • 380: Express moving slowly. Collector very slow
  • 410: Express very slow. Collector moving well
  • 460: Express very slow. Collector moving slowly
  • 500: Express and collector very slow
  • 501: Express very slow. Collector N/A
  • 510: Express N/A. Collector very slow
  • null: N/A (I’m willing to extrapolate a guess at the other N/A’s, but not here)
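
As a rough sketch, the table above could be stored as a Python lookup. The values are copied from the list; the names `status_value` and `_key` are invented here, and ignoring spacing, letter case and a trailing period is an assumption, not something the original script is known to do.

```python
# Values copied from the list above; higher means a slower stretch of road.
_VALUES = {
    "Moving well": 100,
    "Express and Collector moving well": 101,
    "Express N/A. Collector moving well": 130,
    "Express moving well. Collector moving slowly": 150,
    "Express moving well. Collector N/A": 160,
    "Express moving slowly. Collector moving well": 170,
    "Express moving well. Collector very slow": 180,
    "Moving slowly": 200,
    "Express and collector moving slowly": 201,
    "Express N/A. Collector moving slowly": 210,
    "Express moving slowly. Collector N/A": 250,
    "Express moving slowly. Collector very slow": 380,
    "Express very slow. Collector moving well": 410,
    "Express very slow. Collector moving slowly": 460,
    "Express and collector very slow": 500,
    "Express very slow. Collector N/A": 501,
    "Express N/A. Collector very slow": 510,
}


def _key(status):
    # Assumption: spacing, letter case and a trailing period carry no meaning.
    return " ".join(status.split()).rstrip(".").lower()


STATUS_VALUES = {_key(name): value for name, value in _VALUES.items()}


def status_value(status):
    """Return the travel-time value for a status, or None for N/A or unknown text."""
    return STATUS_VALUES.get(_key(status))


print(status_value("Express moving well.  Collector N/A"))
print(status_value("N/A"))
```

Because every status maps to its own number, the original status can always be recovered from the value, which is the “don’t lose any data” point above.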

This gives me my first chance to make a graph. Using just my first 14000 points, here’s the average state of the 401 Westbound over the 24 hours of a day (Monday to Wednesday):

Westbound 401 travel times (higher is worse)

The single worst hour to drive is 4-5pm, and the whole stretch from 3pm to 6pm is bad. That’s not much of a surprise (although it’s an hour or so sooner than I expected rush hour to start), but it’s a bit of a shock that evening rush hour is so much worse than morning rush hour. That 1pm is such a slow time is curious too; I wonder if that bump will go away with more data.
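
The averaging behind a graph like this can be sketched as follows. This is an illustration only: the readings are invented, the `(timestamp, value)` layout is an assumption about how the cron job’s output might be stored, and `hourly_average` is a hypothetical name.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def hourly_average(readings):
    """Average the travel-time values by hour of day, skipping N/A (None) readings."""
    by_hour = defaultdict(list)
    for timestamp, value in readings:
        if value is not None:
            by_hour[timestamp.hour].append(value)
    return {hour: mean(values) for hour, values in sorted(by_hour.items())}


# Made-up sample readings, purely for illustration (not the real data).
sample = [
    (datetime(2008, 10, 27, 16, 5), 200),
    (datetime(2008, 10, 27, 16, 35), 460),
    (datetime(2008, 10, 28, 16, 5), 250),
    (datetime(2008, 10, 28, 3, 5), 100),
    (datetime(2008, 10, 28, 3, 35), None),
]

averages = hourly_average(sample)
for hour, value in averages.items():
    print(f"{hour:02d}:00  {value:.1f}")
```

Dropping the N/A readings rather than counting them as zero keeps a missing sensor from making an hour look faster than it was.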

(Data is available to anyone who contacts me, it’ll eventually be available for download)

July 11, 2008

56 Megs for 5 words

Being a Vista user, I’m used to getting smacked in the face, and yet even I was curious when Vista wanted to download a 56 megabyte update — is it installing something completely new?

Massive download

So naturally I checked out the Knowledge base article:

The words “Friendster,” “Klum,” “Nazr,” “Obama,” and “Racicot” are not recognized when you check the spelling in Windows Vista and in Windows Server 2008

Oh, noes! That’s serious indeed. And worse: it applies to such critical applications as Windows Mail! — The Office apps seem to “correctly” accept all of these words.

So to re-iterate, 5 words weighing in at 34 bytes compress down to 56 megs, giving a compression factor of ~10^jeeze-louise-people!
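
For anyone who wants the exponent spelled out, here is a quick back-of-the-envelope check. It assumes the 34 bytes are the five words plus four single-character separators, and that “56 megs” means 56 MiB; both are guesses.

```python
words = ["Friendster", "Klum", "Nazr", "Obama", "Racicot"]

# 30 letters plus 4 one-byte separators.
payload_bytes = len(" ".join(words).encode("ascii"))

# Assumption: "56 megs" is read as 56 MiB.
download_bytes = 56 * 1024 * 1024

factor = download_bytes / payload_bytes
print(f"{payload_bytes} bytes -> {download_bytes} bytes, roughly {factor:,.0f} times bigger")
```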

Seriously guys, I don’t want to hear any more about how XML adds a lot of overhead.

p.s. I’m pretty sure I know why this happened. The PM in charge of this had two choices: EITHER have a dev write code to do a diff, and get an SDET to test it which could take a long time, and get slagged in the .01% of cases where something goes wrong OR just throw resources at it, in this case the bandwidth of millions of people. It might have actually been the right choice from their perspective, but it betrays some poor design somewhere.

p.p.s. Standard disclaimer: I used to work for the Borg as a PM, but I know nothing about this particular team.

July 10, 2008

Using AWK to convert CSV to XML

Filed under: awk, programming — Dave @ 10:52 am

I needed to convert CSVs to XML, so it’s time to return to my text processing hero: AWK.

“CSV to XML” gets me 90% of the way there, except that it doesn’t actually convert from Comma Separated Values to XML: it uses spaces to separate values. Updated script:

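# Assumes the element name is passed in as "node", e.g. (hypothetical names):
#   awk -v node=record -f csv2xml.awk data.csv
BEGIN {RS = "\n"
FS = "," }
# First record: remember the column headings as tag names and open the root element
NR == 1 {for (i = 1; i <= NF; i++)
    tag[i] = $i
  print "<" node "XML>"}
# Every other record: one <node> element with one child element per field
NR != 1 {print " <" node ">"
  for (i = 1; i <= NF; i++)
    print "  <" tag[i] ">" $i "</" tag[i] ">"
  print " </" node ">"}
END {print "</" node "XML>"}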

Notes on cleaning your data:

  • Your column headings need to be sane, ie: no spaces or weird characters
  • Numbers with commas, ie “12,231”, won’t work
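
If the commas-inside-numbers caveat bites, one alternative is a real CSV parser. Here is a minimal Python sketch of the same conversion using the standard `csv` module; it only helps when such values are quoted in the file, the column headings still have to be valid tag names, and the `csv_to_xml` name and sample data are made up.

```python
import csv
import io
from xml.sax.saxutils import escape


def csv_to_xml(csv_text, node="row"):
    """Convert CSV text (first row = column headings) into simple XML."""
    reader = csv.reader(io.StringIO(csv_text))
    tags = next(reader)  # column headings become tag names
    lines = [f"<{node}XML>"]
    for row in reader:
        lines.append(f" <{node}>")
        for tag, value in zip(tags, row):
            # escape() protects &, < and > inside the values
            lines.append(f"  <{tag}>{escape(value)}</{tag}>")
        lines.append(f" </{node}>")
    lines.append(f"</{node}XML>")
    return "\n".join(lines)


# Invented sample data with a quoted number that contains a comma.
sample = 'name,population\nSpringfield,"12,231"\n'
xml = csv_to_xml(sample, node="town")
print(xml)
```

The output has the same shape as the AWK version: a `<townXML>` root with one `<town>` element per data row.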

June 21, 2008

Scratch (goto considered impossible)

Filed under: toys, programming, kinda maybe funny — Dave @ 3:26 am

I spent an hour or two playing with Scratch, a programming environment for kids. It’s infuriatingly difficult to re-use code (there are no functions, for instance), and there are no arrays, no network connectivity, and no external files.

But isn’t this awesome? (use the arrow keys to drive once it has focus, the green flag may start it)

Scratch Project

February 26, 2008

Lovely Bookmarklet for Arbitrary Code

Filed under: programming, javascript, security — Dave @ 2:19 am

I think there’s a script injection vulnerability with a site I use, and I’m itching to find a proof-of-concept attack. I haven’t had time, or the chutzpah to pick at it yet, but I did disable their client side verification. I don’t know why it just occurred to me today to make a “Run arbitrary code in the context of the page” bookmarklet.

To use it, drag this: Run Code onto your links toolbar. All the code is:
javascript:(function(){eval(prompt('Run this code on this page:'))})()

  1. Standard javascript: protocol, wrapped in a function that returns nothing, so the browser doesn’t replace the page with the result
  2. Ask for a string
  3. Eval the string

If there’s a client-side script CleanBadCharacters(s), you can run “CleanBadCharacters=function(s){return s}” (without the quotes) to neuter it; you’ll need to do a little reading to find the exact validation function.

Update: Not an hour after I added that hammer (metaphorically), everything’s looking like a nail. Eval-ing “document.body.style.color='white'” is useful if someone thinks grey on gray is an acceptable colour scheme.

January 17, 2008

The inmates are running my source code

Filed under: programming — Dave @ 2:09 am

In The Inmates Are Running the Asylum (it’s a software design book, seriously), Alan Cooper argues that programmers are so different from users that they probably shouldn’t design for them. That’s my job (I’m a program manager), but I think I might be just as bad.

One of my teammates told me that we needed to do something one way, which caused a problem down the pipe that I have to solve right now. But why would you take his word for it when you can spend hours analyzing the source code? I didn’t have an IDE or a compiler, but I had full text search and a fantastic editor that can handle C# syntax highlighting. So I unrolled the code: stepping through it from the entry point to the bug, I replaced every call to a non-obvious function with the meat of that function, threw out overhead code (things like error-handling, casting, serializing, etc.) and added comments because, unlike real programmers, I don’t break out in hives when commenting code.

After taking 2 megs of source code down to a mere few hundred lines (with comments), I had my answers: he was right, we probably needed to do it that way, and more importantly I think I know how to work around it.

In summary: Nerds probably are a little different than most people.

December 16, 2007

Testing is hard

Filed under: programming — Dave @ 3:14 am

In a software company everyone’s a tester (or should be), which is essential, because no matter how many test cases the professional testers cover they really can’t get in the head of the user. That and testing any significant product is hard.

My main work machine is set to (British) English as my main language, but to display dates, times and currencies in French. Much of the software I use that supports French uses the wrong setting, but it took me two weeks to notice that my fingerprint scanner Reussi!-ed instead of Success!-ing, since as a Canadian, I’m used to just turning the cereal box around whenever there’s French on it. So eventually I’m going to have to switch to Chinese, that should be jarring enough for me to notice.

Lesson #1: There are more edge cases than are dreamt up in your philosophy

For the upcoming m.euri.ca (tools for living my life with my mobile phone), I was playing with the King County transit planner website and accidentally told the system I wouldn’t mind walking 20 miles between stops — putting every stop in walking distance and turning the task of finding the best route into a variation of the travelling salesman problem AND mildly DoS-ing the system:
King County DoS

Lesson #2: Idiot-proofing often underestimates the calibre of idiots out there

I have to be a little vague on this last point, since it involves my software in its vulnerable unfinished and thoroughly pre-beta state. But the basic idea is: no matter how well you’ve tested the components in isolation, actually run through every common scenario.

Lesson #3: If you have a quota of bugs to find, try to do something the salesman said would be easy :)

September 23, 2007

Startup thoughts I

Filed under: misc, programming, ideas — Dave @ 4:37 pm

For reasons I’ll never understand, people ask me for startup ideas. I work at a company that’s over 30 years old, forever in software company terms, so I have to assume it’s because I spend part of my day worrying about other people’s startups.

Making a Profit is Nice but…

The classic startup doesn’t really need to be profitable, not in the short term at least. What you really want is for profitability to scale better than linearly.

Consider coffee shops: if you build 2 coffee shops rather than one you’ll make twice as much money (roughly) and spend twice as much to get it, so your profits will be twice as much. The profit scales linearly with the investment. Yuck, you’ll never get rich that way. Even worse, you need to be profitable from the beginning to make any money at all.

Linear Profit

With chains though, there’s a reason that there are 32 768 billion Starbucks. Things are cheaper in bulk, not only ingredients but advertising and design. You also get a little bonus from people being more comfortable with your brand. So in the case of chains you get something like this:

Better Linear

Certainly better. But if the first one’s not profitable, you probably won’t build 20 more.

Software makes people rich because the first copy costs a million dollars, but the next million copies cost a dollar each and sell for the same price:

Software

So you don’t need to make a profit at the beginning to make a billion dollars in a few years.

linear4profit.png

But that’s so 1976. It’s the WEB 2.0, and since it’s not a point release we’re breaking backwards compatibility with old business models.

Metcalfe’s Law states that the value of a network scales with the square of the number of participants. So let’s run these numbers; say you spend $1000 to develop something, and each user on the system gets one thousandth of a cent of value (on average) from every other user of the system. When we’re talking about tens of users, I have to add a glow to the Value line so you can tell it’s not zero.
networkprofit.png
With hundreds of users, it still isn’t making financial sense:
network100sprofit.png

But with thousands or tens of thousands of users we’re actually creating some value here:
network1000sprofit.png
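
The numbers behind these graphs are easy to reproduce. A small sketch, using the $1000 development cost and the thousandth of a cent per pair of users from above; the function names are invented, and it counts value created for users, not revenue.

```python
DEV_COST_DOLLARS = 1000
# One dollar is 100 cents, and each unit of value is a thousandth of a cent.
UNITS_PER_DOLLAR = 100 * 1000


def network_value_dollars(users):
    """Total value created: every user gets 0.001 cents from every other user."""
    return users * (users - 1) / UNITS_PER_DOLLAR


def break_even_users():
    """Smallest user count whose total value covers the development cost."""
    users = 1
    # Compare in whole units (thousandths of a cent) to avoid float rounding.
    while users * (users - 1) < DEV_COST_DOLLARS * UNITS_PER_DOLLAR:
        users += 1
    return users


for users in (10, 100, 1000, 10_000, 100_000):
    print(f"{users:>7} users: ${network_value_dollars(users):,.4f} of value")
print("break-even at", break_even_users(), "users")
```

Under these assumptions the value only catches up with the $1000 spent at roughly ten thousand users, which is why the tens and hundreds graphs look so flat.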

It’s worth mentioning that this is value we’re creating for the users; we haven’t made any money yet. It’s not hard to make something useful for other people; the hard part is getting them to give you more money for it than you spent making it. But when we get into hundreds of thousands of users, there’s so much value being created that it becomes a lot easier to shave off enough of that value to create a profit:
network100000sprofit.png
Maybe we’ll sell ads, maybe our users will baffle me by spending $1 to send tiny pictures to each other. But most likely at this point we’d sell the company to another company that has a proven track record of making money and let them worry about it.
