Casinos love guys with systems.
Jeff Atwood and Ted Dzubia both hate Swoopo, so it’s roughly as bad as PHP. A quick overview: “auctions” start at $0.00, and each bid raises the price by a few pennies, extends the time remaining in the auction by 10 seconds, and costs the bidder 75 cents to place.
If you can get the last bid in (and you only place a few), you can pick up a $1000 laptop for $30. I mostly ignored Swoopo until Joshua Stein tried to game it. He was thwarted because HTTP requests can’t be timed to within a fraction of a second (and Swoopo gives ties to the users who waste money on automatic bidding), and he concluded that bidding was indistinguishable from gambling.
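The economics are easier to see with a little arithmetic. The sketch below models what the site collects from one auction, assuming (hypothetically) that the price starts at $0.00, every bid adds exactly one cent and every bid costs 75 cents; real increments may have differed. Amounts are kept in integer cents to avoid floating-point rounding, and `auction_take` is an illustrative name.

```python
from typing import Tuple


def auction_take(final_price_cents: int, increment_cents: int = 1, fee_cents: int = 75) -> Tuple[int, int]:
    """Return (number_of_bids, total_collected_cents) for one auction.

    Hypothetical model: the price starts at $0.00 and every bid adds
    increment_cents, so the final price reveals how many bids were placed.
    """
    bids = final_price_cents // increment_cents
    total = final_price_cents + bids * fee_cents  # winner's price plus everyone's bid fees
    return bids, total


bids, total = auction_take(3000)  # the $30 laptop
print(f"{bids} bids, ${total / 100:.2f} collected")  # prints "3000 bids, $2280.00 collected"
```

Under those assumptions a $1000 laptop “sold” for $30 brings in about $2,280, almost all of it from the $2,250 in bid fees.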
But I’m not convinced it can’t be gamed; the key is to come out ahead with high probability across many auctions, rather than to win any one auction.
Just as a first pass, I think you want to find auctions where:
- Several are closing at the same time - so there’s less competition
- At a particular time of the day - same reason
- Only auctions for $500+ items selling for more than 90% off, so any accidental purchases can be safely sold at a profit (I don’t want to bother reselling DVDs)
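The first and third criteria are mechanical enough to code up. A minimal sketch, assuming a hypothetical `Auction` record (the field names are mine) and reading “closing at the same time” as “closing in the same minute”; the time-of-day criterion is left out:

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass
class Auction:
    # Hypothetical record layout for a scraped auction; field names are illustrative.
    name: str
    retail: float        # suggested retail price, dollars
    price: float         # current auction price, dollars
    closes_at: datetime  # scheduled closing time


def is_candidate(a: Auction, min_retail: float = 500.0, max_fraction: float = 0.10) -> bool:
    """A $500+ item currently priced at no more than 10% of retail (90%+ off)."""
    return a.retail >= min_retail and a.price <= max_fraction * a.retail


def crowded_minutes(auctions: List[Auction], min_count: int = 2) -> Dict[datetime, List[Auction]]:
    """Group candidate auctions by closing minute; keep minutes where several close together."""
    groups: Dict[datetime, List[Auction]] = defaultdict(list)
    for a in auctions:
        if is_candidate(a):
            groups[a.closes_at.replace(second=0, microsecond=0)].append(a)
    return {minute: group for minute, group in groups.items() if len(group) >= min_count}
```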
So I used a Greasemonkey script to download the last 10,000 winners into a spreadsheet.
Quick facts:
- 9904 auctions were won by 4217 distinct users (7 by phone)
- The average savings (vs the suggested price) was 65%, although in 35 cases the winner paid more than the suggested price
- 2853 auctions were open only to manual bidders, rather than the automatic bidbutler (the difference in savings, 66% vs 66%, isn’t significant)
- Wins are spaced fairly evenly throughout the 24 hour clock
- The average winner placed ~95 bids; bid counts in the thousands are not uncommon, and one “winner” placed 2623 bids
- Roughly one in ten auction winners placed only 1 or 2 bids.
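The quick facts above are simple aggregates over the scraped rows. A minimal sketch of computing them, assuming a hypothetical `Win` record per row (the field names are mine, not the spreadsheet’s) and measuring savings on the final price alone, without bid fees:

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Win:
    # Hypothetical layout for one scraped row; not Swoopo's actual columns.
    winner: str
    retail: float  # suggested price, dollars
    price: float   # final auction price, dollars
    bids: int      # bids placed by the winner


def summarize(wins: List[Win]) -> Dict[str, float]:
    return {
        "auctions": len(wins),
        "distinct_winners": len({w.winner for w in wins}),
        # Savings on the final price only; bid fees are ignored here.
        "mean_savings": mean(1 - w.price / w.retail for w in wins),
        "overpaid": sum(w.price > w.retail for w in wins),
        "mean_bids": mean(w.bids for w in wins),
        "share_1_or_2_bids": sum(w.bids <= 2 for w in wins) / len(wins),
    }
```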
The last point hints that it’s possible to win by sniping at the last minute.
Roughly 1 in 8 auctions was for an item valued at more than $500 and won for less than 20% of the suggested price. Those “winners” used an average of 311 bids, which doesn’t look good.
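To put 311 bids in perspective, this small sketch folds the winner’s own bid fees back into the price. It assumes the 75-cent fee from above and ignores money spent on auctions the same person lost; the $500 item won at 20% of retail is an illustrative combination of the figures above, not a real auction.

```python
def effective_savings(retail: float, price: float, bids: int, fee: float = 0.75) -> float:
    """Fraction saved versus retail once the winner's own bid fees are counted.

    Ignores bids the same person spent on auctions they lost.
    """
    cost = price + bids * fee
    return 1 - cost / retail


# Illustrative numbers only: a $500 item won at 20% of retail with 311 bids.
print(f"{effective_savings(500, 100, 311):.1%}")
```

With those numbers the fees come to 311 × $0.75 = $233.25, so the effective discount falls from 80% to about 33%.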
Next step, crack out the R.
Source: Swoopo dataset 3
The Government of Ontario runs a fantastic service to monitor the state of traffic jams on the 401: the COMPASS Freeway Traffic Management System. So the obvious question becomes: when should I drive home?
Step 1: Get some data
First I set up a cron job on the server hosting ultrasaur.us that records the state of the various stretches of road. It’s been running for a few days now, and after 14,000 readings there seem to be the following states for a stretch of road (with counts):
- Express and collector moving slowly (423)
- Express and Collector moving well (7055)
- Express and collector very slow (85)
- Express moving slowly. Collector moving well (205)
- Express moving slowly. Collector N/A (49)
- Express moving slowly. Collector very slow (138)
- Express moving well. Collector N/A (1236)
- Express moving well. Collector moving slowly (435)
- Express moving well. Collector N/A (271)
- Express moving well. Collector very slow (48)
- Express N/A. Collector moving well. (1241)
- Express N/A. Collector moving slowly (129)
- Express N/A. Collector moving well (421)
- Express N/A. Collector very slow (43)
- Express very slow. Collector moving slowly (45)
- Express very slow. Collector moving well (14)
- Express very slow. Collector N/A (75)
- Moving slowly (122)
- Moving well (795)
- N/A (1198)
Notice that some entries are near duplicates that differ only by a double space after a period; I’ll convert multiple spaces into single spaces.
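The whitespace clean-up and the counts above can be sketched in a few lines of Python. The function names are hypothetical, and `readings` is assumed to be a list of raw status strings pulled out of the cron job’s log.

```python
import re
from collections import Counter
from typing import List


def normalize(status: str) -> str:
    """Collapse runs of whitespace into one space and trim the ends, so that
    'Express moving well.  Collector N/A' matches the single-space version."""
    return re.sub(r"\s+", " ", status).strip()


def tally(readings: List[str]) -> Counter:
    """Count how often each (normalized) status appears."""
    return Counter(normalize(r) for r in readings)
```

The list also contains a pair that differ only by a trailing period (“Collector moving well.” vs “Collector moving well”); adding `.rstrip(".")` to `normalize` would fold those together too, if that’s the desired behaviour.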
Next I needed to give each of these a value. Based on my back-of-the-envelope calculations, “well” means 80+ km/h, “slowly” means 50-80 km/h and “very slow” means 0-50 km/h.
Caveats and thoughts:
- the values can’t be calculated exactly, so I’m not going to try.
- one important thing I want to do is map each status to a unique value so that I don’t lose any data. The key is that the values preserve the order from best to worst.
- you can see that I’m biased towards the expressway
So the values represent the proportional time it takes to travel over a stretch of road (i.e. higher is worse):
- 100: Moving well
- 101: Express and Collector moving well
- 130: Express N/A. Collector moving well
- 150: Express moving well. Collector moving slowly
- 160: Express moving well. Collector N/A
- 170: Express moving slowly. Collector moving well
- 180: Express moving well. Collector very slow
- 200: Moving slowly
- 201: Express and collector moving slowly
- 210: Express N/A. Collector moving slowly
- 250: Express moving slowly. Collector N/A
- 380: Express moving slowly. Collector very slow
- 410: Express very slow. Collector moving well
- 460: Express very slow. Collector moving slowly
- 500: Express and collector very slow
- 501: Express very slow. Collector N/A
- 510: Express N/A. Collector very slow
- null: N/A (I’m willing to extrapolate a guess at the other N/A’s, but not here)
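A lookup table is a natural way to apply this scale. A sketch, assuming the whitespace clean-up described earlier plus two extra normalizations of my own (lowercasing, because the feed mixes “Collector” and “collector”, and dropping a trailing period); `STATUS_VALUE` and `status_value` are hypothetical names:

```python
import re
from typing import Dict, Optional

# Values from the table above: proportional time to cross a stretch (higher is worse).
# Keys are lowercased so "Express and Collector ..." and "Express and collector ..." match.
STATUS_VALUE: Dict[str, Optional[int]] = {
    "moving well": 100,
    "express and collector moving well": 101,
    "express n/a. collector moving well": 130,
    "express moving well. collector moving slowly": 150,
    "express moving well. collector n/a": 160,
    "express moving slowly. collector moving well": 170,
    "express moving well. collector very slow": 180,
    "moving slowly": 200,
    "express and collector moving slowly": 201,
    "express n/a. collector moving slowly": 210,
    "express moving slowly. collector n/a": 250,
    "express moving slowly. collector very slow": 380,
    "express very slow. collector moving well": 410,
    "express very slow. collector moving slowly": 460,
    "express and collector very slow": 500,
    "express very slow. collector n/a": 501,
    "express n/a. collector very slow": 510,
    "n/a": None,  # deliberately left unscored
}


def status_value(status: str) -> Optional[int]:
    """Score a raw status string; unknown strings raise KeyError so new states get noticed."""
    key = re.sub(r"\s+", " ", status).strip().rstrip(".").lower()
    return STATUS_VALUE[key]
```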
This gives me my first chance to make a graph. Using just my first 14,000 points, here’s the average state of the 401 Westbound over the 24 hours in a day (over a Monday-Wednesday):

The single worst hour to drive is 4-5pm, and the whole window from 3pm to 6pm is bad. That’s not much of a surprise (although it’s an hour or so earlier than I expected rush hour to start), but the fact that evening rush hour is so much worse than morning rush hour is a bit of a shock. That 1pm is such a slow time is curious too; I wonder if that bump will go away with more data.
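The hour-of-day averages can be computed by bucketing readings on their hour. A minimal sketch, assuming each reading has already been reduced to a `(timestamp, value)` pair for a westbound stretch, with `None` for the unscored N/A state (which is skipped rather than guessed); `hourly_average` is a hypothetical name:

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Optional, Tuple


def hourly_average(readings: List[Tuple[datetime, Optional[int]]]) -> Dict[int, float]:
    """Average the status values by hour of day (0-23), skipping unscored (None) readings."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for when, value in readings:
        if value is None:
            continue
        sums[when.hour] += value
        counts[when.hour] += 1
    return {hour: sums[hour] / counts[hour] for hour in sorted(counts)}
```

Because the hour is taken from the timestamp alone, readings from different days fall into the same bucket, which is what a Monday-to-Wednesday average over the 24-hour clock needs.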
(The data is available to anyone who contacts me; it’ll eventually be available for download.)
I wanted to analyse some of the data from Rate my Professor, but there’s no easy “download as CSV” button, so I had to screen scrape.
Awk to the rescue
I use Gawk for Windows and Wget. The awk code (1.awk) to turn an HTML table into a comma-separated file is:
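BEGIN { s = ""; FS = "\n" }                                        # FS="\n": $1 is the whole line
/<tr|<TR/ { if (s != "") print s; s = "" }                         # a new row: flush the previous one
/<td|<TD/ { gsub(/<[^>]*>/, ""); s = (s == "" ? $1 : s ", " $1) }  # strip tags, append the cell text
END { if (s != "") print s }                                       # flush the last row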
And then you execute it as: gawk -f 1.awk *.jsp > marks.csv
That’s it. Well, almost: as always, edge cases take up most of the code. I’ll post the longer version, and the R code, later.
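For comparison, the same flattening can be done without relying on each cell sitting on its own line. This is a swapped-in technique, not the awk script above: a sketch using Python’s standard-library `html.parser` and `csv` modules, with a hypothetical `TableToRows` class. It also tolerates a missing `</td>` and lets the `csv` module quote any cell text that contains a comma.

```python
import csv
import glob
import sys
from html.parser import HTMLParser
from typing import List, Optional, TextIO


class TableToRows(HTMLParser):
    """Collect the text of each <td> cell, grouped into one list per <tr> row."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; etc. arrive decoded
        self.rows: List[List[str]] = []
        self._cell: Optional[List[str]] = None  # text pieces of the cell being read

    def _flush(self) -> None:
        # Finish the open cell, if any; HTML often leaves </td> out.
        if self._cell is not None:
            if not self.rows:
                self.rows.append([])
            self.rows[-1].append(" ".join("".join(self._cell).split()))
            self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag in ("tr", "td"):
            self._flush()
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "tr", "table"):
            self._flush()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def close(self) -> None:
        super().close()
        self._flush()


def html_table_to_csv(html: str, out: TextIO) -> None:
    """Write every non-empty table row in `html` to `out` as CSV."""
    parser = TableToRows()
    parser.feed(html)
    parser.close()
    csv.writer(out, lineterminator="\n").writerows(row for row in parser.rows if row)


if __name__ == "__main__":
    # Expand wildcards here, because Windows cmd passes "*.jsp" through literally.
    paths = [p for arg in sys.argv[1:] for p in (glob.glob(arg) or [arg])]
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as f:
            html_table_to_csv(f.read(), sys.stdout)
```

Saved under a name of your choosing (say `table2csv.py`, a hypothetical name), it runs like the awk version: `python table2csv.py *.jsp > marks.csv`.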