Eurica!

numbers for the people.

May 13, 2012
by dave
0 comments

You’re probably polluting your statistics more than you think

In a recent post, Gabriel Rossman gives a simple example of why statistics are hard to do correctly.

  • If good looks and smarts are distributed normally, and
  • If good looks and smarts have nothing to do with each other, and
  • If movie producers want both smarts and looks
  • Then, by observing only employed actors, we’ll conclude that looks and smarts are negatively correlated
  • Even though we constructed this experiment with no correlation

Here’s a graph of 250 randomly generated points (with no correlation), with the red circles representing “actors who are smart and good looking enough to get a job” (looks+smarts>2) and the lighter blue x’s representing “people who wanted to be actors”:

If we only look at actors with jobs, we’ll see a clearly negative correlation between smarts and good looks. In fact, some brilliant actors are less attractive than an average person, and some gorgeous actors are dumber than an average person. Even more interesting, though, is that if we try to rule out bias by looking at aspiring but unsuccessful actors as well, we’ll find that they exhibit a similar correlation. Here are the lines of best fit for both:

That both groups would exhibit a negative correlation is more obvious if you mentally split the groups on looks+smarts=0.
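Here’s a rough sketch of the same experiment in JavaScript. The original graphs were done in Excel, so this is a reconstruction under assumptions rather than the original method: the seeded generator (`mulberry32`), the sample size of 100,000 and all the variable names are mine, chosen so the run is repeatable and the correlations are stable.

```javascript
// Seeded PRNG (mulberry32) so the run is reproducible; any uniform source would do.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Standard normal draw via the Box-Muller transform.
function normal(rand) {
  const u = 1 - rand(); // in (0, 1], avoids log(0)
  const v = rand();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

// Pearson correlation between looks and smarts for a group of people.
function correlation(people) {
  const n = people.length;
  const meanLooks = people.reduce((sum, p) => sum + p.looks, 0) / n;
  const meanSmarts = people.reduce((sum, p) => sum + p.smarts, 0) / n;
  let sxy = 0, sxx = 0, syy = 0;
  for (const p of people) {
    const dx = p.looks - meanLooks;
    const dy = p.smarts - meanSmarts;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  return sxy / Math.sqrt(sxx * syy);
}

// Everyone who wants to act: looks and smarts are independent by construction.
const rand = mulberry32(2012);
const aspiring = [];
for (let i = 0; i < 100000; i++) {
  aspiring.push({ looks: normal(rand), smarts: normal(rand) });
}

// Producers hire only when looks + smarts > 2.
const hired = aspiring.filter((p) => p.looks + p.smarts > 2);
const rejected = aspiring.filter((p) => p.looks + p.smarts <= 2);

const rAll = correlation(aspiring);
const rHired = correlation(hired);
const rRejected = correlation(rejected);
console.log({ rAll, rHired, rRejected });
```

The whole population should come out with a correlation near zero, the hired group strongly negative, and the rejected group weakly negative, which is the pattern described above.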

This effect is particularly nefarious in that it’s distribution agnostic. For instance, assume for mathematicians:

  • Experience and brilliance are uniformly distributed
  • With experience comes somewhat more brilliance (I’ve introduced a 20% correlation)
  • Only the top fifth of mathematicians (as measured by experience+brilliance) ever get anywhere, and the rest drop out to do something easier
  • It’s very easy to conclude that experience kills brilliance, and that a mathematician’s best work will be done by 40
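The mathematician scenario can be sketched the same way. How the 20% correlation was introduced isn’t stated, so the trick below is purely my assumption: copy experience into brilliance 20% of the time and draw it independently otherwise, which keeps both values uniform and gives an overall correlation of about 0.2. The names and the sample size are also hypothetical.

```javascript
// Seeded PRNG (mulberry32) so the run is reproducible.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Pearson correlation between experience and brilliance.
function correlation(group) {
  const n = group.length;
  const meanX = group.reduce((sum, m) => sum + m.experience, 0) / n;
  const meanY = group.reduce((sum, m) => sum + m.brilliance, 0) / n;
  let sxy = 0, sxx = 0, syy = 0;
  for (const m of group) {
    const dx = m.experience - meanX;
    const dy = m.brilliance - meanY;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  return sxy / Math.sqrt(sxx * syy);
}

const rand = mulberry32(40);
const mathematicians = [];
for (let i = 0; i < 100000; i++) {
  const experience = rand();
  // Assumption: 20% of the time brilliance simply equals experience,
  // otherwise it is an independent uniform draw.
  const brilliance = rand() < 0.2 ? experience : rand();
  mathematicians.push({ experience, brilliance });
}

// Only the top fifth by experience + brilliance "get anywhere".
const ranked = [...mathematicians].sort(
  (a, b) => (b.experience + b.brilliance) - (a.experience + a.brilliance)
);
const topFifth = ranked.slice(0, Math.floor(ranked.length / 5));

const rEveryone = correlation(mathematicians);
const rTopFifth = correlation(topFifth);
console.log({ rEveryone, rTopFifth });
```

Under these assumptions the correlation is positive across everyone but negative among the survivors, so a study of successful mathematicians alone would get the sign wrong.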

In a general sense (the proof being left as an exercise for the reader):

  • Given two measurements x_i in X and y_i in Y on a set of points p_1…p_n in P, if a larger value of x_i+y_i increases the chance that p_i will be sampled, the sampling will introduce a phantom correlation between X and -Y (that is, a negative correlation between X and Y)

Kind of scary, eh?

Disclaimer: Although the author is ostensibly a mathematician, he has never been a very good one (he did the graphs in Excel, what’s that about?). All theorems should be proven from first principles before attempting to use at home. Vote no on the axiom of choice. source

November 26, 2011
by dave
1 Comment

Suggest a (rogue) Card Against Humanity

I heard about Cards Against Humanity last week, and I picked it up at Amazon, but apparently the expansion pack hasn’t been well reviewed (and isn’t in stock). So I’m making my own with The Game Crafter, which will be available at cost (probably about $10 for 100 cards) as soon as I get enough cards that I like. (I should mention that this game is in very poor taste, and possibly not safe for work.)

I’d love your suggestions:

November 21, 2011
by dave
1 Comment

Tobin Tax Alternatives (99% vs 1%)

My favourite proposal floating around the #OccupyWallStreet/99% movement is a Tobin Tax, but it isn’t without drawbacks:

Tiny taxes on high-volume transactions raise a lot of money, but they also cost money to record, collect, and audit, which is why few jurisdictions have 0.25% sales taxes.

But the basic idea is sound: tax bad/useless things to avoid taxing good things[1]. That’s why books and cigarettes tend to have different tax rates, and it’s also a very good argument for trading payroll taxes for pollution taxes (I’m shocked there isn’t more support for “jobs not smokestacks”).

But sticking with the 1% vs 99% framework, I can think of two alternatives:

Alternative 1: A Time Tax

Low value-add high frequency trading like ticker tape trading (or front running) relies on timely execution of trades; value investing does not. To discourage the former without affecting the latter, just delay every trade by a random 1-2 seconds (the exact values would depend on the exchange).

This isn’t exactly revolutionary: it’s only recently that sub-second trading by computers in the same building as the exchange has been the standard.

Within the 1%/99% framework, the 99% buys stocks through their 401ks and mutual funds, with a timeline of days or sometimes months. It’s a very privileged class of traders who can jump in to save tiny fractions of the price in fractions of seconds.
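To get a feel for how much a random delay blunts a speed advantage, here’s a small simulation. Everything specific in it is an assumption of mine: the 5 ms head start, delays drawn independently and uniformly between 1 and 2 seconds, and the seeded generator used to make the run repeatable. It only counts how often the faster trader still executes first.

```javascript
// Seeded PRNG (mulberry32) so the run is reproducible.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Random delay in seconds, uniform between 1 and 2.
function randomDelay(rand) {
  return 1 + rand();
}

const rand = mulberry32(99);
const headStart = 0.005; // hypothetical: the fast trader's order arrives 5 ms earlier
const trials = 200000;

let fastWins = 0;
for (let i = 0; i < trials; i++) {
  // Each order executes at its arrival time plus its own random delay.
  const fastExecutesAt = 0 + randomDelay(rand);
  const slowExecutesAt = headStart + randomDelay(rand);
  if (fastExecutesAt < slowExecutesAt) fastWins++;
}

const fastWinRate = fastWins / trials;
console.log(fastWinRate);
```

Without the delay the faster order is first every time. With it, the win rate should land close to a coin flip (in theory 0.5 + d - d²/2 for a head start of d seconds and a one-second delay window, so about 50.5% here), while an investor holding for months loses nothing that matters.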

“I never attempt to make money on the stock market. I buy on the assumption that they could close the market the next day and not reopen it for five years.” – Warren Buffett

Alternative 2: An information tax

A large amount of effort goes into preventing insider trading through insider trading windows and prosecutions, all to keep the appearance of a fair game for investors. But we don’t have to keep the game fair for financial firms: how much research does the 99% do before investing in their 401ks?[2]

The purpose of a stock market is to let businesses raise capital. The fact that it also tends to accurately price those companies is a bit of a side effect, and that some people can make money trying to predict those movements is entirely a side effect. Traders with insider information help price stocks more accurately and more quickly, but instead a huge amount of the effort that the SEC expends[3] goes to ensuring that all investors get their information at the same time rather than whenever it reaches them organically, mostly to preserve the idea that the stock market is a game of skill, not a game of chance.

It’s complicated, maybe impossible, to enforce insider trading laws, and the rule makers aren’t entirely on board with the idea in the first place. There’s a whole whack of compliance paperwork that comes with insiders legitimately trying to sell their shares too, not for malicious reasons, but because you generally don’t want to have your investments tied up in the company you work for[4]. All compliance costs money, and insider trading rules aren’t much help to people who invest in index funds, which should describe most of the 99%.

[Edit: While we're getting rid of SEC rules, Albert of continuations.com has a good point, let's also remove the quiet period for going public]

  1. Assuming that spending remains constant
  2. The right answer is: pick the fund with the lowest fees, make sure you max out your employer matching.
  3. and it really is a lot of effort proportionate to the harm
  4. As incentives, stocks and stock options are usually granted on a vesting schedule as a compromise between the employee’s desire to sell them quickly and the company’s desire for employees to hold on to them forever.

November 4, 2011
by dave
1 Comment

Use jQuery and the console to screen scrape a page

I’ll be at Cloud Expo next week for PagerDuty, and I wanted a list of companies there so I could see who were already customers. The closest thing I found was heavily HTML-ized, and I wanted plain text — in my naive youth I would’ve (and have) thrown the raw HTML through a tool like AWK, but the console in Chrome is a great tool, and since the page already has jQuery, I’ll just use that.

All the links to companies seem to have a target of “_blank” so hopefully that’s sufficient:


$("span a[target=_blank]")

Nope, it looks like there are other links as well. I could strip them out with a pile of regular expressions, but it seems that all the company names are bolded — not with a class but with inline styles. That’s a bit of a pain, since jQuery doesn’t let me use those in selectors, so we’ll need to filter the results:


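// Keep only links whose computed font-weight is bold; browsers may report it as "bold" or "700"
$('span a[target="_blank"]').filter(function () {
  var weight = $(this).css('font-weight');
  return weight === 'bold' || weight === '700';
}).each(function () { console.log(this.innerText); });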

Bingo.


October 27, 2011
by dave
0 comments

Project: Randomly Failing Server

For an API example I’m working on for PagerDuty, I wanted something that would throw off errors regularly: not on every single request, but say on one in every 100 requests.

FEPServer is a quick and dirty project on App Engine that’ll throw a random or a specific error every N requests. It’s written in Python (although “written” might be too strong a word, there’s almost no code). Source is up at github.com/eurica/FEPServer
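The “fail every N requests” idea fits in a few lines. To be clear, this is not FEPServer’s code (that’s Python on App Engine, at the link above); it’s a hypothetical JavaScript sketch of the same behaviour, and `makeFlakyResponder`, its parameters and the status codes are names I made up for illustration.

```javascript
// Hypothetical helper, not taken from FEPServer: returns a function that
// yields an HTTP status for each request and fails on every Nth one.
function makeFlakyResponder(everyN, errorStatuses, pick = Math.random) {
  let requests = 0;
  return function respond() {
    requests += 1;
    if (requests % everyN !== 0) {
      return 200; // normal request
    }
    // Every Nth request: with one status in the list this is a specific error,
    // with several it is a random choice among them.
    const index = Math.floor(pick() * errorStatuses.length);
    return errorStatuses[index];
  };
}

// One failure in every 100 requests, chosen at random from three errors.
const respond = makeFlakyResponder(100, [500, 502, 503]);
const statuses = Array.from({ length: 1000 }, () => respond());
const failures = statuses.filter((status) => status !== 200).length;
console.log(failures); // 10
```

Counting requests rather than rolling a die per request means the failures arrive on a fixed schedule, which is handy when you want a demo to misbehave predictably.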

September 28, 2011
by dave
0 comments

Some software shout-outs

F.lux, which changes the tint of your screen to match the time of day, is a great tool if for some reason you get woken up a lot at night and have to look at your screen for a couple of minutes. Somehow (the science is left as an exercise for the reader) the default computer screen brightness makes me far too awake.

If you’re using Gmail, I’m also a fan of Rapportive, which adds some more useful information about each contact to the sidebar (it overwrites the ads, which you’ve probably adblocked anyway). Boomerang for Gmail adds a much-needed “Send later” option as well as reminding you when you expected a reply but didn’t receive one.

September 24, 2011
by dave
0 comments

Honeypot emails and database leaks

As a web developer, one of your biggest fears should be your user database leaking, for reasons ranging from hackers to someone leaving it on an FTP server and emailing a link through Gmail.

There’s no way to guarantee no-one will ever get your user table and paste it on the internet, but there’s a pretty easy way to detect it in a lot of cases: create a fake honeypot email address (mine is honeypot@euri.ca[1]), place it somewhere in all your DBs, and set up a Google Alert on it. This is a good idea[2], and it’s easy too:

First, set up a Google Alert for “honeypot@euri.ca” (All results, As-it-happens) to be emailed to you. Since the honeypot address should never appear anywhere public unless your data leaks, you won’t be overwhelmed with emails.

Second, register one of your testing users with that email. This has the added bonus of a lightweight sanity check that you aren’t leaking user information. For instance, if you’ve accidentally left the users/index method in your scaffolding, this may pick it up. Along those lines you may want to set up an alert on one of your testing passwords (I don’t but you may).

Thirdly, this is purely an additive measure, and should be used in addition to actually doing everything possible to secure your private data.
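The sanity check from the second step can also be run against your own pages without waiting for Google to notice. This is only a sketch under assumptions: `findLeaks`, the example address and the page bodies are all hypothetical, and in practice you’d feed it the HTML your app actually serves to logged-out visitors.

```javascript
// Hypothetical address; substitute your own secret honeypot.
const HONEYPOT = "honeypot@example.com";

// Returns the paths of the pages whose body contains the honeypot address.
function findLeaks(pages, address) {
  const needle = address.toLowerCase();
  return Object.entries(pages)
    .filter(([, body]) => body.toLowerCase().includes(needle))
    .map(([path]) => path);
}

// Made-up page bodies standing in for what a crawler would see.
const pages = {
  "/": "<h1>Welcome</h1>",
  "/users": "<li>alice@example.com</li><li>honeypot@example.com</li>",
};

const leaks = findLeaks(pages, HONEYPOT);
console.log(leaks); // ["/users"]
```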

  1. well, not anymore, now I have an even more super secret honeypot email
  2. it’s not mine, this post has been sitting in my drafts for a while, but credit goes to reading about how Groupon’s India wing leaked their passwords and Joakal’s suggestion. Andrew Cooke had a good point too: you may want to set up a Google Alert on ‘“you@yourdomain.com” filetype:sql’ to see if anyone else has leaked your password through an SQL file.

July 14, 2011
by dave
3 Comments

Oh, you’re the NY Times guy (my #gwmm talk)

@seanyo asked me to give a talk about my experiences going viral with NYClean for Guelph Web Maker Meetup 1.4.

I’m told there are some blurry photos of the event [edit: thanks @shtistea!], but it was mostly a fun stream of consciousness recounting of the technical details of the hack and what I learnt about social media and my 15 minutes of fame. The group understood the tech part, and also tended to be the kind of people who play with things in a can-I-hack-this-to-do-this sort of way themselves.





June 4, 2011
by dave
4 Comments

Anyone *can* start a Groupon

In “Anyone can start a Groupon and other startup myths,” Andrew argues that the cost per customer acquisition is Groupon’s moat (keeping other companies out of their profit party) in that they can outspend any upstart, and they can. But this doesn’t mean more than “You can’t out-Groupon them”. In fact, unlike most consumer-facing tech companies with billion+ valuations, they don’t have any significant network effects, so they don’t really have a moat at all.

How would you start a Groupon competitor? It’s not a tech problem (which is why, I suspect, Hacker News hates them); it’s strictly a chicken and egg problem: acquiring deals and acquiring eyeballs.

Get deals first: Aggregate. It’s a hard problem to get at least one interesting local deal a day. So let Groupon and LivingSocial and your local competitors provide the initial content. Send out one email or have a landing page with all of today’s local deals sorted best to worst. Now you can build traffic cheaper than Groupon because you offer Groupon++. Even if Groupon et al get all the revenue from all the deals on your homepage, you have your chicken and you can start searching for your own eggs (deals) to add to the top of the list.

Who’s doing it already: Dealmap, Yipit, 8Coupons, Yahoo

Get eyeballs first: Use dead end media. Newspapers are still struggling to replace classified ads, and they have huge fixed investments in advertising space, customer service and marketing. Take the existing readers and coupon clippers and offer them exclusive deals. Heck, use one of the white-labelled deal sites: it has never been a technology problem.

Who’s doing it already: Wag Jag, Sun Media just bought StealTheDeal

Start with a few chickens and a few eggs: Specialize. AppSumo is Groupon for SaaS, HumbleBundle is Groupon for games, deals.feld.com is Groupon for people who like Brad Feld.

To paraphrase Andrew a little: “So why don’t you build one, tough guy?”[1] Automated aggregation doesn’t have any kind of moat (and isn’t really all that interesting) and working with newspapers is an uphill battle. If the New York Times called me up tomorrow and said “we’re betting the company on an elite, invitation-only Groupon in posh metropolitan areas”, I’m in, but trying to help the Boise Tribune save its bloated marketing department AND trying to out-execute Groupon is not an easy play.

I don’t think Groupon is unstoppable. What makes the content compelling is the discounts, and what makes the discounts compelling is the audience, and both of those ingredients are commodities. There’s no reason not to subscribe to multiple coupon sites, and competition should erode margins. That’s how commodities work in a free market.

Groupon does have 3 advantages right now. They’re defining the game: they could be instantly and massively profitable tomorrow by stopping their ad-buys, concentrating only on their most profitable markets and gliding on the market share they have now. Instead, they’re choosing what the customer acquisition cost is for their competitors, and what the marginal deal available looks like.

They are also executing better than anyone else, and that can’t be ignored. The most straightforward way to win a race is to run fastest. If Andrew Mason can reliably out-execute his competitors, he doesn’t need a moat.

Those are formidable advantages, but what might be driving their ridiculous meteoric valuation is that they may be the first company to actually understand small businesses and get them to trust technology. Across North America (never mind the world) small businesses have crappy webpages, schedule their shifts on paper, use POS systems which are P’sOS, operate poor online advertising plans and ignore their analytics. Groupon could buy or build a system for any or all of those problems and instantly place it in front of hundreds of thousands of trusting customers. (Or they could be planning something cooler.)

In short, anyone can start a Groupon, and that’s going to slaughter their margins within the next few years. But a few years is a long time for an agile company to transfer traction from an easy problem (“deep discounts are popular”) to a meaningful & sustainable problem (either small business POS/IT/analytics or the mystery box), and they have the runway to do it.

Update: There’s a lot of good discussion out there, I recommend this parachuting company’s successful experience to see how it can really work, and this TechCrunch discussion of why Groupon is as much about cashflow as sales. -June 15th

[1] Andrew and I spoke about this for a few hours today, and he’s a firehose of energy and ideas. I kind of wish I had taken notes.

May 26, 2011
by dave
2 Comments

Google Analytics adds Page Speed

Google Analytics is adding page speed calculations to their reports.

It’s still not on by default; you’ll have to add the following to your GA code:

_gaq.push(['_trackPageLoadTime']);
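For context, here’s roughly where that line would sit in the asynchronous ga.js command queue. Treat it as a sketch: the property ID is a placeholder, the order relative to the pageview call is my assumption, and the loader script that actually fetches ga.js is left out.

```javascript
// Asynchronous Google Analytics (ga.js) command queue.
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-XXXXX-X']); // placeholder property ID
_gaq.push(['_trackPageview']);            // the usual pageview hit
_gaq.push(['_trackPageLoadTime']);        // opt in to page load time tracking
```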

That’s cool. I’ll add it and play around with how it compares to calculating it myself. I could’ve sworn Clicky already had that, but I can’t find it, so I must be confusing Clicky with Chartbeat.
