eurica

numbers for people.

April 30, 2013
by dave

Bash script to upload files for S3 static hosting

Here’s a bash script to recursively upload a directory to S3 with s3cp.

It turns out that I probably won’t use this: it’s slower than the Ruby script that I’m using (I suspect because it makes a new connection every time). Most importantly, it handles the headers and gzipping so that everything is served optimally.
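As a rough illustration of what "handling the headers and gzipping" involves, here is a minimal sketch in JavaScript rather than bash. The extension lists and the `headersFor` helper are assumptions made for this example, not taken from the script itself. It only picks the headers; the file body still has to be gzipped before upload for `Content-Encoding: gzip` to be truthful.

```javascript
// Hypothetical mapping from extension to Content-Type (not from the original script).
const CONTENT_TYPES = {
  '.html': 'text/html',
  '.css': 'text/css',
  '.js': 'application/javascript',
  '.png': 'image/png',
  '.jpg': 'image/jpeg',
};

// Assumption: only text assets are worth gzipping; images are already compressed.
const GZIP_EXTENSIONS = new Set(['.html', '.css', '.js']);

// Return the HTTP headers a file should be uploaded with.
function headersFor(filename) {
  const dot = filename.lastIndexOf('.');
  const ext = dot >= 0 ? filename.slice(dot).toLowerCase() : '';
  const headers = {
    'Content-Type': CONTENT_TYPES[ext] || 'application/octet-stream',
  };
  if (GZIP_EXTENSIONS.has(ext)) {
    headers['Content-Encoding'] = 'gzip';
  }
  return headers;
}

console.log(headersFor('css/site.css'));
console.log(headersFor('img/logo.png'));
```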

Since it uploads everything, it probably makes more sense to upload all the files of the same extension recursively.
Continue Reading →

April 27, 2013
by dave

Find duplicate files with a Ruby script

Sometimes my backup regime gets out of hand, so I made a simple script to find likely duplicates across multiple drives. Basically, it gives me a list of files larger than 100 megs that have the same size and the same extension. There were some false positives, but not many, so even though it outputs a batch file, I didn’t run it unedited:

ruby find_dupes.rb > dupes.bat

My drives were hooked up to a Windows machine, hence all the Windows-isms.
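The matching rule described above (same size, same extension, larger than 100 megs) can be sketched like this. This is JavaScript, not the Ruby script itself, and the `{ path, size }` records and function names are hypothetical; in practice the list would come from walking the drives.

```javascript
const MIN_SIZE = 100 * 1024 * 1024; // "larger than 100 megs"

// Lower-cased extension of a Windows or Unix style path ('' if none).
function extensionOf(path) {
  const name = path.split(/[\\\/]/).pop();
  const dot = name.lastIndexOf('.');
  return dot > 0 ? name.slice(dot).toLowerCase() : '';
}

// Group files by (size, extension) and keep groups with more than one member.
function likelyDuplicates(files, minSize = MIN_SIZE) {
  const groups = new Map();
  for (const file of files) {
    if (file.size <= minSize) continue;
    const key = `${file.size}|${extensionOf(file.path)}`;
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(file.path);
  }
  return [...groups.values()].filter((paths) => paths.length > 1);
}

// Made-up sample data for illustration only.
const sample = [
  { path: 'D:\\backup\\movie.avi', size: 700000000 },
  { path: 'E:\\old\\film.AVI', size: 700000000 },
  { path: 'D:\\backup\\notes.txt', size: 1200 },
  { path: 'E:\\old\\other.mkv', size: 700000000 },
];
console.log(likelyDuplicates(sample));
```

Since size plus extension is only a heuristic, the false positives mentioned above are expected; comparing checksums within each group would remove them at the cost of reading every candidate file.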
Continue Reading →

April 8, 2013
by dave

Make a WordPress site static

After salivating about the load times of static websites, I tried a few plugins, but it wasn’t as easy as I’d hoped. That, and I also don’t feel super confident about WordPress’s security model.

But you know what I understand? Shell scripting, so I made a script to take WordPress hosted in one place and SFTP it over to a static host:

  1. Remove the old file
  2. wget --mirror the site to get a static copy
  3. Manually grab the sitemap.xml and redirect the URLs with sed
  4. Fix up files that have been saved as some munged up version of “jquery.js?ver=1.8.3” by removing the query string
  5. Some bonus sed magic to remove comments to make the files smaller
  6. Download the RSS feeds
  7. rsync it over (I use Dreamhost, which makes this pretty easy, but most hosting companies should support this)
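Step 4 is the fiddly one. Here is a minimal sketch of the renaming rule in JavaScript (the actual script is shell). It assumes the munged part always starts at the first "?" or at its percent-encoded form "%3F"; the `stripQueryString` name is made up for this example.

```javascript
// Map a mirrored filename such as "jquery.js?ver=1.8.3" back to "jquery.js".
// Assumption: everything from the first "?" (or "%3F") onward is query string.
function stripQueryString(filename) {
  const match = filename.match(/^(.*?)(\?|%3F)/i);
  return match ? match[1] : filename;
}

console.log(stripQueryString('jquery.js?ver=1.8.3'));
console.log(stripQueryString('style.css%3Fver=3.5.1'));
console.log(stripQueryString('index.html'));
```

The references inside the HTML files need the same treatment, otherwise the pages will still ask for the old names.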

There’s a bit more: Continue Reading →

April 7, 2013
by dave

Distance Matrix API (failed data project)

I’m going to try to document more of my side projects, even when they fail. Originally, when I found the Distance Matrix API, I planned to cover San Francisco in a grid, and compare neighbourhoods in terms of the ratio of average bike times to driving times to figure out which were the most bike-friendly.

The API limits you to 100 datapoints per query (and 100 datapoints every 10 seconds) and 2500 a day, so an 8×8 grid seemed about all I could reasonably load. That’s 64 starts × 63 destinations × 2 methods = 8064 datapoints (trips aren’t necessarily symmetric). I wrote a simple script to download a distance matrix from each of the 64 points (and each method) every 11 seconds and store the results in localStorage. Whenever it went over the API limit, I waited an hour or two before trying again.
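The arithmetic behind those limits can be sketched as follows. The bounding box is an illustrative guess at San Francisco, not the coordinates actually used, and `makeGrid` is a hypothetical helper.

```javascript
// Build an n x n grid of evenly spaced points inside a bounding box.
function makeGrid(south, west, north, east, n) {
  const points = [];
  for (let row = 0; row < n; row++) {
    for (let col = 0; col < n; col++) {
      points.push({
        lat: south + ((north - south) * row) / (n - 1),
        lng: west + ((east - west) * col) / (n - 1),
      });
    }
  }
  return points;
}

// Assumed bounding box; the post does not give the real one.
const points = makeGrid(37.71, -122.51, 37.81, -122.39, 8);
const modes = ['bicycling', 'driving'];

const perRequest = points.length - 1;            // 63 destinations per origin
const requests = points.length * modes.length;   // one request per origin and mode
const datapoints = requests * perRequest;

const DAILY_LIMIT = 2500;
const requestsPerDay = Math.floor(DAILY_LIMIT / perRequest);
const daysNeeded = Math.ceil(datapoints / DAILY_LIMIT);

console.log({ requests, datapoints, requestsPerDay, daysNeeded });
```

Each request stays under the 100-datapoint cap, but the daily quota is what really hurts: under these assumptions only a few dozen of the 128 requests fit into one day.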

Here are the results as JSON: sf_distances.txt (one datapoint is missing)

And here’s a summary spreadsheet: googlemapstimes.xlsx

It didn’t turn out to be an interesting dataset, but here are some conclusions:

  • On average it takes 2.34 times as long to bike somewhere as to drive, but these seem to be times without traffic…which is never. The ratio varies from 1.9 to 3.25.
  • On average, the bike route is about 4% shorter than the car route. That varies from 14% shorter to 8% longer.
  • On a bike, the fastest starting point is (37.739,-122.451) on Twin Peaks, and by car it’s (37.715,-122.451) on the highway.

That’s where I gave up, because an evenly distributed grid doesn’t make much sense: some points are in parks, others are on highways, and traffic times aren’t taken into account.

What I’d like to do:

  • Pick some representative points, one in every neighbourhood
  • Use the maps API to calculate the time to a few common destinations
  • Compare walking, biking, driving without traffic, driving with traffic and public transit times

March 11, 2013
by dave

New York vs San Francisco jobs on AngelList

At first I was a little annoyed that AngelList makes the salary and equity fields mandatory when you fill out a job. But it looks like a lot of companies take those values semi-seriously, so it might be an interesting dataset to look at.

Fortunately, a screen-scraped copy of their job postings fell off a truck the other day, so I took a look at it: angellist data.xslx

There are 4630 posted jobs. 80% are full time, but people are also looking for Cofounders (11.4%), Interns (5.0%) and Contractors (3.5%). I’ve manually scrubbed a few jobs out, somewhat arbitrarily (although perhaps MOOV really is offering $10 billion for a Technical Co-Founder).

San Francisco has a quarter of the jobs; New York is next with 11%. In fifth place with 3.8%, London is the largest international destination. Toronto sneaks into 10th with 1.5% of the openings.

So the real question is: which is better, San Francisco or New York?

Location         Count   Avg Salary   Avg Equity
San Francisco     1107      $90,279        1.64%
New York City      494      $77,256        2.34%
TOTAL:            4343      $77,018        2.78%

It’s ambiguous: NY gives more equity on average, SF gives more money.

But to do an apples-to-apples comparison, I’ll strip out everything except full-time positions, remove posts with no location specified and do a bit more cleanup (such as removing positions with 0 salary and 0 equity).

Actually, while we’re at it, according to this word frequency counter, here are the 5 most common words in job titles:

  • engineer,1121
  • developer,1114
  • designer,462
  • senior,460
  • software,401

Let’s also limit ourselves to just jobs with “engineer” or “developer” in the title; there are still almost 2000 postings.

Location         Count   Percent
San Francisco      580     29.9%
New York City      227     11.7%

Here’s the average salary (just (min + max) / 2) and equity broken down by the most popular cities:

Group            Count   Avg Salary   % of Overall   Avg Equity   % of Overall
San Francisco      580      $99,721           117%        1.01%            71%
New York City      227      $89,905           105%        1.70%           118%
Palo Alto           96      $96,333           113%        1.16%            81%
Los Angeles         78      $87,442           103%        1.78%           124%
Boston              57      $86,754           102%        1.42%            99%
Mountain View       54     $103,398           121%        0.56%            39%
London              52      $65,952            77%        1.05%            73%
Toronto             38      $70,000            82%        1.27%            89%
Chicago             36      $75,000            88%        1.50%           104%
Seattle             34      $94,000           110%        0.99%            69%
TOTAL:            1939      $85,308                       1.44%
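For anyone checking the numbers, the two calculations behind the table can be sketched like this. The % columns appear to be each city’s average relative to the TOTAL row; that reading is inferred from the figures rather than stated anywhere, and the function names and the example salary range are made up.

```javascript
// A posting's salary is taken as the midpoint of its range.
function midpoint(min, max) {
  return (min + max) / 2;
}

// A city's average expressed as a whole-number percentage of the overall average.
function percentOfOverall(value, overall) {
  return Math.round((value / overall) * 100);
}

console.log(midpoint(80000, 120000));          // hypothetical $80k-$120k posting
console.log(percentOfOverall(99721, 85308));   // San Francisco salary vs. overall
console.log(percentOfOverall(1.70, 1.44));     // New York City equity vs. overall
```

Recomputing from the rounded table values can be off by a point (San Francisco’s equity comes out at 70% rather than 71%), presumably because the spreadsheet works from unrounded averages.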

San Francisco has some competition from Mountain View, but the SF/Palo Alto/MV numbers are all roughly in the same ballpark. The London numbers may be in pounds and the Toronto numbers may be in Celsius, but nothing looks unreasonable.

The salaries are a little less than what you find posted elsewhere, which makes sense (even if those ranges are lower than the word on the street). If you look at the raw data you’ll see the ranges are almost sensible too.

In this case, I’m giving the tie to the home team, but here’s the data:

March 11, 2013
by dave

Word Frequency Counter (in JavaScript)

I needed a quick word frequency counter. You’re welcome to use it: just paste your text in the first text area. Case and punctuation are ignored, and so are words that only appear once.
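The counting rules boil down to a few lines. This is a minimal sketch, not the code behind the page; it assumes words are runs of ASCII letters, digits and apostrophes, and `wordFrequencies` is a made-up name.

```javascript
// Count words, ignoring case and punctuation, and drop words seen only once.
// Returns [word, count] pairs sorted by count (descending), then alphabetically.
function wordFrequencies(text) {
  const counts = new Map();
  const words = text.toLowerCase().match(/[a-z0-9']+/g) || [];
  for (const word of words) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  return [...counts.entries()]
    .filter(([, count]) => count > 1)
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]));
}

console.log(wordFrequencies('Senior Engineer, senior developer. Engineer!'));
```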


Paste your words up there ^ and click down there v

March 5, 2013
by dave

Forwarding Tweets from Twitter to Email

Background: I’m hardly a Twitter power user, but for over a year I’ve mostly been the person behind @pagerduty. We’re growing and I had to hand it off to the support team, but they’re real professionals who take things seriously, so they wanted each tweet to trigger a ticket in our support system.

Options for sending tweets to email:

Twitter

There are a few 3rd-party applications, but they seemed icky to me. Twitter does it themselves:

Unfortunately, it only sends some emails:

If you enable notifications for Retweets, @replies or mentions, favorites, or follows, you may not receive a notification for every Retweet, @reply or mention, favorite or follow. In an effort to send you email only when it’s most relevant, we may not, for example, notify you of mentions by accounts that are new or have not yet confirmed the email address associated with their account. We are constantly experimenting with email notifications to strike the right balance in keeping you up to date.

IFTTT

Twitter has been shutting down 3rd party integrations, but there’s still the RSS Feed so we can use that with:

If This Then That is a cool service, and seems to be 100% free. Every 15 minutes it checks for the search term and emails me. I have a Gmail filter set up to forward the emails into the support system.

Zapier

Zapier has paid tiers, but the free product is basically the same as IFTTT. This integration is a little different: I gave it credentials to my Gmail account so it sends the email on my behalf.

The Zapier integration doesn’t use the RSS feed, so it’s a little better (I can print out the user info in the email).

TODO

I’m going to let all three run for a few days and then pick a winner.

February 27, 2013
by dave

Meteor.js sample integration with PagerDuty

After our last internal hackday, where I took a little flak for building my project in PHP, it was clear that I needed to finally get around to building something in node.js. I love learning new languages but I hate configuring new toolchains and deploying new stacks.

I have a major nerd crush on meteor.js:
curl https://install.meteor.com | /bin/sh
meteor create leaderboard
cd leaderboard
meteor

And open up http://localhost:3000 and there’s something running. I don’t think I’ll ever write any PHP again. Want to test it on a live server?
meteor deploy my-leaderboard-example.meteor.com

HTML & CSS

The CSS is pretty straightforward; the only interesting parts are the CSS3 tomfoolery to round corners and pulsate the colours. The HTML is just a very simple Handlebars template that displays all the Services. Here’s the key section:
<template name="all_services">
{{#each services}}
{{> service}}
{{/each}}
</template>

Client Code

There’s a single line of code shared by the server and the client defining the Services collection (pdstats.js). The rest of client/client.js:

  • Hooks the all_services template to a sorted list of Services
  • Starts calling the updateCurrentState function on the server every 30 seconds (and logs errors to the console)

Server Code

server/credentials.js is split out so that I don’t accidentally push my actual credentials to GitHub. Note: Meteor loads JS files in alphabetical order.

The meat of the program, the server-side code, is only ~30 lines:

  • Export the getCurrentState function to the client as updateCurrentState
  • Issue a Meteor.http.call to the PagerDuty Services API. This is necessary since PagerDuty doesn’t support cross-domain JavaScript calls yet (Feb 2013)
  • Parse the JSON
  • Iterate through each service
    • Calculate some simple statistics
    • Manually perform an upsert (update or insert accordingly). If this were more than a code sample, obviously we wouldn’t update services that haven’t changed.
  • The framework handles updating the client to reflect any changes to the Services collection.
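The manual upsert step can be sketched roughly like this. It is not the sample’s actual code: the field names (`id`, `name`, `incidents`) are hypothetical, and a tiny in-memory stand-in replaces the Meteor collection so the snippet runs on its own. Inside Meteor, the same `findOne`, `update` and `insert` calls would go to the real Services collection.

```javascript
// Minimal in-memory stand-in for a Meteor collection (illustration only).
// It mimics just enough of findOne/insert/update for this sketch.
function makeFakeCollection() {
  const docs = [];
  return {
    docs,
    findOne: (selector) => docs.find((d) => d.id === selector.id),
    insert: (doc) => {
      docs.push({ ...doc });
    },
    update: (selector, modifier) => {
      const doc = docs.find((d) => d.id === selector.id);
      if (doc) Object.assign(doc, modifier.$set);
    },
  };
}

// Manual upsert: update the service if it already exists, insert it otherwise.
function upsertService(collection, service) {
  const existing = collection.findOne({ id: service.id });
  if (existing) {
    collection.update({ id: service.id }, { $set: service });
    return 'updated';
  }
  collection.insert(service);
  return 'inserted';
}

const Services = makeFakeCollection();
console.log(upsertService(Services, { id: 'P1', name: 'web', incidents: 2 }));
console.log(upsertService(Services, { id: 'P1', name: 'web', incidents: 5 }));
console.log(Services.docs);
```

Skipping the update when nothing has changed would mean comparing `existing` with the incoming service before calling `update`.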