Eats smaller websites for breakfast.

April 2, 2007

Awk is Awksome

Filed under: awk, data, programming — Dave @ 8:28 pm

I wanted to analyse some of the data from Rate my Professor, but there’s no easy “download as CSV” button, so I was going to have to screen scrape.

Awk to the rescue

I use Gawk for Windows and Wget. The awk code to turn an HTML table into a comma-separated file is (1.awk):

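BEGIN { s = ""; FS = "\n" }                                           # FS="\n": $1 is the whole line
/<td|<TD/ { gsub(/<[^>]*>/, ""); s = (s == "" ? $1 : s ", " $1) }     # strip tags, append the cell
/<tr|<TR/ { if (s != "") print s; s = "" }                            # new row: print the previous one
END { if (s != "") print s }                                          # print the last row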

And then you execute it as: gawk -f 1.awk *.jsp > marks.csv
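To see what that produces, here is a small self-contained sketch. The file names sample.jsp and table2csv.awk and the table contents are made up for illustration, and it calls plain awk rather than gawk so it should run in most Unix-like shells. The awk program in it follows the script above, written so that rows have no leading comma and the last row is flushed at the end.

```shell
#!/bin/sh
# Hypothetical sample input: one table cell per line.
cat > sample.jsp <<'EOF'
<table>
<tr>
<td>Smith</td>
<td>4.5</td>
</tr>
<tr>
<td>Jones</td>
<td>3.9</td>
</tr>
</table>
EOF

# Strip the tags from each cell line, join the cells with ", ",
# and print the collected row when the next <tr> starts (and once at the end).
cat > table2csv.awk <<'EOF'
BEGIN { s = ""; FS = "\n" }
/<td|<TD/ { gsub(/<[^>]*>/, ""); s = (s == "") ? $1 : (s ", " $1) }
/<tr|<TR/ { if (s != "") print s; s = "" }
END { if (s != "") print s }
EOF

awk -f table2csv.awk sample.jsp > marks.csv
cat marks.csv
```

With this sample, marks.csv ends up with two lines: "Smith, 4.5" and "Jones, 3.9". It only works when each cell sits on its own line, which is the layout the script assumes.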

That’s it. Well, almost: as always, border cases take up most of the code. I’ll post the longer version and the R code later.
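The post does not say which border cases it has in mind, so the following is only a guess at two common ones: several cells on one line, and cells that contain commas or quotes. This is not the longer version mentioned above, just a sketch under those assumptions. The file names and the helper functions emit_row and csv_quote are hypothetical.

```shell
#!/bin/sh
# Hypothetical sample input: whole rows on one line, cells with a comma and quotes.
cat > sample2.jsp <<'EOF'
<table>
<tr><td>Smith, J.</td><td>4.5</td></tr>
<tr><td>Jones</td><td>said "ok"</td></tr>
</table>
EOF

cat > table2csv_quoted.awk <<'EOF'
# Print the row collected so far, if it has any cells, then reset it.
function emit_row() { if (n > 0) print row; row = ""; n = 0 }

# Trim blanks; wrap the field in quotes (doubling inner quotes) if it has a comma or quote.
function csv_quote(f) {
    gsub(/^[ \t]+|[ \t]+$/, "", f)
    if (f ~ /[",]/) { gsub(/"/, "\"\"", f); f = "\"" f "\"" }
    return f
}

/<[tT][rR]/ { emit_row() }               # an opening <tr> starts a new row
{
    gsub(/<\/[tT][dD]>/, SUBSEP)         # mark the end of every cell
    gsub(/<[^>]*>/, "")                  # drop all remaining tags
    k = split($0, cell, SUBSEP)          # text after the last marker is not a cell
    for (i = 1; i < k; i++)
        row = (n++ ? row "," : "") csv_quote(cell[i])
}
END { emit_row() }
EOF

awk -f table2csv_quoted.awk sample2.jsp > quoted.csv
cat quoted.csv
```

This still assumes each cell opens and closes on the same line and that at most one row starts per line. Anything messier than that is probably better handled by a real HTML parser.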
