Awk is Awksome
I wanted to analyse some of the data from Rate my Professor, but there’s no easy “download as CSV” button, so I was going to have to screen-scrape it.
Awk to the rescue
I use Gawk for Windows and Wget. The awk code to turn an HTML table into a comma-separated file is (1.awk):
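# 1.awk: turn HTML table rows into comma-separated lines
BEGIN {s=""; FS="\n"}                           # FS="\n": $1 is the whole line
/<td/ { gsub(/<[^>]*>/, ""); s=(s ", " $1);}    # strip the tags, append the cell to the row
/<tr|<TR/ { print s; s="" }                     # next row starts: print the row collected so far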
And then you execute it as: gawk -f 1.awk *.jsp > marks.csv
That’s it. Well, almost: as always, border cases take up most of the code. I’ll post the longer version and the R code later.
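To give an idea of what such border cases look like, here is a minimal sketch. It is not the longer version mentioned above. It assumes one cell per line, as in the short script, and adds three things: upper-case TD tags are matched, no separator is written before the first cell, and the last row is printed at the end of the input. The file names sample.jsp and 2.awk, the function emit_row and the sample data are all made up for illustration. Plain awk is used here; gawk should behave the same for this script.

```shell
# Hypothetical sample page standing in for the real *.jsp files.
cat > sample.jsp <<'EOF'
<table>
<TR>
  <TD>Smith</TD>
  <td>4.5</td>
</TR>
<tr>
  <td>Jones</td>
  <td>3.9</td>
</tr>
</table>
EOF

# 2.awk is a made-up name for this sketch.
cat > 2.awk <<'EOF'
BEGIN { FS = "\n" }

# A new row starts: print the row collected so far, if any.
/<[tT][rR]/ { emit_row() }

# A cell: strip the tags, trim the whitespace, append to the current row.
/<[tT][dD]/ {
    gsub(/<[^>]*>/, "")
    gsub(/^[ \t]+|[ \t]+$/, "")
    if (n++) s = s ", " $0    # separator only between cells
    else     s = $0
}

# The last row has no following <tr>, so print it here.
END { emit_row() }

# n counts the cells in the current row; empty rows are skipped.
function emit_row() { if (n) print s; s = ""; n = 0 }
EOF

awk -f 2.awk sample.jsp > marks.csv
cat marks.csv
```

Several cells on one line, and commas inside a cell, are still not handled by this sketch.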





