eurica

numbers for people.

Making better words with weird JavaScript

Note: this is just a post about how I iterated on my tools. If you want to make words with JavaScript type coercion, that’s probably the page that you’re looking for.

Within minutes, the first feedback on my script that turns words into JavaScript brackets was “fuef..ie0u..daue”. Clearly there were some suboptimal choices in my letter replacements. I also had a bug concatenating numbers. Both are now fixed.
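
For readers who haven't seen the trick, here is a minimal sketch of how letters can fall out of bracket-only type coercion. These are well-known coercions, but the exact expressions my script emits may differ, and the variable names are only for illustration.

```javascript
// Sketch only: common bracket-based coercions, not necessarily the
// exact expressions the original script emits.
const falseStr = ![] + [];        // false + "" gives "false"
const trueStr = !![] + [];        // true + "" gives "true"
const undefinedStr = [][[]] + []; // [][""] is undefined, so "undefined"

// Numbers can be built from brackets too, then used as string indexes.
const zero = +[];   // 0
const one = +!![];  // 1
const letterF = falseStr[zero];     // "f"
const letterA = falseStr[one];      // "a"
const letterU = undefinedStr[zero]; // "u"

// Every distinct letter these three strings provide, sorted.
const pureLetters = [...new Set(falseStr + trueStr + undefinedStr)].sort().join("");
console.log(pureLetters); // "adefilnrstu"
```

That is only 11 distinct letters, which is why the choice of replacements for everything else matters so much.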

I originally used the words file on OS X as an approximation of how useful every letter was, but clearly some words are more useful than others, even if I can’t say “Koko the woozy mommy goes zoom”.

So I pulled a word frequency list of the top 10000 words from Wiktionary and turned that into a CSV: 10000words.csv. Those words represent 80% [1] of words used (by frequency) in Project Gutenberg books, not a perfect corpus but light-years better than just a list of words for spellchecking.

First off, let’s see what I can spell with just the pure letters in “true”, “false” and “undefined”:
grep -i '^[falseundefinedtrue]*,' 10000words.csv > words_i_can_make.csv
Cool, those 941 words (9.4%) make up 19.1% of the language [2], so we’re doing ok. Let’s put in 0 = o:
grep -i '^[falseundefinedtrueo]*,' 10000words.csv > words_i_can_make_with_o.csv

1365 words, but 33.5% of the language. A third! And that includes 6 of the 7 most used words… except, sadly, #1: “the”.
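
The word counts and usage percentages above come from filtering the list and summing the frequency column. A rough sketch of that calculation follows, assuming each row looks like `word,count`. The `coverage` function and the sample rows are hypothetical, with made-up counts rather than values from 10000words.csv.

```javascript
// Sketch: how many words can be spelled from a letter set, and what share
// of total usage they cover. Assumes rows of the form "word,count".
function coverage(csvText, letters) {
  // Same idea as grep -i '^[letters]*,': every character must be allowed.
  const allowed = new RegExp(`^[${letters}]*$`, "i");
  let madeWords = 0, totalWords = 0, madeCount = 0, totalCount = 0;
  for (const line of csvText.split("\n")) {
    if (!line.trim()) continue; // skip blank lines
    const [word, count] = line.split(",");
    const n = Number(count);
    totalWords += 1;
    totalCount += n;
    if (allowed.test(word)) {
      madeWords += 1;
      madeCount += n;
    }
  }
  return {
    madeWords,
    wordShare: madeWords / totalWords,   // fraction of distinct words
    usageShare: madeCount / totalCount,  // fraction weighted by frequency
  };
}

// Hypothetical sample rows with invented counts.
const sampleCsv = "the,100\nof,50\nand,40\nto,30\nin,20";
console.log(coverage(sampleCsv, "adefilnrstu"));  // without o
console.log(coverage(sampleCsv, "adefilnrstuo")); // with 0 standing in for o
```

Weighting by the count column is what separates "9.4% of the words" from "19.1% of the language": a handful of short, common words carry most of the usage.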

1465 words and 44.5% of actual usage. A little disappointing, but it’s more than enough to communicate with.

  • If we just admit that Z is a 2, we’re up to 1483/44.6%
  • If we can change y’s on the end of words to i’s we’re up to 1702/46.4%
  • Let’s start reaching now: B can be either 8 or 3, and G can be 6 (or g can be 9). So let’s say we can make 2816/53.9%. More than half, way to go team!
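
The substitutions in the list above can be sketched as a small respelling function. This is an illustration under assumptions, not the code behind the real tool: `respell` and `SUBSTITUTIONS` are hypothetical names, digits are assumed to be always available, and only one digit per letter is used (8 for b, 6 for g) even though 3 and 9 would also work.

```javascript
// Sketch: respell a word using look-alike digits where a letter is missing.
// Hypothetical mapping; b -> 3 and g -> 9 would be equally valid choices.
const SUBSTITUTIONS = { z: "2", b: "8", g: "6" };

function respell(word, letters) {
  const w = word.toLowerCase();
  let out = "";
  for (let i = 0; i < w.length; i++) {
    const ch = w[i];
    if (letters.includes(ch)) {
      out += ch; // letter is directly available
    } else if (ch === "y" && i === w.length - 1 && letters.includes("i")) {
      out += "i"; // a trailing y becomes i
    } else if (ch in SUBSTITUTIONS) {
      out += SUBSTITUTIONS[ch]; // swap in a look-alike digit
    } else {
      return null; // still can't make this word
    }
  }
  return out;
}

console.log(respell("fuzzy", "adefilnrstu")); // "fu22i"
console.log(respell("big", "adefilnrstu"));   // "8i6"
console.log(respell("why", "adefilnrstu"));   // null
```

Returning null for unmakeable words makes it easy to count how many words each new rule unlocks.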

If we look at the 46% of words that I still can’t make, we see that w and h would probably be the most helpful. I’m not sure how I feel about uu as w, but if we’re going to do that, we might as well do u = v…
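
One way to check which missing letters hurt the most is to tally, for every unavailable letter, the usage of the words that contain it. This is a sketch of that idea and not necessarily how I arrived at w and h. `missingLetterTally` is a hypothetical helper and the sample counts are invented.

```javascript
// Sketch: rank unavailable letters by the total usage of words that need them.
// rows is an array of [word, count] pairs; letters is the available set.
function missingLetterTally(rows, letters) {
  const tally = new Map();
  for (const [word, count] of rows) {
    // Distinct letters in this word that cannot currently be produced.
    const missing = new Set(
      [...word.toLowerCase()].filter((ch) => !letters.includes(ch))
    );
    for (const ch of missing) {
      tally.set(ch, (tally.get(ch) || 0) + count);
    }
  }
  // Highest blocked usage first.
  return [...tally.entries()].sort((x, y) => y[1] - x[1]);
}

// Hypothetical sample rows with invented counts.
const sampleRows = [["the", 100], ["and", 50], ["with", 40], ["was", 30]];
console.log(missingLetterTally(sampleRows, "adefilnrstuo"));
// [ [ 'h', 140 ], [ 'w', 70 ] ]
```

A word missing two letters counts toward both tallies, so the numbers show how much usage each letter blocks, not how much adding that letter alone would unlock.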

This is not going the direction that I hoped. But one thing I noticed was that seeing the results in real-time helped me plan my messages, so I’m going to leave everything unchanged, and just add real-time updates :)

  1. The values beside each word are the number of times it appears per billion words
  2. Or at least of the 80% that I’m looking at; hopefully it’s all proportional
