I originally used the words file on OS X as an approximation of how useful every letter was, but clearly some words are more useful than others, even if I can’t say “Koko the woozy mommy goes zoom”.
So I pulled a word frequency list of the top 10000 words from Wiktionary and turned that into a CSV: 10000words.csv. Those words represent 80%[1] of words used (by frequency) in Project Gutenberg books; not a perfect corpus, but light-years better than just a list of words for spellchecking.
First off, let’s see what I can spell with just the pure letters in “true”, “false” and “undefined”:
grep -i '^[falseundefinedtrue]*,' 10000words.csv > words_i_can_make.csv
Cool, those 941 words (9.4%) make up 19.1% of the language[2], so we’re doing OK. Let’s put in 0 = o:
grep -i '^[falseundefinedtrueo]*,' 10000words.csv > words_i_can_make_with_o.csv
1365 words, but 33.5% of the language. A third! And it’s 6 of the 7 most used ones… except sadly #1 “the”
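Here is a rough sketch of how counts and percentages like these can be tallied in one pass. It assumes 10000words.csv has the word in the first column and a usage count in the second; that layout isn't shown here, and `coverage` is just an illustrative name. The share it prints is relative to the words in the file, not to all usage.

```shell
# coverage LETTERS FILE
# Prints how many words in FILE use only LETTERS, and what share of the
# summed usage counts those words account for.
# Assumed (hypothetical) CSV layout: word,count
coverage() {
  awk -F, -v letters="$1" '
    BEGIN { pat = "^[" tolower(letters) "]+$" }   # e.g. ^[falseundefinedtrue]+$
    { total += $2 }                                # usage of every listed word
    tolower($1) ~ pat { n++; hit += $2 }           # usage of words I can make
    END { printf "%d words, %.1f%% of listed usage\n", n, 100 * hit / total }
  ' "$2"
}
```

Something like `coverage falseundefinedtrueo 10000words.csv` would then report both numbers for the letter set with o added.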
1465 words and 44.5% of actual usage. A little disappointing, but it’s more than enough to communicate with.
- If we just admit that Z is a 2, we’re up to 1483/44.6%
- If we can change y’s on the end of words to i’s we’re up to 1702/46.4%
- Let’s start reaching now: B can be either 8 or 3, and G can be 6 or g can be 9. So let’s say we can make 2816/53.9%. More than half, way to go team!
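The substitutions in the list above mostly amount to adding letters to the bracket expression, plus one extra rule for a trailing y. This is a minimal sketch of that idea, not necessarily the exact commands used for the numbers above; `count_makeable` is a made-up name, and the letter set is passed in because the full set at each step isn't spelled out here.

```shell
# count_makeable LETTERS FILE
# Counts words made only of LETTERS, optionally followed by a final "y"
# (which would be written as "i"). The trailing comma in the pattern
# matches the end of the word column in the CSV.
count_makeable() {
  grep -icE "^[$1]+y?," "$2"
}

# Example: the earlier letters plus z (as 2), b (as 8 or 3) and g (as 6 or 9):
# count_makeable falseundefinedtrueozbg 10000words.csv
```

Note that the y rule only helps words ending in y; a y anywhere else still blocks the word.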
If we look at the 46% of words that I still can’t make, we see that w and h would probably be the most helpful. I’m not sure how I feel about uu as w, but if we’re going to do that, we might as well do u = v….
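One way to put numbers on which letter would help most is to look at words that are missing exactly one distinct letter, and credit each such word's usage to that letter. This is only a sketch of that heuristic: `next_letters` is a hypothetical name, the word,count CSV layout is an assumption, and LETTERS is expected in lowercase.

```shell
# next_letters LETTERS FILE
# For every word that lacks exactly one distinct character from LETTERS,
# add the word's usage count to that character, then list the characters
# from most to least helpful.
# Assumed (hypothetical) CSV layout: word,count
next_letters() {
  awk -F, -v letters="$1" '
    {
      word = tolower($1); missing = ""
      for (i = 1; i <= length(word); i++) {
        c = substr(word, i, 1)
        # collect each character I cannot make, once
        if (index(letters, c) == 0 && index(missing, c) == 0) missing = missing c
      }
      if (length(missing) == 1) gain[missing] += $2
    }
    END { for (c in gain) print gain[c], c }
  ' "$2" | sort -rn
}
```

Words missing two or more letters are ignored, so this undercounts letters that tend to show up together, and punctuation such as apostrophes is treated as just another missing character.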
This is not going the direction that I hoped. But one thing I noticed was that seeing the results in real-time helped me plan my messages, so I’m going to leave everything unchanged, and just add real-time updates :)