Distribution of Word Lengths in Various Languages

I wanted to see how long words are in various languages. My intuition was that German, for example, would have a high average word length and a distribution shifted toward longer words, because it routinely joins several words into a single compound. The visualization below compares several languages by word length. The bar chart is a histogram of word lengths, and the average word length for each language is displayed to its right. If you hover your mouse over a bar, it shows the percentage of words that are of that length.
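As a rough illustration of the numbers behind each chart, here is a minimal Python sketch of how a per-language histogram and average could be computed from a word list. The function name `length_histogram` and the four-word sample are hypothetical and are not the code or data behind the visualization, which was built with d3.js. Word length is assumed here to mean the number of characters reported by `len`.

```python
from collections import Counter


def length_histogram(words):
    """Return ({length: percent of words}, mean length) for a list of words.

    Every word counts once, regardless of how often it is used.
    """
    if not words:
        raise ValueError("word list is empty")
    counts = Counter(len(word) for word in words)
    total = len(words)
    # Percentage of words at each length, ordered by length.
    percentages = {
        length: 100.0 * count / total
        for length, count in sorted(counts.items())
    }
    mean_length = sum(len(word) for word in words) / total
    return percentages, mean_length


# Hypothetical sample, not the word list used for the chart.
sample = ["the", "cat", "word", "lugubrious"]
percentages, mean_length = length_histogram(sample)
print(percentages)   # {3: 50.0, 4: 25.0, 10: 25.0}
print(mean_length)   # 5.0
```

The percentages correspond to the values shown when hovering over a bar, and the mean corresponds to the average shown beside each chart.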

Note that this visualization isn't weighted by usage. For example, the English word 'the' is used frequently, while the word 'lugubrious' is rarely used; however, both words count equally when computing the histogram and the average word length. A great idea for a follow-up would be to build these histograms from language corpora instead of word lists.
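To make the difference concrete, here is a small Python sketch comparing the two ways of averaging. The nine-word sentence stands in for a corpus and is invented for illustration; the function names are hypothetical, and splitting on whitespace is a simplifying assumption that a real corpus would need to replace with proper tokenization.

```python
from collections import Counter


def unweighted_mean_length(tokens):
    """Mean length over distinct words, as with a plain word list."""
    vocabulary = set(tokens)
    return sum(len(word) for word in vocabulary) / len(vocabulary)


def usage_weighted_mean_length(tokens):
    """Mean length where each word counts once per occurrence."""
    frequencies = Counter(tokens)
    total_occurrences = sum(frequencies.values())
    weighted_sum = sum(len(word) * n for word, n in frequencies.items())
    return weighted_sum / total_occurrences


# Hypothetical stand-in for a corpus.
corpus = "the cat saw the lugubrious dog near the door"
tokens = corpus.lower().split()

unweighted_mean = unweighted_mean_length(tokens)
weighted_mean = usage_weighted_mean_length(tokens)
print(round(unweighted_mean, 2))  # 4.29
print(weighted_mean)              # 4.0
```

In this toy example the frequent short word 'the' pulls the usage-weighted average below the word-list average, which is the effect a corpus-based version of the chart would capture.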

This visualization was created by @RaviSParikh using d3.js.