Wednesday, March 25, 2009

Cracking passwords with Wikipedia, Wiktionary, Wikibooks etc

One effective way of assessing the strength of passwords is to try to crack them and, as most of you probably know, the dictionary attack is the simplest yet most formidable technique for cracking passwords.

Now, the problem is: your dictionary has to be as exhaustive as possible. Relying solely on common dictionaries (such as The Collins, Le Larousse, the ones contained in spell checkers, etc.) just won't do, because these are very limited, whereas basic human nature has us looking around when prompted to choose a password. A lot of people will then choose "belinea" because it's the brand of the monitor sitting in front of their eyes, "abnamro" because it's the bank outside their window, and so on.

However, it is very likely that any word you can lay your eyes on is already in Wikipedia: try it, it is amazing.

A couple of years ago I generated a quick & dirty wordlist from Wikipedia in a dozen languages. It helped quickly crack countless passwords, a lot of which bruteforcing would never get to.

Recently I managed to spare some time in order to generate a new one, inventorying words from 2009 (my old Wikipedia wordlist doesn't even have "twitter", imagine that :-P ) and from a way more comprehensive list of sources:

aawiki aawikibooks aawiktionary abwiki abwiktionary advisorywiki afwiki afwikibooks afwikiquote afwiktionary akwiki akwikibooks akwiktionary alswiki alswikibooks alswikiquote alswiktionary amwiki amwikiquote amwiktionary angwiki angwikibooks angwikiquote angwikisource angwiktionary anwiki anwiktionary arcwiki arwiki arwikibooks arwikinews arwikiquote arwikisource arwiktionary arzwiki astwiki astwikibooks astwikiquote astwiktionary aswiki aswikibooks aswiktionary avwiki avwiktionary aywiki aywikibooks aywiktionary azwiki azwikibooks azwikiquote azwikisource azwiktionary barwiki bat-smgwiki bawiki bawikibooks bawiktionary bclwiki betawikiversity bewiki bewikibooks bewikiquote bewiktionary be-x-oldwiki bgwiki bgwikibooks bgwikinews bgwikiquote bgwikisource bgwiktionary bhwiki bhwiktionary biwiki biwikibooks biwiktionary bmwiki bmwikibooks bmwikiquote bmwiktionary bnwiki bnwikibooks bnwikisource bnwiktionary bowiki bowikibooks bowiktionary bpywiki brwiki brwikiquote brwiktionary bswiki bswikibooks bswikinews bswikiquote bswikisource bswiktionary bugwiki bxrwiki cawiki cawikibooks cawikinews cawikiquote cawikisource cawiktionary cbk-zamwiki cdowiki cebwiki cewiki chowiki chrwiki chrwiktionary chwiki chwikibooks chwikimedia chwiktionary chywiki commonswiki cowiki cowikibooks cowikiquote cowiktionary crhwiki crwiki crwikiquote crwiktionary csbwiki csbwiktionary cswiki cswikibooks cswikinews cswikiquote cswikisource cswikiversity cswiktionary cuwiki cvwiki cvwikibooks cywiki cywikibooks cywikiquote cywikisource cywiktionary dawiki dawikibooks dawikiquote dawikisource dawiktionary de-labswikimedia dewiki dewikibooks dewikinews dewikiquote dewikisource dewikiversity dewiktionary diqwiki dsbwiki dvwiki dvwiktionary dzwiki dzwiktionary eewiki elwiki elwikibooks elwikiquote elwikisource elwikiversity elwiktionary emlwiki en-labswikimedia enwiki enwikibooks enwikinews enwikiquote enwikisource enwikiversity enwiktionary eowiki eowikibooks eowikiquote eowiktionary eswiki 
eswikibooks eswikinews eswikiquote eswikisource eswikiversity eswiktionary etwiki etwikibooks etwikiquote etwikisource etwiktionary euwiki euwikibooks euwikiquote euwiktionary extwiki fawiki fawikibooks fawikiquote fawikisource fawiktionary ffwiki fiu-vrowiki fiwiki fiwikibooks fiwikinews fiwikiquote fiwikisource fiwiktionary fjwiki fjwiktionary foundationwiki fowiki fowikisource fowiktionary frpwiki frwiki frwikibooks frwikinews frwikiquote frwikisource frwikiversity frwiktionary furwiki fywiki fywikibooks fywiktionary ganwiki gawiki gawikibooks gawikiquote gawiktionary gdwiki gdwiktionary glkwiki glwiki glwikibooks glwikiquote glwikisource glwiktionary gnwiki gnwikibooks gnwiktionary gotwiki gotwikibooks guwiki guwikibooks guwikiquote guwiktionary gvwiki gvwiktionary hakwiki hawiki hawiktionary hawwiki hewiki hewikibooks hewikinews hewikiquote hewikisource hewiktionary hifwiki hiwiki hiwikibooks hiwikiquote hiwiktionary howiki hrwiki hrwikibooks hrwikiquote hrwikisource hrwiktionary hsbwiki hsbwiktionary htwiki htwikisource huwiki huwikibooks huwikinews huwikiquote huwikisource huwiktionary hywiki hywikibooks hywikiquote hywikisource hywiktionary hzwiki iawiki iawikibooks iawiktionary idwiki idwikibooks idwikiquote idwikisource idwiktionary iewiki iewikibooks iewiktionary igwiki iiwiki ikwiki ikwiktionary ilowiki incubatorwiki iowiki iowiktionary iswiki iswikibooks iswikiquote iswikisource iswiktionary itwiki itwikibooks itwikinews itwikiquote itwikisource itwikiversity itwiktionary iuwiki iuwiktionary jawiki jawikibooks jawikinews jawikiquote jawikisource jawikiversity jawiktionary jbowiki jbowiktionary jvwiki jvwiktionary kaawiki kabwiki kawiki kawikibooks kawikiquote kawiktionary kgwiki kiwiki kjwiki kkwiki kkwikibooks kkwikiquote kkwiktionary klwiki klwiktionary kmwiki kmwikibooks kmwiktionary knwiki knwikibooks knwikiquote knwikisource knwiktionary kowiki kowikibooks kowikiquote kowikisource kowiktionary krwiki krwikiquote kshwiki kswiki kswikibooks 
kswikiquote kswiktionary kuwiki kuwikibooks kuwikiquote kuwiktionary kvwiki kwwiki kwwikiquote kwwiktionary kywiki kywikibooks kywikiquote kywiktionary ladwiki lawiki lawikibooks lawikiquote lawikisource lawiktionary lbewiki lbwiki lbwikibooks lbwikiquote lbwiktionary lgwiki lijwiki liwiki liwikiquote liwikisource liwiktionary lmowiki lnwiki lnwikibooks lnwiktionary lowiki lowiktionary ltwiki ltwikibooks ltwikiquote ltwikisource ltwiktionary lvwiki lvwikibooks lvwiktionary map-bmswiki mdfwiki mediawikiwiki metawiki mgwiki mgwikibooks mgwiktionary mhwiki mhwiktionary miwiki miwikibooks miwiktionary mkwiki mkwikibooks mkwikisource mkwiktionary mlwiki mlwikibooks mlwikiquote mlwikisource mlwiktionary mnwiki mnwikibooks mnwiktionary mowiki mowiktionary mrwiki mrwikibooks mrwikiquote mrwiktionary mswiki mswikibooks mswiktionary mtwiki mtwiktionary muswiki myvwiki mywiki mywikibooks mywiktionary mznwiki nahwiki nahwikibooks nahwiktionary napwiki nawiki nawikibooks nawikiquote nawiktionary nds-nlwiki ndswiki ndswikibooks ndswikiquote ndswiktionary newiki newikibooks newiktionary newwiki ngwiki nlwiki nlwikibooks nlwikimedia nlwikinews nlwikiquote nlwikisource nlwiktionary nnwiki nnwikiquote nnwiktionary nostalgiawiki novwiki nowiki nowikibooks nowikimedia nowikinews nowikiquote nowikisource nowiktionary nrmwiki nvwiki nywiki nzwikimedia ocwiki ocwikibooks ocwiktionary omwiki omwiktionary orwiki orwiktionary oswiki pagwiki pamwiki papwiki pa-uswikimedia pawiki pawikibooks pawiktionary pdcwiki pihwiki piwiki piwiktionary plwiki plwikibooks plwikimedia plwikinews plwikiquote plwikisource plwiktionary pmswiki pntwiki pswiki pswikibooks pswiktionary ptwiki ptwikibooks ptwikinews ptwikiquote ptwikisource ptwikiversity ptwiktionary qualitywiki quwiki quwikibooks quwikiquote quwiktionary rmwiki rmwikibooks rmwiktionary rmywiki rnwiki rnwiktionary roa-rupwiki roa-rupwiktionary roa-tarawiki rowiki rowikibooks rowikinews rowikiquote rowikisource rowiktionary rswikimedia ruwiki 
ruwikibooks ruwikinews ruwikiquote ruwikisource ruwiktionary rwwiki rwwiktionary sahwiki sawiki sawikibooks sawiktionary scnwiki scnwiktionary scowiki scwiki scwiktionary sdwiki sdwikinews sdwiktionary sewiki sewikibooks sewikimedia sgwiki sgwiktionary shwiki shwiktionary simplewiki simplewikibooks simplewikiquote simplewiktionary siwiki siwikibooks siwiktionary skwiki skwikibooks skwikiquote skwikisource skwiktionary slwiki slwikibooks slwikiquote slwikisource slwiktionary smwiki smwiktionary snwiki snwiktionary sourceswiki sowiki sowiktionary specieswiki sqwiki sqwikibooks sqwikiquote sqwiktionary srnwiki srwiki srwikibooks srwikinews srwikiquote srwikisource srwiktionary sswiki sswiktionary stqwiki stwiki stwiktionary suwiki suwikibooks suwikiquote suwiktionary svwiki svwikibooks svwikinews svwikiquote svwikisource svwiktionary swwiki swwikibooks swwiktionary szlwiki tawiki tawikibooks tawikinews tawikiquote tawikisource tawiktionary testwiki tetwiki tewiki tewikibooks tewikiquote tewikisource tewiktionary tgwiki tgwikibooks tgwiktionary thwiki thwikibooks thwikinews thwikiquote thwikisource thwiktionary tiwiki tiwiktionary tkwiki tkwikibooks tkwikiquote tkwiktionary tlhwiki tlhwiktionary tlwiki tlwikibooks tlwiktionary tnwiki tnwiktionary tokiponawiki tokiponawikibooks tokiponawikiquote tokiponawiktionary towiki towiktionary tpiwiki tpiwiktionary trwiki trwikibooks trwikiquote trwikisource trwiktionary tswiki tswiktionary ttwiki ttwikibooks ttwikiquote ttwiktionary tumwiki twwiki twwiktionary tywiki udmwiki ugwiki ugwikibooks ugwikiquote ugwiktionary ukwiki ukwikibooks ukwikimedia ukwikinews ukwikiquote ukwikisource ukwiktionary urwiki urwikibooks urwikiquote urwiktionary uzwiki uzwikibooks uzwikiquote uzwiktionary vecwiki vewiki viwiki viwikibooks viwikiquote viwikisource viwiktionary vlswiki vowiki vowikibooks vowikiquote vowiktionary warwiki wawiki wawikibooks wawiktionary wikimania2005wiki wikimania2006wiki wikimania2007wiki wikimania2008wiki 
wikimania2009wiki wowiki wowikiquote wowiktionary wuuwiki xalwiki xhwiki xhwikibooks xhwiktionary yiwiki yiwikisource yiwiktionary yowiki yowikibooks yowiktionary zawiki zawikibooks zawikiquote zawiktionary zeawiki zh-classicalwiki zh-min-nanwiki zh-min-nanwikibooks zh-min-nanwikiquote zh-min-nanwikisource zh-min-nanwiktionary zhwiki zhwikibooks zhwikinews zhwikiquote zhwikisource zhwiktionary zh-yuewiki zuwiki zuwikibooks zuwiktionary

All this represents tens of gigabytes of XML data that I processed with a little C program, but I'm not releasing the source code for this one as I don't want to be responsible for a bandwidth hit on the Wikimedia Foundation; I'm already more than grateful to them for helping me on a daily basis...
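
For readers who want to experiment on a small scale, here is a minimal sketch in Python of the general idea (stream an XML dump, pull out the words, keep each one once). It is not the C program mentioned above, which remains unpublished. The tokenisation rule, the function names and the tiny hand-made sample are all assumptions made for illustration.

```python
import io
import re
import xml.etree.ElementTree as ET

# Assumption: a "word" is a run of Unicode letters, digits or underscores.
# The rule used by the original C program is not described in the post.
WORD_RE = re.compile(r"\w+")

def iter_words(xml_stream):
    """Yield every word found in the <title> and <text> elements of a dump."""
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        # Dump elements carry an XML namespace ("{...}text"); keep the local name.
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag in ("title", "text") and elem.text:
            yield from WORD_RE.findall(elem.text)
        if tag == "page":
            # Drop the contents of the page just processed to limit memory use.
            elem.clear()

def build_wordlist(xml_stream):
    """Return the sorted list of distinct words, with case left untouched."""
    return sorted(set(iter_words(xml_stream)))

# Hand-made sample standing in for a real dump (hypothetical content).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <page><title>Amarok</title>
    <revision><text>amaroK is a music player for KDE.</text></revision>
  </page>
</mediawiki>"""

if __name__ == "__main__":
    for word in build_wordlist(io.BytesIO(SAMPLE)):
        print(word)
```

Holding every distinct word in a set is fine for a sample, but for tens of gigabytes of input one would more likely write the words out and deduplicate them with an external sort.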

For the record: I didn't alter the case before saving this wordlist. If you want to force lowercase on all the words, be advised that:
  • sure, JohnTheRipper's derivation algorithms will uppercase letters here and there, but they might miss passwords like "hawKeye" or "amaroK"
  • forcing case on UTF-8 text is tricky
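
To illustrate the second point, here is a small Python sketch. The helper names are hypothetical and, apart from "hawKeye" and "amaroK", the sample words are assumptions picked for the demonstration. A byte-oriented lowercase only touches ASCII letters, while a Unicode-aware one can change the length of a word.

```python
# Assumption: sample words chosen for illustration; only "hawKeye" and
# "amaroK" come from the post itself.
SAMPLES = [
    "hawKeye",
    "amaroK",
    "\u00c9cole",     # "Ecole" with an acute E, two bytes in UTF-8
    "Stra\u00dfe",    # German sharp s, which casefolds to "ss"
    "\u0130stanbul",  # Turkish dotted capital I
]

def ascii_lower(raw: bytes) -> bytes:
    """Lowercase the way a byte-oriented tool would: ASCII letters only."""
    return raw.lower()

def unicode_lower(raw: bytes) -> str:
    """Decode the UTF-8 bytes first, then apply Unicode-aware lowercasing."""
    return raw.decode("utf-8").lower()

if __name__ == "__main__":
    for word in SAMPLES:
        raw = word.encode("utf-8")
        # ascii() keeps the output printable whatever the terminal encoding.
        print(ascii(word),
              ascii(ascii_lower(raw).decode("utf-8")),
              ascii(unicode_lower(raw)),
              ascii(word.casefold()),
              len(word), len(word.lower()))
```

The byte-oriented version leaves the accented capital untouched, and the dotted capital I lowercases to two code points, so "the lowercase form" of a word depends on the tool and on the language rules it applies.
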
Currently, the wordlist can be downloaded from a temporary storage provided by my ISP: wikipedia-wordlist-sraveau-20090325.txt.bz2 (MD5=e28104f22192b84854d259d9e93b5042, just for integrity). Feel free to leave a comment if you need a re-upload, or better yet if you can provide hosting ;-)
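
If you have no md5sum tool at hand, the integrity check can be sketched in Python as follows. The function names are hypothetical; the expected digest is the one quoted above. As said, MD5 is only good for spotting a corrupted download, not for proving where the file came from.

```python
import hashlib

# Digest quoted in the post for wikipedia-wordlist-sraveau-20090325.txt.bz2
EXPECTED_MD5 = "e28104f22192b84854d259d9e93b5042"

def md5_of_stream(stream, chunk_size=1 << 20):
    """Hash a binary stream chunk by chunk, so a 213MB file never sits in RAM."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()

def is_intact(path, expected=EXPECTED_MD5):
    """Return True when the file at `path` matches the expected MD5 digest."""
    with open(path, "rb") as handle:
        return md5_of_stream(handle) == expected.lower()
```

Calling is_intact("wikipedia-wordlist-sraveau-20090325.txt.bz2") on the downloaded archive should return True for an undamaged copy.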


The wordlist is now mirrored at several places:
Many thanks to Tyop, sbz, s0kket and Sorcier_FXK!

Also, for those wondering: there are 58427178 words in the wordlist, and it weighs 213MB (710MB uncompressed).


  1. Good idea, but I wrote a little application that adds CUDA support, with some "cloudy" trick, to aircrack-ng for very, very, very fast password recovery using JTR's output.
    Example: recovering a password like "AR-A13E2Y" from a WPA handshake took only 48 hours. I'm not publishing the software for free; this time I need money for a new notebook.

    sincerely yours,

  2. how long did it take to input all the data?

  3. Ersan: interesting, but how would you say it compares to ElcomSoft Distributed Password Recovery (which you can run on 100 clients for the price of a notebook) or to Pyrit (which is free) ?

    Phillip: I'm sorry, this time I can't tell you precisely... I ran my program occasionally over the past month (very little spare time + can't sleep with my computer on + living in an electrically hazardous flat, I only leave the fridge on when I'm away) but I'm confident that the downloading and processing could all be done in under a week. Hope it somehow answers your question :-)

  4. Nice list and great work. Hope you don't mind a mirror at:

  5. @Seb Elcom is too expensive and Pyrit is for BSD/Linux systems; my version runs on Windows x86 and x64. The price is going to be 10 euros, just to help me out with the high coffee prices around here ;)

  6. Alright :) but don't wait too long to start selling it then... As a matter of fact, friends of mine are already working on CUDA support for John The Ripper, and I would be surprised if they're the only ones; same goes of course for Aircrack.

  7. i think i will release the free version with a limitation to max. 5 digits in a week.

  8. you made it onto the front page of my magazine. congratulations :)

    and thx for your work. great idea.

  9. Great wordlist! Thanks for sharing!


  10. "I don't want to be responsible for a bandwidth hit on the Wikimedia Foundation"

    That's very kind of you, but the founder has no problem buying $800 bottles of wine with donation money.

    I suggest you look into who and how that place is run. It's disgusting.

    Thanks for the list.

  11. Hey, I downloaded the text file and opened it in vim. file said it's UTF-8 and :set enc said the same, but I see a lot of those typical quadrangles in the file.
    But I can read some Arabic(?) words.
    What's wrong?

  12. About the wine bottles: :)

    About the file encoding: it is UTF-8 indeed, make sure your whole environment (editor, terminal, fonts) is capable of UTF-8... Also, a piece of advice I wanted to give you: I like Vim too, but don't open huge files with it, especially just for viewing them; use the "less" command instead.

  13. "It helped quickly crack countless passwords, a lot of which bruteforcing would never get to."

    Never get to? AFAIK brute forcing can be used to solve any possible password. It is only a matter of time... and "never" is quite a long time.

  14. My sentence implied "would never get to before we die (or die of boredom)", thought it was obvious.

  15. Great work, Seb! I thought of a wordlist generated from Wikipedia too, but I do not have the skills.

    That's why I have to ask you if there is any possibility of getting the list sorted by language.

  16. Hi! Thanks for your appreciation :)

    I did generate a wordlist for every source before merging them all into one single wordlist. I still have these files around, I just didn't put them online for lack of hosting...

    I can send them to you by email if you tell me your address (as I moderate the comments I can hide your address if you prefer) and a list of the sources you would like, such as "enwiki" for Wikipedia in English, etc.

    1. could I have just: svwiki svwikibooks svwikinews svwikiquote svwikisource svwiktionary?

  17. Hey, how about doing the same for myspace usernames/Names :)
    I've seen this a lot in passwords I crack -- someone's password is someone else's username (guess we're humans after all and we think in the same way and a cool password can be a cool username for someone else).

  18. Interesting idea indeed :)

    I think Facebook doesn't have an API accessible from the outside... but Twitter has one, which by the way reminds me of the work of Arvind Narayanan and Vitaly Shmatikov called "De-anonymizing Social Networks" :

    Their idea was to use Twitter in order to identify somebody posing as anonymous, but performing the same cross-checks could also lead to breaking passwords.

    Like... why do so many girls have a boy's name as their password? Seriously girls, stop using your boyfriend's name as your password, it's so obvious!! :D

  19. Seb, I don't know of any API for myspace either, but you can just use google to search myspace.

    You can do the same for other social networking sites, forums, etc.

    If you can lookup your password with google, then it's a weak password.

  20. Seriously Seb, just shut up with your sexism. That was totally unnecessary.

  21. Hah, sorry Nicki, it wasn't intended to be sexist at all, only an example of how social networking can be used to find somebody's password.

    I could (and should) have mentioned the huge proportion of men whose password is their favorite football club.

    The girls' example seemed more relevant because it is based on relatives, not hobbies: even if only some social networking websites encourage you to publish information about your hobbies, all of them - by definition - contain data about your relatives.

    As for the XKCD link, I don't know if you intended it to be mean... If it was the case then I'm sorry, it actually made me like your comment :-) for two reasons:
    1) I am a huge XKCD fan
    2) My best memories are of a girl I accidentally met on Freenode's ##security

  22. Exactly what I was going to do. This list kicks ass, no question about it. Thx for sharing.

  23. hi buddy,
    if u still have separate wordlists before merging them into one big i would be happy if u could send it to my e-mail (u can hide my email from public pls). i have interest about slovakian wordlists (skwiki skwikibooks skwikiquote skwikisource skwiktionary). thank u very much

  24. Really great idea! Thanks for sharing!
    In a previous comment you said you have single lists for the various languages. Is this still the case? If so, would you mind sending them to me somehow via e-mail? That would be greatly appreciated! :)

    My e-mail address is: thebestisaac(at)gmail(dot)com
    Thanks again!

  25. How about an update? :)

  26. There is wikipedia-wordlist-sraveau-20121203.7z on

  27. i am terribly late to the party... with that said I love you

  28. There's an upcoming
    CFP: Passwords^13 (PasswordsCon), Bergen, Dec 2-3
    Hopefully there will be an updated Wikipedia wordlist and more nice wordlists.