Ghost Got Passwords - Ghostbin's Guts Part 2

Hello again ladies and gentlemen! My apologies for the delay in bringing this next installment of the Ghostbin’s Guts series to you - these past few months have been hectic. From quitting my job to trying to start a company to finding contract work to keep myself afloat, I’ve been having a tough time properly managing my schedule.

That’s no excuse though! Should you enjoy what I’m writing here and want me to write more, please harass me to do so on Twitter to speed things along. I love hearing from folks that like my writing and am much more motivated to put these together when I know people are looking forward to reading them.

Anyhow, without further ado I give you Ghost Got Passwords – Ghostbin’s Guts Part 2!

Setting the Stage

Back in August I set about scraping data from the Ghostbin paste-sharing platform. I had always been curious about how honest the “anonymity is used as much for good as it is for bad” argument was, so I tested it by reviewing the contents of Ghostbin. As it turned out, most of what Ghostbin was being used for fell into the “malicious” category.

A breakdown of 20,868 Ghostbin pastes that I analyzed resulted in the following distribution of paste contents:

Paste Distribution

Of the 20,868 files that were classified, 3,975 contained password dumps. The size distribution of these 3,975 password dumps is shown below (note that the Y-axis has a logarithmic scale):

Dump Sizes

As can be seen above, the vast majority of pastes were under 5 kB in size, with 3,152 pastes being under 1 kB.

After I had singled out all of the password dump files, I then had to parse each file for username and password data. This got a bit tricky, as the formats of the files tended to differ quite a bit. For instance, the following patterns were commonly found in many of these files:

username:password
username,password
username password
username, password
username, password hash, password
password:username
password,username
username
password
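
To give a feel for what handling these formats involves, here is a minimal Python sketch. It is not the parser used for this analysis: the function name `parse_line` and the regular expressions are assumptions made for illustration, and the heuristics will misread passwords that themselves contain colons, commas, or spaces.

```python
import re

# Illustrative heuristics only; these names and patterns are assumptions.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$")
HASH_RE = re.compile(r"^[0-9a-fA-F]{32,128}$")  # looks like a hex digest
SPLIT_RE = re.compile(r"\s*[:,]\s*|\s+")  # colon, comma, or whitespace


def parse_line(line):
    """Return (username, password) for a recognized line, else None."""
    fields = [f for f in SPLIT_RE.split(line.strip()) if f]
    # "username, password hash, password": drop the hash field.
    if len(fields) == 3 and HASH_RE.match(fields[1]):
        fields = [fields[0], fields[2]]
    if len(fields) != 2:
        return None  # single-column or unrecognized layout
    first, second = fields
    # "password:username" is only detectable when the username is an email.
    if EMAIL_RE.match(second) and not EMAIL_RE.match(first):
        first, second = second, first
    return first, second
```

Single-column files (only usernames or only passwords) return `None` here, since a lone field cannot be told apart without looking at the rest of the file.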

Because the representation of password data varied so widely across these files, it is safe to say that the following information does not cover all of the password data that I was parsing, and that some of the data is erroneous. However, I was able to successfully parse out the following numbers of data points (each row is a count of unique values for that data point):

Data Point Counts

What follows is a statistical breakdown of what these passwords contained, and the resulting password lists that I fashioned from this data.

Digging for Gold

After I had parsed through all of the password dumps I pulled down, I was immediately intrigued by the number of email address and password combinations that this data set contained. I have been on many penetration tests where the target organization was not American, yet I only had English-based password dictionaries to throw at them. Here, then, was an opportunity to see if different nationalities employed different passwords.

I took all of the email address and password combinations that I had found and reviewed how many combinations I had by TLD. The results of this are shown below:

Count by TLD

For those that are curious, I also took a look at what domains were most commonly represented in these password dumps:

Count by Domain
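
Counts like these can be produced with a few lines of Python. The sketch below is an assumption about how one might do it, not the code behind the graphs above; `count_tlds_and_domains` is a hypothetical name, and the TLD is simply taken as the text after the last dot in the domain.

```python
from collections import Counter


def count_tlds_and_domains(emails):
    """Count email addresses by TLD and by full domain."""
    tlds, domains = Counter(), Counter()
    for email in emails:
        local, sep, domain = email.strip().lower().rpartition("@")
        tld = domain.rsplit(".", 1)[-1]
        # Skip strings with no "@", no local part, or no usable TLD.
        if not sep or not local or not tld or tld == domain:
            continue
        domains[domain] += 1
        tlds[tld] += 1
    return tlds, domains
```

Calling `tlds.most_common(10)` on the result gives a ranked list suitable for charting.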

While the lion’s share of the email addresses I analyzed belonged to the COM TLD, RU (Russia), NET (Generic), UK (United Kingdom), FR (France), and BR (Brazil) all had large enough presences to provide interesting insights.

I separated out all of the email and password combinations that I had by TLD and ran these lists through the Pipal Password Analyser software.
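
For readers who want to try the same split on their own data, a rough sketch follows. The function names and the one-file-per-TLD layout are assumptions for illustration; the files it writes contain passwords only, one per line, which is the kind of input a password analyzer such as Pipal works on.

```python
from collections import defaultdict
from pathlib import Path


def group_passwords_by_tld(pairs):
    """Map each TLD to the list of passwords seen with emails under it."""
    groups = defaultdict(list)
    for email, password in pairs:
        tld = email.lower().rsplit(".", 1)[-1]
        if "@" in email and tld.isalpha():
            # Keep duplicates: frequency is what the analysis measures.
            groups[tld].append(password)
    return dict(groups)


def write_lists(groups, out_dir):
    """Write one passwords-only file per TLD, e.g. out_dir/fr.txt."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for tld, passwords in groups.items():
        text = "\n".join(passwords) + "\n"
        (out / f"{tld}.txt").write_text(text, encoding="utf-8")
```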

Firstly, let’s take a look at what the data trends are when we don’t break passwords down by TLD. Password lengths (agnostic of TLD) are shown below:

Agnostic Password Lengths

The distribution of the last digit found in passwords (agnostic of TLD), by percentage, is shown below:

Last Digit Agnostic

One of the great features of Pipal is that it looks into the passwords it’s analyzing and identifies common patterns. There are two sets of data that Pipal spits out regarding password pattern categories and the presence of particular characters in passwords. The first of these two data sets (agnostic of TLD) is shown below:

Categories 1 Agnostic

The second of these two data sets (agnostic of TLD) is shown below:

Categories 2 Agnostic

And last but not least, the most common passwords (agnostic of TLD):

Top 10 Agnostic

After looking at all of these graphs, I couldn’t say that I was surprised. The most common digit these passwords ended with was 1, passwords were commonly 6-8 characters long, lowercase+digit passwords were the most common followed by only lowercase alphabetic passwords, and 123456 was the most common password overall. So now let’s see if these trends held across the different TLDs.
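
The statistics summarized above are straightforward to approximate by hand. The Python sketch below is not Pipal and does not reproduce its exact categories; `charset_category` and `summarize` are hypothetical names, and the character classes are a simplified assumption.

```python
from collections import Counter
import string


def charset_category(password):
    """Label a password by the character classes it contains."""
    classes = []
    if any(c in string.ascii_lowercase for c in password):
        classes.append("lower")
    if any(c in string.ascii_uppercase for c in password):
        classes.append("upper")
    if any(c in string.digits for c in password):
        classes.append("digit")
    if any(c not in string.ascii_letters + string.digits for c in password):
        classes.append("other")
    return "+".join(classes) or "empty"


def summarize(passwords, top=10):
    """Length, last-digit, category, and frequency counts for a list."""
    return {
        "lengths": Counter(len(p) for p in passwords),
        "last_digit": Counter(
            p[-1] for p in passwords if p and p[-1] in string.digits
        ),
        "categories": Counter(charset_category(p) for p in passwords),
        "top": Counter(passwords).most_common(top),
    }
```

Running `summarize` once over the full list and once per TLD gives the two views compared in this post.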

Password lengths by TLD are shown below:

Length by TLD

Password last digits by TLD are shown below:

Last Digit by TLD

The first password categories data across all TLDs is shown below:

Categories 1 by TLD

The second password categories data across all TLDs is shown below:

Categories 2 by TLD

And finally, the most common passwords by TLD:

Most Common by TLD

Awesome – sure enough it looked like password lists could benefit from being tailored to specific language sets.

Takeaways

So now that we’ve got all of these pretty graphs in front of us, what are some of the takeaways we might draw from this data?

  • Clearly there were errors in how I handled parsing of these lists, as shown by the NULL password in the FR top ten most common list.
  • The same password list likely appeared multiple times on Ghostbin, as shown by the unexpected results of the UK top ten most common list.
  • Keyboard layouts directly affect keyboard walk passwords, as “azerty” took the place of “qwerty” on the FR top ten most common list.
  • Different languages and nationalities have decidedly different common password habits, as evidenced by the category 1 and category 2 graphs across all TLDs.
  • Regardless of TLD, appending “1” to the end of every password that you try is probably a good idea.
  • Targeting passwords between six and eight characters in length will give you the best bang for your buck, regardless of the nationality of your target.

Most importantly,

  • There is not a one-size-fits-all password list for attacking applications and organizations across country borders. While certain lists will always be better than others, honing your list to target a specific language will pay dividends.
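
Two of the takeaways above (the six-to-eight character range and the trailing “1”) can be folded into a wordlist with very little code. This is a minimal sketch under stated assumptions: `tailor_wordlist` is a hypothetical helper, the input is assumed to already be ordered from most to least common, and the length filter is applied to each candidate after the suffix is added.

```python
def tailor_wordlist(ranked_passwords, min_len=6, max_len=8, suffix="1"):
    """Yield candidates in rank order, each followed by a suffixed variant."""
    seen = set()
    for word in ranked_passwords:
        for candidate in (word, word + suffix):
            in_range = min_len <= len(candidate) <= max_len
            if in_range and candidate not in seen:
                seen.add(candidate)  # avoid emitting the same guess twice
                yield candidate
```

Feeding it a language-specific list, for example one built from FR addresses, keeps the most common guesses at the front while dropping candidates outside the target length range.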

Some Lists for You

With all of this work done and all of these passwords analyzed, it wouldn’t be fair not to share these password lists with you all. Note that these lists DO NOT contain any usernames or email addresses corresponding to the affected accounts.

The password lists are broken up by TLD and can be found on my GitHub:

https://github.com/lavalamp-/password-lists

Closing Thoughts

I hope you all found this analysis of passwords scraped from Ghostbin enlightening. I would love to see more effort put into creating language-specific password lists, as this analysis clearly indicates the need for them.

If you like this article, please share it! If you want me to write more like it, harass me at @_lavalamp.