The purpose of this research is to help advance the field of password cracking by developing password rule analysis techniques and deliver tools to help enhance rule-based password cracking attacks. The contributions of this research are techniques for source word reversal, the introduction of the Reverse Levenshtein Path algorithm for password rule reversal and a tool, PACK (Password Analaysis and Cracking Kit) that will produce valid rules and wordlists for use with the Hashcat password cracker. Through this work I have stumbled into an exciting new field of password analysis offering plenty of possibilities. I hope by reading this work you will also be inspired to start working and contributing to the art of password cracking.
This work is designed to be as practical as possible with many code snippets along the way. In fact, I highly encourage you to experiment with the sample code in order to get a better feel for the content and hopefully make improvements.
Table of Contents
Introduction
The field of password cracking has evolved by leaps and bounds over the last decade with the introduction of new cracking techniques, more advanced software and significantly faster hardware. These innovations were developed not only in response to improvements in cryptographic algorithms, but a gradual improvement of password quality by users as well. While the use of dictionary words for passwords is still a strong trend, many users have developed a set of word mangling rules to come up with passwords which are harder to crack and/or compliant with their organization's or website's password policy. For example, a rule to replace all letters 'a' with a number '4' could be used to mangle a source word 'password' to produce a password candidate 'p4ssword'. As a result, many passwords crackers such as Hashcat and John the Ripper have implemented rule engines that apply common word mangling rules to source wordlists in order to crack these more complex passwords.The analysis of password rules can be very effective in cracking large number of passwords that were generated using shared rules or words. For example, if a user's password (p4ssword) was compromised on site A, then an attacker can recover same user's password on site B if it was generated with the same rules but a different source word (e.g. dr4gon) or the same source word but different rules (password1). This work can also be used successfully for password cracking competitions such as Defcon's Crack Me If You Can where continuous rule and word recycling are essential to winning.
Below is a series of experiments and observations to address the challenges of source word reversal, password rule generation and practical application through the use of modern password cracking tools.
Reversing source words used in passwords
Let's take a step back for a moment and observe that word mangling is not always intentional. In fact, unintentional word mangling is simply called a spelling error. Let's consider for a moment that a dictionary-based mangled password is a special case of intentionally misspelling a word. Using this assumption, we could utilize a large body of spell-checking algorithms such as the classic Levenshtein Distance and a number of open-source spell-checking libraries such as Aspell, Hunspell and many others.Let's run a few experiments to determine their usability and limitations for restoring source words from mangled passwords.
Using spell-checking libraries
For this experiment, I have chosen Enchant - a spell-checking wrapper library with an ability to interface and dynamically choose many different spell-checking libraries (Aspell, Myspell, Hunspell, etc.). Let's observe how Enchant interprets the example above using a Python script with the PyEnchant library:#!/usr/bin/env python
import enchant
import sys
if len(sys.argv) < 1:
    print "Usage: rulegen.py p4ssw0rd"
    sys.exit(1)
d = enchant.Dict("en_US")
password = sys.argv[1]
print "%s: %s" % (password,", ".join(d.suggest(sys.argv[1])))
$ python rulegen1.py p4ssword1
p4ssword1: password, swordplay, crossword
$ ./rulegen1.py wordpass
wordpass: word pass, word-pass, passwords, password, wordless, swordplay, wordplay, waitperson
Things get complicated quickly as users come up with more complex rules. For example consider a small variation on the password 'wordpass' above:
$ ./rulegen1.py w0rdp4ss
w0rdp4ss: wardress, redeposit, bandpass
Pre-analysis engine
Let's adjust the previous application with a simple password pre-analysis engine designed to detect known numeric substitutions to prepare passwords to be processed by the spell-checker:#!/usr/bin/env python
import enchant
import sys
if len(sys.argv) < 1:
    print "Usage: rulegen.py p4ssw0rd"
    sys.exit(1)
leet = dict()
leet['4'] = 'a'
leet['0'] = 'o'
def preanalysis(password):
    preanalysis_password = ''
    for c in password:
        if c in leet: preanalysis_password += leet[c]
        else: preanalysis_password += c
    return preanalysis_password
d = enchant.Dict("en_US")
password = preanalysis(sys.argv[1])
print "%s: %s" % (password,", ".join(d.suggest(password)))
$ ./rulegen2.py w0rdp4ss
wordpass: word pass, word-pass, passwords, password, wordless, swordplay, wordplay, waitperson
There are several other tasks the preanalysis engine may take care of in order to take specifics of the password creation and present spell checkers with better candidates.
- Rotation Rules Just like we found a rotation of the word password to wordpass, candidates such as dpasswor, asswordp, etc. are sometimes used by users and are supported by many password crackers.
- Insertion, prefix and appendix rules Some great prefix and appendix candidates could be detected by observing non-alpha characters on each side of the an alpha word.
- Duplication rules Duplicating first/last character or substrings (e.g. pppassword, passworddd, papassword, passwordpassword, passworddrowssap, etc.)
- Combination rules Combining one or more words (e.g. super!man)
- Password patterns Frequently used password patterns such as email addresses can be detected and processed (e.g. password@aol.com)
Custom wordlists
I have intentionally avoided the use of the phrase source "dictionary word" and used "source word" instead. For the purposes of password rule analysis, we must often consider source words not normally considered by standard spell-checking dictionary-based wordlists.First let's modify the original source code to include the possibility of custom dictionaries:
#!/usr/bin/env python
# iphelix [at] thesprawl.org
import enchant
import sys
if len(sys.argv) < 1:
    print "Usage: rulegen.py p4ssw0rd [wordlist.txt]"
    sys.exit(1)
if len(sys.argv) == 3:
    d = enchant.request_pwl_dict(sys.argv[2])
else:
    d = enchant.Dict("en_US")
print d.suggest(sys.argv[1])
$ ./rulegen3.py iloveu123
['spillover', 'allover', 'alleviate']
$ ./rulegen3.py iloveu123 wordlist.txt 
['iloveu']
Reducing results
The potential for improvement is so large when trying to intelligently produce source words that even getting a couple (or a dozen) of incorrect candidates will still produce a significant improvement in password cracking in the long run and as a result every possible candidate must be considered and recorded.However, there may be cases where the results list is too large for a particularly slow algorithm. We must be able to determine best word candidates and ignore the rest. A common way to evaluate source word candidates is by calculating their Levenshtein edit distance. Let's modify the original sample program to accept a maximum edit distance parameter and filter results accordingly:
#!/usr/bin/env python
import enchant
import sys
if len(sys.argv) < 2:
    print "Usage: rulegen.py p4ssw0rd 5"
    sys.exit(1)
password = sys.argv[1]
max_edit_distance = sys.argv[2]
# Levenshtein distance algorithm
# Source: https://en.wikipedia.org/wiki/Levenshtein_distance
def levenshtein(s, t):
    if len(s) == 0:
        return len(t)
    elif len(t) == 0:
        return len(s)
    else:
        cost = 0
        if s[0] != t[0]:
            cost = 1
        return min( levenshtein(s[1:], t) + 1,\
                    levenshtein(s, t[1:]) + 1,\
                    levenshtein(s[1:], t[1:]) + cost)
d = enchant.Dict("en_US")
for word in d.suggest(password):
    edit_distance = levenshtein(word,password)
    if edit_distance <= max_edit_distance:
       print "[%s] %s" % (edit_distance,word)
$ ./rulegen4.py p4ssw0rd 5
[2] password
[5] Pissaro
[5] assured
$ ./rulegen4.py p4ssw0rd 2
[2] password
Future Work
As was illustrated in several examples above, the existing spell-checking libraries and algorithms can be taken only so far when trying to reverse source words. There is plenty of room for improvement of algorithms targeting the specifics of password generation as well as the initial wordlists used for the comparison.Reversing Word Mangling Rules
Now that we have one or more good source word candidates, let's tackle the second challenge of discovering word mangling rules used to transform source words into passwords.Levenshtein Edit Distance Algorithm
In order to absorb further research, let's get an intuitive feel for the way Levenshtein distance algorithm works by visually observing the matrix used to generate the edit distance.Below is a sample dynamic implementation of the Levenshtein distance algorithm which is great for learning since it preserves the matrix as it is calculated.
#!/usr/bin/env python
# iphelix [at] thesprawl.org
import sys
def matrixprint(matrix,source,target):
    print "    %s" % " ".join(list(source))
    for i,row in enumerate(matrix):
        if i == 0: print " ",
        else:      print target[i-1],
        print " ".join(str(col) for col in row)
def levenshtein(source, target):
    # 1) Generate the initial matrix
    matrix = []
    for i in xrange(len(target) + 1):
        matrix.append([])
        for j in xrange(len(source) + 1):
            if i == 0:
                matrix[i].append(j)
            elif j == 0:
                matrix[i].append(i)
            else:
                matrix[i].append(0)
    # 2) Populate the matrix using Levenshtein algorithm
    for i in xrange(1,len(target) + 1):
        for j in xrange(1,len(source) + 1):
            if target[i-1] == source[j-1]:
                matrix[i][j] = matrix[i-1][j-1]
            else:
                insertion    = matrix[i-1][j] + 1
                deletion     = matrix[i][j-1] + 1
                substitution = matrix[i-1][j-1] + 1
                matrix[i][j] = min(insertion, deletion, substitution)
    return matrix
if __name__ == "__main__":
        if len(sys.argv) < 3:
            print "Usage: lev.py word password"
            sys.exit(1)
        word = sys.argv[1]
        password = sys.argv[2]
        matrix = levenshtein(word,password)
        matrixprint(matrix,word,password)
$ python rulegen5.py password password    
    p a s s w o r d
  0 1 2 3 4 5 6 7 8 
p 1 0 1 2 3 4 5 6 7 
a 2 1 0 1 2 3 4 5 6 
s 3 2 1 0 1 2 3 4 5 
s 4 3 2 1 0 1 2 3 4 
w 5 4 3 2 1 0 1 2 3 
o 6 5 4 3 2 1 0 1 2 
r 7 6 5 4 3 2 1 0 1 
d 8 7 6 5 4 3 2 1 0*
Now let's look at an example with a couple of letter replacements:
$ python rulegen5.py password p4ssw0rd
    p a s s w o r d
  0 1 2 3 4 5 6 7 8
p 1 0 1 2 3 4 5 6 7
4 2 1 1 2 3 4 5 6 7
s 3 2 2 1 2 3 4 5 6
s 4 3 3 2 1 2 3 4 5
w 5 4 4 3 2 1 2 3 4
0 6 5 5 4 3 2 2 3 4
r 7 6 6 5 4 3 3 2 3
d 8 7 7 6 5 4 4 3 2*
    p a         p a s s w o
  0 1 2       0 1 2 3 4 5 6
p 1 0 1     p 1 0 1 2 3 4 5
4 2 1 1*    4 2 1 1 2 3 4 5
            s 3 2 2 1 2 3 4
            s 4 3 3 2 1 2 3
            w 5 4 4 3 2 1 2
            0 6 5 5 4 3 2 2*
insertion    = matrix[i-1][j] + 1
deletion     = matrix[i][j-1] + 1
substitution = matrix[i-1][j-1] + 1
matrix[i][j] = min(insertion, deletion, substitution)
matrix[i-1][j-1] + 1 was minimal and as a result we have recorded a diagonal increment of 1 corresponding to a substitution operation of 'a' to '4' and 'o' to '0'. Notice how the value propagates toward the bottom right corner accumulating all of the edit counts.Just to solidify the algorithm let's look at another password with an insertion:
$ python rulegen5.py password p4ssw0rd1
    p a s s w o r d
  0 1 2 3 4 5 6 7 8
p 1 0 1 2 3 4 5 6 7
4 2 1 1 2 3 4 5 6 7
s 3 2 2 1 2 3 4 5 6
s 4 3 3 2 1 2 3 4 5
w 5 4 4 3 2 1 2 3 4
0 6 5 5 4 3 2 2 3 4
r 7 6 6 5 4 3 3 2 3
d 8 7 7 6 5 4 4 3 2
1 9 8 8 7 6 5 5 4 3*
matrix[i-1][j] + 1 which corresponds to the insertion operation.After observing this pattern you will notice that the Levenshtein matrix actually preserves all of the transformations performed on the original string, it can be used to trace back the path from the mangled password to the source word while recording editing operations such as substitutions, deletions and insertions. Now it's getting exciting!
Reverse Levenshtein Path
Using the above observation we can traverse the dynamically generated Levenshtein matrix and attempt to find one or more minimal paths starting from the bottom right corner of the matrix to the top left. Tracing the matrix in reverse (as opposed to start to finish) allows us to traverse only those paths which can result in an optimal edit path. The matrix can be traversed in a recursive function as illustrated by the following pseudocode:**function** levenshtein reverse paths are:
**input:** matrix, i = matrix_height, j = matrix_width, edit ops counter = 0
**output:** [(edit op), (edit op), ...], [(edit op), (edit op), ...]
    1. if i is 0 and j is 0, return
    2. if edit ops > min edit distance, return
    3. otherwise,
        best path is minimal vertical, horizontal or diagonal cost
            in the reverse direction of the current location in
            the matrix.
        if vertical path is best path,
            edit ops increment 1
            return levenshtein reverse paths(vertical, edit ops) + (insertion)
        if horizontal path is best path, 
            edit ops increment 1
            return levenshtein reverse paths(horizontal, edit ops) + (deletion)
        if diagonal path is best path,
            if diagonal cost is current cost,
                return levenshtein reverse paths(diagonal, edit ops) + (equals)
            otherwise,
                edit ops increment 1
                return levenshtein reverse paths(diagonal, edit ops) + (replacement)
**end** levenshtein reverse paths
#!usr/bin/env pythonAs previously discussed, the above algorithm works by finding all minimal paths to transform 'word' into 'password' by walking the Levenshtein path in reverse while recording edit operations in reverse: diagonal path - replace or equal, horizontal - delete, vertical - insert.
import sys
############################################################################
# Calculate Levenshtein edit path matrix
def levenshtein(word,password):
matrix = []
# Generate and populate the initial matrix
for i in xrange(len(password) + 1):
matrix.append([])
for j in xrange(len(word) + 1):
if i == 0:
matrix[i].append(j)
elif j == 0:
matrix[i].append(i)
else:
matrix[i].append(0)
# Calculate edit distance for each substring
for i in xrange(1,len(password) + 1):
for j in xrange(1,len(word) + 1):
if password[i-1] == word[j-1]:
matrix[i][j] = matrix[i-1][j-1]
else:
insertion = matrix[i-1][j] + 1
deletion = matrix[i][j-1] + 1
substitution = matrix[i-1][j-1] + 1
matrix[i][j] = min(insertion, deletion, substitution)
return matrix
############################################################################
# Print word X password matrix
def levenshtein_print(matrix,word,password):
print " %s" % " ".join(list(word))
for i,row in enumerate(matrix):
if i == 0: print " ",
else: print password[i-1],
print " ".join("%2d" % col for col in row)
############################################################################
# Reverse Levenshtein Path Algorithm by Peter Kacherginsky
# Generates a list of edit operations necessary to transform a source word
# into a password. Edit operations are recorded in the form:
# (operation, password_offset, word_offset)
# Where an operation can be either insertion, deletion or replacement.
def levenshtein_reverse_path(matrix,word,password):
paths = levenshtein_reverse_recursive(matrix,len(matrix)-1,len(matrix[0])-1,0)
return [path for path in paths if len(path) <= matrix[-1][-1]]
# Calculate reverse Levenshtein paths (recursive, depth first, short-circuited)
def levenshtein_reverse_recursive(matrix,i,j,path_len):
if i == 0 and j == 0 or path_len > matrix[-1][-1]:
return [[]]
else:
paths = list()
cost = matrix[i][j]
# Calculate minimum cost of each operation
cost_delete = cost_insert = cost_equal_or_replace = sys.maxint
if i > 0: cost_insert = matrix[i-1][j]
if j > 0: cost_delete = matrix[i][j-1]
if i > 0 and j > 0: cost_equal_or_replace = matrix[i-1][j-1]
cost_min = min(cost_delete, cost_insert, cost_equal_or_replace)
# Recurse through reverse path for each operation
if cost_insert == cost_min:
insert_paths = levenshtein_reverse_recursive(matrix,i-1,j,path_len+1)
for insert_path in insert_paths: paths.append(insert_path + [('insert',i-1,j)])
if cost_delete == cost_min:
delete_paths = levenshtein_reverse_recursive(matrix,i,j-1,path_len+1)
for delete_path in delete_paths: paths.append(delete_path + [('delete',i,j-1)])
if cost_equal_or_replace == cost_min:
if cost_equal_or_replace == cost:
equal_paths = levenshtein_reverse_recursive(matrix,i-1,j-1,path_len)
for equal_path in equal_paths: paths.append(equal_path)
else:
replace_paths = levenshtein_reverse_recursive(matrix,i-1,j-1,path_len+1)
for replace_path in replace_paths: paths.append(replace_path + [('replace',i-1,j-1)])
return paths
if __name__ == "__main__":
if len(sys.argv) < 3:
print "Usage: rulegen.py word password"
sys.exit(1)
word = sys.argv[1]
password = sys.argv[2]
matrix = levenshtein(word,password)
levenshtein_print(matrix,word,password)
paths = levenshtein_reverse_path(matrix,word,password)
min_edit_distance = matrix[-1][-1]
for path in paths:
if len(path) == min_edit_distance:
print "[*] Edit path for '%s' to '%s'" % (word,password)
for op,p,w in path:
if op == 'insert':
print " [+] Inserted %s" % password[p]
elif op == 'delete':
print " [+] Deleted %s" % word[w]
elif op == 'replace':
print " [+] Replaced %s with %s" % (word[w],password[p])
Let's observe some of its output for a transformation from "password" to "p@ssword":
$ python rulegen6.py password p@ssword
      p  a  s  s  w  o  r  d
   0  1  2  3  4  5  6  7  8
p  1  0  1  2  3  4  5  6  7
@  2  1  1  2  3  4  5  6  7
s  3  2  2  1  2  3  4  5  6
s  4  3  3  2  1  2  3  4  5
w  5  4  4  3  2  1  2  3  4
o  6  5  5  4  3  2  1  2  3
r  7  6  6  5  4  3  2  1  2
d  8  7  7  6  5  4  3  2  1
[*] Edit path for 'password' to 'p@ssword'
    [+] Replaced a with @
      p  a  s  s  w  o  r  d
  #0  1  2  3  4  5  6  7  8
p  1 #0  1  2  3  4  5  6  7
@  2  1 *1  2  3  4  5  6  7
s  3  2  2 *1  2  3  4  5  6
s  4  3  3  2 *1  2  3  4  5
w  5  4  4  3  2 *1  2  3  4
o  6  5  5  4  3  2 *1  2  3
r  7  6  6  5  4  3  2 *1  2
d  8  7  7  6  5  4  3  2 *1
$ python rulegen6.py password p@sswrd1
      p  a  s  s  w  o  r  d
   0  1  2  3  4  5  6  7  8
p  1  0  1  2  3  4  5  6  7
@  2  1  1  2  3  4  5  6  7
s  3  2  2  1  2  3  4  5  6
s  4  3  3  2  1  2  3  4  5
w  5  4  4  3  2  1  2  3  4
r  6  5  5  4  3  2  2  2  3
d  7  6  6  5  4  3  3  3  2
1  8  7  7  6  5  4  4  4  3
[*] Edit path for 'password' to 'p@sswrd1'
    [+] Replaced a with @
    [+] Deleted o
    [+] Inserted 1
There are some situations where there may be multiple equally good paths:
$ python rulegen6.py password p4ss0rd
      p  a  s  s  w  o  r  d
   0  1  2  3  4  5  6  7  8
p  1  0  1  2  3  4  5  6  7
4  2  1  1  2  3  4  5  6  7
s  3  2  2  1  2  3  4  5  6
s  4  3  3  2  1  2  3  4  5
0  5  4  4  3  2  2  3  4  5
r  6  5  5  4  3  3  3  3  4
d  7  6  6  5  4  4  4  4  3
[*] Edit path for 'password' to 'p4ss0rd'
    [+] Replaced a with 4
    [+] Replaced w with 0
    [+] Deleted o
[*] Edit path for 'password' to 'p4ss0rd'
    [+] Replaced a with 4
    [+] Deleted w
    [+] Replaced o with 0
      p  a  s  s  w  o  r  d
  *0  1  2  3  4  5  6  7  8
p  1 *0  1  2  3  4  5  6  7
4  2  1 *1  2  3  4  5  6  7
s  3  2  2 *1  2  3  4  5  6
s  4  3  3  2 *1 *2  3  4  5
0  5  4  4  3  2 *2 *3  4  5
r  6  5  5  4  3  3 *3 *3  4
d  7  6  6  5  4  4  4  4 *3
The reverse Levenshtein path algorithm above has several limitations. The algorithm is designed to try to find paths corresponding to the shortest edit distance; however, there are some situations where the password went through more transformations than determined by the Levenshtein path:
$ python rulegen6.py password passwr1
      p  a  s  s  w  o  r  d
   0  1  2  3  4  5  6  7  8
p  1  0  1  2  3  4  5  6  7
a  2  1  0  1  2  3  4  5  6
s  3  2  1  0  1  2  3  4  5
s  4  3  2  1  0  1  2  3  4
w  5  4  3  2  1  0  1  2  3
r  6  5  4  3  2  1  1  1  2
1  7  6  5  4  3  2  2  2  2
[*] Edit path for 'password' to 'passwr1'
    [+] Deleted o
    [+] Replaced d with 1
Future work
A possible solution to the above is to introduce some preprocessing to help account for the specifics of password generation. For example, if we detect a password which ends with non-alpha characters, then we may elect to remove them (while preserving and later combining insertion rules) and submit the result to the reverse Levenshtein path algorithm for further processing. I am still evaluating whether this approach has sufficient benefit for the challenge of generating reverse word mangling rules.Generating Hashcat Rules
The last part of the original challenge of generating password rules was to make the research practical by generating rules using syntax understood by a modern password cracker such as Hashcat. You can review the exact syntax at Hashcat - rule_based_attack Wiki page. Insertion, Deletion and Replacement rules can be easily converted as follows:| Levenshtein OP | Hashcat Rule | Hashcat Description | 
|---|---|---|
| (insert,i,j) | iNX | Inserts character X at position N | 
| (delete,i,j) | DN | Deletes character at position N | 
| (replace,i,j) | oNX | Overwrites character at postion N with X | 
rulegen.py from the Password Analysis and Cracking Kit:$ python rulegen.py -q --verbose --simplerules --word password --password p4sword1
[*] Analyzing password: p4sword1
[+] password => o14 D3 i71 => p4sword1
[*] Finished analysis in 0.00 seconds
p4sword1 is o14 D3 i71, corresponding to replacement of letter 'a', deletion of letter 's' and insertion of number '1'.Advanced Hashcat Rules
Hashcat implements a wide range of rules such as global character replacements, appendix, prefix, capitalization, etc. These advanced rules can much more precisely describe users' actions than the standard insert, delete and replace triad. By implementing additional analysis it is possible to observe that the insertion occurs at the very end, thus it corresponds to the$1 rule (append 1) and the replacement of 'a' to '4', could be converted to a global replacement sa4. Let's rerun the above command without the --simplerules flag:$ python rulegen.py -q --verbose --word password --password p4sword1
[*] Analyzing password: p4sword1
[+] password => sa4 D3 $1 => p4sword1
| Levenshtein OP | Hashcat Rule | Hashcat Description | 
|---|---|---|
| (insert,i,j) where i is at the end of the word. | $c | Append character to end. | 
| (insert,i,j) where i is at the beginning of the word. | ^c | Prepend character to end. | 
| (delete,i,j) where i is at the end of the word. | ] | Delete last character. | 
| (delete,i,j) where i is at the beginning of the word. | [ | Delete first character. | 
| (replace,i,j) where i and i+1 were swapped. | *XY | Swap character X with Y. | 
| (replace,i,j) where i and i+1 were swapped where i at the beginning of the word | k | Swap the first two characters. | 
| (replace,i,j) where i and i+1 were swapped where i is at the end of the word | *XY | Swap the last two characters. | 
| (replace,i,j) where a case changed for the character i | TN | Toggle the case of characters at position N. | 
| (replace,i,j) where all characters at the index location i in the source word were replaced in the password | sXY | Replace all instances of X with Y. | 
| (replace,i,j) where replacement character at i is an ASCII increment | +N | Increment character @ N by 1 ascii value. | 
| (replace,i,j) where replacement character at i is an ASCII derement | -N | Decrement character @ N by 1 ascii value. | 
| (replace,i,j) where replacement character at i is a left bitwise shift | LN | Bitwise shift left character @ N. | 
| (replace,i,j) where replacement character at i is a right bitwise shift | RN | Bitwise shift right character @ N. | 
| (replace,i,j) where the next character at i+1 was replaced with current at i | .N | Replaces character @ N with value at @ N plus 1. | 
| (replace,i,j) where the previous character at i-1 was replaced with current at i | ,N | Replaces character @ N with value at @ N minus 1. | 
$ python rulegen.py -q --verbose --word password --password pa55words
[*] Analyzing password: pa55words
[+] password => ss5 $s => pa55words
ss5 are particularly hard to detect if comparing source and target words which include insertions, deletions or other prior substitutions. Because Hashcat rules are applied in a sequential order, the global replacement ss5 actually works in this case, because the last letter 's' was appended after the global replacement of all characters 's'. However in the case below, global replacement would not work as expected:$ python rulegen10.py -q --verbose --word password --password pa5swords
[*] Analyzing password: pa5swords
[+] password => o25 $s => pa5swords
# Hashcat Rules Engine
hashcat_rule = dict()
# Dummy rule
hashcat_rule[':'] = lambda x: x
# Case rules
hashcat_rule["l"] = lambda x: x.lower()
hashcat_rule["u"] = lambda x: x.upper()
hashcat_rule["c"] = lambda x: x.capitalize()
hashcat_rule["C"] = lambda x: x[0].lower() + x[1:].upper()
hashcat_rule["t"] = lambda x: x.swapcase()
hashcat_rule["T"] = lambda x,y: x[:y] + x[y].swapcase() + x[y+1:]
hashcat_rule["E"] = lambda x: " ".join([i[0].upper()+i[1:] for i in x.split(" ")])
hashcat_rule["l"](word) will return the original source word with all characters lowered. There may still be situations when even while preserving the state global rules may produce unexpected results. Consider this example:
 $ python rulegen.py -q --verbose --word password --password ps55words
[*] Analyzing password: ps55words
[+] password => D1 o25 i35 $s => ps55words
[+] password => D1 i25 ,3 $s => ps55words
[+] password => o1s o25 ,3 $s => ps55words
Future work
Another challenge is detection of rules which affect indices of all other operations such as word rotation, reversal and shifting. The challenge here is mixing these rules (after being properly detected) with other rules (e.g. substitutions) that may have occurred after them. Detection of these rules is even more unusual, because it requires actually attempting them and observing whether generated rules are better or worse with and without them. This bruteforce approach increases computational time of the rules and introduces additional complexity when combining bruteforce rules with normally generated ones. It may be that bruteforcing and trying each of these rules in parallel is the best solution; however, this is still an area of research and experimentation.Results
Below is the source word and password rule analysis for several leaked password lists using tools and techniques introduced in this paper. All of the rules were generated using PACK using Enchant's Aspell provider and a standard English dictionary.NOTE: Obviously incorrectly guessed words, such as single characters, were removed from final results.
| Source | Top Dictionary Words | Top Password Rules | 
|---|---|---|
| RockYou | love, password, baby, olive, alex, oliver, jim, jam, march and sexy | : - 1079666 (2.22%) | 
| linked, linkedin, password, alex, mike, jim, jfs, jam, jack and job | : - 749269 (3.21%) | |
| MySpace | password, olive, myspace, love, hearts, baseball, softball, football, cutie and chicken | $1 - 5372 (5.48%) | 
| Stratfor | skyjack, stratify, jukebox, Kazakh, Uzbek, KGB, password, Bangkok, Reykjavik and Augsburg | : - 33481 (0.37%) | 
| Singles.org | Jesus, love, angel, loveme, faith. | : - 4786 (25.62%) | 
| FaithWriters | Jesus, password, blessed, faith, writer, freedom, grace, mother | : - 3182 (18.06%) | 
| PHPBB | password, phoebe, phobia, dragon, forum, alpha, master, Mike, admin, qwerty | : - 43681 (5.94%) | 
$@ $a $o $l $. $c $o $m
^8 ^7 ^4 ^1 o6q T7
+1 i2e $2 $1 $0 $7 $2 $0 $0 $7
sW- $- $! $- $- $! $<
^y ^m ^4 o9b oAi $z
Conclusion
Automatic password rule reversal is not only possible, but as was illustrated above it is practical and can be used with great results to profile user passwords. There are several limitations to the above research as was illustrated in the final analysis (need better wordlists, spellchecking, expanded rule detection, etc.); however, I hope the work so far will give you enough interest to start exploring this field more.Above all, I hope to bring awareness to the inherent weakness of dictionary based pass"words" no matter how many special mangling rules users apply. It is time to switch to pass"phrases" or other forms of authentication.
Acknowledgements
Special thanks goes to atom and Team Hashcat who both inspired and helped encourage me to complete an early version of this research in time for the Crack Me If You Can competition. Go Team Hashcat!Also thanks to my coworker, Daniel South, for a crucial pointer to explore the area of DNA sequencing algorithms which became very much relevant to this research. Keep on spreading the seed of security!
Related Work
- Mangling Rules Generation by Simon Marechal
 
沒有留言:
張貼留言