
Apologies if this is an incredibly simple question... I'm completely new to Python and am learning as I go.

An old post (Find all combinations (upper and lower and symbols) of a word in python) shows a way to generate multiple permutations of an input word in leet-speak (thank you, Moose!). The code works beautifully, but it only handles a single input word, in this case Password.

I want to use a text file, with one word per line, as input into the code snippet shown in the link above and save the results into a new text file.

I would have thought it rather straightforward: open the input file as read-only, open the output file for writing, pass the value of infile.readlines() into the function, and write the result to the output file. Rinse and repeat. Yet despite trying a few different approaches and syntax variations, I can't get this to work.

My botched attempt to modify Moose's code looks like this:

#!/usr/bin/python
# -*- coding: utf-8 -*-

from itertools import product

def getAllCombinations(password):
    leet = ["Aa@","Bb","Cc", "Dd","Ee","Ff","Gg","Hh","Ii","Jj","Kk",
            "Ll","Mm","Nn","Oo0","Pp","Qq","Rr","Ss5","Tt","Uu","Vv",
            "Ww","Xx","Yy","Zz"]

    getPlaces = lambda password: [leet[ord(el.upper()) - 65] for el in password]

    for letters in product(*getPlaces(password)):
        yield "".join(letters)

with open("wordlist_in.txt", "r") as infile, open("wordlist_out.txt", "w") as outfile:
    data = infile.readlines()
    for el in getAllCombinations(data):    # <<< Pretty sure this is where I go wrong
        outfile.write(el+'\n')

How do I get the string contained in each line of the file to be the input for getAllCombinations?

Thank you in advance for your help!

Betawave
  • What does your *wordlist_in.txt* file look like? – Patrick Carroll Feb 03 '16 at 04:11
  • What is your end goal? Why do you need to output all of the combinations? Perhaps there is a better way of achieving that goal. – pzp Feb 03 '16 at 04:32
  • I'm using the output to create massive wordlists that are, in turn, used as sources for hash cracking. I use rules wherever possible to get the most out of GPGPU performance, but sometimes you just need a few good word lists as source, especially for combinator attacks. – Betawave Feb 03 '16 at 04:34

2 Answers


I'm guessing your wordlist_in.txt looks like this:

word1
another_word
more_words

In that case, you only want to pass one word at a time to the function:

data = infile.readlines()
for line in data:
    for el in getAllCombinations(line):
        outfile.write(el+'\n')
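Why the whole list fails: getPlaces loops over whatever it is given. With a single word, each `el` is one character; with the list returned by readlines(), each `el` is an entire line, and `ord()` only accepts a single character, so the original `getAllCombinations(data)` call raises a `TypeError` instead of producing output. A minimal sketch that isolates the lookup (the function name `get_places` and the sample words are illustrative, not from the original post):

```python
# Same 26-entry table as the original code: index 0 is A, index 25 is Z
leet = ["Aa@", "Bb", "Cc", "Dd", "Ee", "Ff", "Gg", "Hh", "Ii", "Jj", "Kk",
        "Ll", "Mm", "Nn", "Oo0", "Pp", "Qq", "Rr", "Ss5", "Tt", "Uu", "Vv",
        "Ww", "Xx", "Yy", "Zz"]


def get_places(password):
    # Same expression as the original getPlaces lambda: one entry per element
    return [leet[ord(el.upper()) - 65] for el in password]


data = ["ab", "cd"]  # hypothetical stand-in for infile.readlines(), newlines removed

# Whole list: each element is a word, and ord() rejects multi-character strings
try:
    get_places(data)
except TypeError as exc:
    print("TypeError:", exc)

# One word at a time: each element is a single character
for word in data:
    print(get_places(word))  # ['Aa@', 'Bb'] then ['Cc', 'Dd']
```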
  • You're correct about the wordlist format. I had thought about the "for line in data:" line, but that throws the following error:

        Traceback (most recent call last):
          File "l33t.pl", line 20, in <module>
            for el in getAllCombinations(line):
          File "l33t.pl", line 14, in getAllCombinations
            for letters in product(*getPlaces(password)):
          File "l33t.pl", line 11, in <lambda>
            getPlaces = lambda password: [leet[ord(el.upper()) - 65] for el in password]
        IndexError: list index out of range

    – Betawave Feb 03 '16 at 04:26
  • Better yet, use `for line in infile:`. File objects are also iterable. And you could also use `outfile.write('\n'.join(getAllCombinations(line)) + '\n')` to save multiple (costly) write operations. – pzp Feb 03 '16 at 04:27
  • The writes are buffered, so it won't be as bad as it seems at first glance. Also, take a look at https://docs.python.org/3.5/library/fileinput.html when dealing with file input—it will allow your script to work with stdin, so e.g. `combinations.py < words.txt` – Ben Graham Feb 03 '16 at 04:28
  • Yes, could do... and I may try that... but I'm wondering if I'll run into memory issues as the files get very large, very quickly (a few words can end up being many GB in size). – Betawave Feb 03 '16 at 04:29
  • @BenGraham Good call, you're right about that one. But the first comment still stands. – pzp Feb 03 '16 at 04:30
  • @BenGraham - I tried the stdin approach, but end up with the same IndexError: list index out of range. I'm sure there's an obvious error, but it's eluding me at the moment due to my poor understanding of Python. Is it that the code is trying to pass in an array instead of a single value at a time? – Betawave Feb 03 '16 at 04:46
  • @Betawave, it may be best to work with Patrick Carroll's suggestion and come back to the `fileinput` module when you have that working. I would advise breaking up the line where you define `getPlaces`. Do one thing per line. Then the traceback will be able to tell you where the problem happened more accurately. You can also run your code like `python -m pdb -c continue script.py`; this will drop you into a debugger when an exception occurs. – Ben Graham Feb 03 '16 at 04:55
  • Good advice. Thank you. I just commented out "for el in getAllCombinations(line)" and "outfile..." and replaced them with 'print line'. While the input wordlist is in the format: alpha bravo charlie, the output from print line was: alpha bravo charlie. So I know it's reading the file correctly, but it seems to be picking up a control character I can't see (possibly \n or similar), and that may be causing a problem with using the file lines as input to the function. Any thoughts on that? – Betawave Feb 03 '16 at 05:04
  • Realized comments don't show my formatting. To be clear, the input wordlist is in the format of one word per line. The output from print line didn't match... it was word, blank line, word, blank line, word... – Betawave Feb 03 '16 at 05:12
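The suggestions in these comments can be combined roughly as follows: iterate over the input lines directly instead of calling readlines(), strip each line, and write every combination as soon as it is generated rather than joining them into one large string, which keeps memory use low even when the output runs to many gigabytes. This is a sketch, not code from the thread; the names `leet_variants`, `write_variants` and `main` are hypothetical. The lookup is split into one step per line, in the spirit of Ben Graham's advice, with an explicit range check so a bad character produces a clear error message instead of a bare IndexError.

```python
import fileinput
import sys
from itertools import product

LEET = ["Aa@", "Bb", "Cc", "Dd", "Ee", "Ff", "Gg", "Hh", "Ii", "Jj", "Kk",
        "Ll", "Mm", "Nn", "Oo0", "Pp", "Qq", "Rr", "Ss5", "Tt", "Uu", "Vv",
        "Ww", "Xx", "Yy", "Zz"]


def leet_variants(word):
    """Yield every combination of LEET substitutions for `word` (letters A to Z only)."""
    places = []
    for ch in word:
        index = ord(ch.upper()) - 65  # 'A' -> 0 ... 'Z' -> 25
        if not 0 <= index < len(LEET):
            raise ValueError("unsupported character %r in %r" % (ch, word))
        places.append(LEET[index])
    for letters in product(*places):
        yield "".join(letters)


def write_variants(lines, out):
    """Write the variants of each input line to `out`, one per line, as they are generated."""
    for line in lines:
        word = line.rstrip()  # drop '\n' (and '\r' from Windows line endings)
        if not word:
            continue  # skip blank lines
        for variant in leet_variants(word):
            out.write(variant + "\n")


def main():
    # fileinput reads the files named on the command line, or stdin if none are given
    write_variants(fileinput.input(), sys.stdout)
```

Saved as combinations.py with a `main()` call added at the bottom (for example under `if __name__ == "__main__":`), it could be run as `python combinations.py wordlist_in.txt > wordlist_out.txt` or `python combinations.py < wordlist_in.txt > wordlist_out.txt`.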

After taking the advice from Patrick Carroll, pzp and Ben Graham (thanks for responding! I'll definitely look at some of the options you presented for improving the code) and doing some trial and error, I found that my initial attempts were very close: apart from one small issue, I had actually gotten it right at least once on my own.

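Each word in my input wordlist is on its own line. When Python reads the file, it includes the invisible '\n' control character as part of each line, e.g., alpha becomes alpha\n. That character was breaking the function, because getPlaces converts every character into an index into the leet list and '\n' produces an index outside that list (hence the IndexError). Adding a line that calls rstrip() to remove the newline before passing each word to getAllCombinations corrected the problem, and the code then ran as intended.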
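The failure is easy to reproduce in isolation. `repr()` makes the hidden newline visible, and the original lookup `ord(el.upper()) - 65` turns '\n' (code point 10) into index -55, which is outside the 26-entry leet list (valid indices run from -26 to 25). A minimal check, using a hypothetical helper `places` that repeats the original expression:

```python
leet = ["Aa@", "Bb", "Cc", "Dd", "Ee", "Ff", "Gg", "Hh", "Ii", "Jj", "Kk",
        "Ll", "Mm", "Nn", "Oo0", "Pp", "Qq", "Rr", "Ss5", "Tt", "Uu", "Vv",
        "Ww", "Xx", "Yy", "Zz"]


def places(word):
    # Same expression as the original getPlaces lambda
    return [leet[ord(el.upper()) - 65] for el in word]


line = "alpha\n"        # one line as returned by readlines()
print(repr(line))       # 'alpha\n' (a plain print shows the word followed by a blank line)
print(ord("\n") - 65)   # -55

try:
    places(line)
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: list index out of range

print(places(line.rstrip("\n")))  # ['Aa@', 'Ll', 'Pp', 'Hh', 'Aa@']
```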

Here is the final modified code (all credit to @moose for the original code) that works as intended:

#!/usr/bin/python
# -*- coding: utf-8 -*-

from itertools import product

def getAllCombinations(password):
    leet = ["Aa@","Bb","Cc", "Dd","Ee","Ff","Gg","Hh","Ii","Jj","Kk",
            "Ll","Mm","Nn","Oo0","Pp","Qq","Rr","Ss5","Tt","Uu","Vv",
            "Ww","Xx","Yy","Zz"]

    getPlaces = lambda password: [leet[ord(el.upper()) - 65] for el in password]

    for letters in product(*getPlaces(password)):
        yield "".join(letters)

with open("wordlist_in.txt", "r") as infile, open("wordlist_out.txt", "w") as outfile:
    data = infile.readlines()
    for line in data:
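        # Remove the trailing newline (and any '\r' from Windows line endings);
        # otherwise '\n' maps to an out-of-range index in the leet list
        line = line.rstrip()
        # Pass the current word, not the whole list returned by readlines()
        for el in getAllCombinations(line):
            outfile.write(el + '\n')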
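One limitation remains: the `ord(el.upper()) - 65` lookup only covers the letters A to Z. A word such as another_word contains '_', whose index 30 is past the end of the list, so it raises the same IndexError; a digit is worse, because '1' gives index -16, which Python accepts as a negative index and silently maps to "Kk". The sketch below is one possible alternative, not Moose's code: a dictionary keyed by lowercase letter, with any other character passed through unchanged (both the dictionary and the pass-through rule are assumptions). It also counts the combinations up front, which bears on the output-size concern raised in the comments: every letter at least doubles the output and a, o and s triple it, so Password alone yields 2 × 3 × 3 × 3 × 2 × 3 × 2 × 2 = 1,296 lines.

```python
import string
from itertools import product

# Assumed mapping: upper- and lowercase for every letter, plus the three
# symbol substitutions from the original list (a -> @, o -> 0, s -> 5)
LEET = {c: c.upper() + c for c in string.ascii_lowercase}
LEET.update({"a": "Aa@", "o": "Oo0", "s": "Ss5"})


def options(ch):
    # Letters get their variants; digits, '_', '-', etc. are kept unchanged
    return LEET.get(ch.lower(), ch)


def get_all_combinations(word):
    # Cartesian product of the options for each character, in input order
    for letters in product(*(options(ch) for ch in word)):
        yield "".join(letters)


def count_combinations(word):
    # Number of lines get_all_combinations(word) will produce
    total = 1
    for ch in word:
        total *= len(options(ch))
    return total


print(count_combinations("Password"))     # 1296
print(list(get_all_combinations("a_1")))  # ['A_1', 'a_1', '@_1']
```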

Happy coding,

Betawave
