
I want to convert all words in a standard dictionary (for example, /usr/share/dict/words on a Unix machine) to integers, find the XOR between every two words in the dictionary (after converting them to integers, of course), and probably store the results in a new file.

Since I am new to Python, and because of the large file sizes, the program hangs every now and then.

# Python 2.7 (str.encode('hex') does not exist in Python 3)
with open("/usr/share/dict/words", "r") as dictionary:
    words = dictionary.readlines()

with open("word_integer.txt", "a") as foo:
    for word in words:
        word = word.strip()  # drop the trailing newline before converting
        if not word:
            continue
        # read the word's bytes as one big hexadecimal number
        int_word = int(word.encode('hex'), 16)
        foo.write(word + "\t" + str(int_word) + "\n")
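The `str.encode('hex')` call above only exists in Python 2. A minimal Python 3 sketch of the same conversion, assuming the words are plain ASCII/UTF-8 text (`word_to_int` and `int_to_word` are hypothetical helper names), reads the word's bytes as one big-endian integer; unlike a checksum, this conversion can be reversed:

```python
def word_to_int(word):
    """Interpret the UTF-8 bytes of word as one big-endian integer.

    Python 3 counterpart of Python 2's int(word.encode('hex'), 16).
    """
    return int.from_bytes(word.encode("utf-8"), "big")


def int_to_word(value, length):
    """Invert word_to_int; length is the word's size in bytes."""
    return value.to_bytes(length, "big").decode("utf-8")


n = word_to_int("ab")
print(hex(n))             # 0x6162
print(int_to_word(n, 2))  # ab
```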
Russia Must Remove Putin
kingmakerking
  • Explain what you mean by "XORing" words. How are you defining the result of the XOR of two **characters** (not bytes)? What should happen when the words are of different lengths? – Karl Knechtel Mar 02 '14 at 23:38
  • @AaronHall It is Python 2.7.4 (default, Sep 26 2013, 03:20:56) [GCC 4.7.3] on linux2 – kingmakerking Mar 02 '14 at 23:44
  • @KarlKnechtel Hmm... I did not think of this condition at all. I am trying to decrypt a one-time pad ciphertext. All I know is that the ciphertexts c1 = 4ADD55BA941FE954 and c2 = 5AC643BE8504E35E (eight bytes each, presented in hex) were encrypted using the same key. The other information I have is that they are encryptions of English words. So I am trying to XOR all words in the dictionary and see whether any result matches the XOR of c1 and c2. – kingmakerking Mar 02 '14 at 23:48
  • I don't understand. What is the word being XOR'd with? What does that have to do with figuring out the plain text? It sounds like you have unknown words `W1` and `W2`, and a common key `K`, and you are given that `W1 ^ K = C1`, `W2 ^ K = C2`; is that right? – Karl Knechtel Mar 03 '14 at 11:03
  • I solved it; now I have one problem: reducing the complexity of the program. :( @KarlKnechtel and yes, you are right. That is what the brute-force attack is for. I love the mathematical simplicity over programming complexity. Thanks a ton to AaronHall, as I am inspired by that logic. – kingmakerking Mar 03 '14 at 19:33
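Because both ciphertexts use the same key K, the key cancels: (W1 ^ K) ^ (W2 ^ K) = W1 ^ W2. So instead of XORing every pair of dictionary words (quadratic work), one can XOR each candidate word with C1 ^ C2 and look the result up in a set, which is linear in the number of words. A minimal Python 3 sketch, assuming the plaintexts are ASCII words exactly 8 bytes long as described in the comment above; `xor_bytes` and `candidate_pairs` are hypothetical names, and a match only shows a consistent pair, not that it is the real plaintext:

```python
C1 = bytes.fromhex("4ADD55BA941FE954")
C2 = bytes.fromhex("5AC643BE8504E35E")


def xor_bytes(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def candidate_pairs(words, c1=C1, c2=C2):
    """Yield (w1, w2) word pairs with w1 ^ w2 == c1 ^ c2.

    The shared key cancels: (w1 ^ k) ^ (w2 ^ k) == w1 ^ w2, so for each
    candidate w1 the partner w2 is fully determined and can be looked up
    in a set: O(n) lookups instead of O(n^2) pairwise XORs.
    """
    target = xor_bytes(c1, c2)
    # Keep only ASCII words whose byte length matches the ciphertexts.
    same_length = {w.encode("ascii") for w in words
                   if w.isascii() and len(w) == len(target)}
    for w1 in same_length:
        w2 = xor_bytes(w1, target)
        if w2 in same_length:
            yield w1.decode("ascii"), w2.decode("ascii")
```

To use it, pass the stripped lines of /usr/share/dict/words. Each matching pair is reported twice, once in each order, because the lookup is symmetric.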

2 Answers


First we need a way to convert your string to an int. I'll make one up, since what you're doing isn't working for me at all (maybe you meant to encode as Unicode?):

def word_to_int(word):
    # made-up conversion: add up the code points of the stripped word
    return sum(ord(i) for i in word.strip())
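One caveat with this made-up conversion: summing code points discards character order, so anagrams map to the same integer, and the XOR of two sums is not the bytewise XOR that a one-time-pad comparison needs. A quick Python 3 check (`sum_word_to_int` and `bytes_word_to_int` are hypothetical names for the two conversions being compared):

```python
def sum_word_to_int(word):
    # The conversion from this answer: add up the code points.
    return sum(ord(c) for c in word.strip())


def bytes_word_to_int(word):
    # Order-preserving alternative: the word's bytes as one big-endian int.
    return int.from_bytes(word.strip().encode("utf-8"), "big")


print(sum_word_to_int("stop") == sum_word_to_int("pots"))      # True
print(bytes_word_to_int("stop") == bytes_word_to_int("pots"))  # False
```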

Next, we need to process the files. Nesting two with blocks, as below, also works in Python 2.6; from 2.7 onward you can instead open both files in a single with statement (in 2.6, contextlib.nested does the same job):

with open("/usr/share/dict/words", "r") as dictionary:
    with open("word_integer.txt", "a") as foo:
        while True:  # a file object is always truthy, so stop on StopIteration
            try:
                # take the next two lines: consecutive pairs, not every pair
                w1, w2 = next(dictionary), next(dictionary)
            except StopIteration:
                print("We've run out of words!")
                break
            foo.write(str(word_to_int(w1) ^ word_to_int(w2)) + "\n")
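The loop above XORs consecutive lines (words 1 and 2, then 3 and 4, and so on), not every pair of words. If every pair is really wanted, `itertools.combinations` can generate the pairs lazily so that only the word list is held in memory. A Python 3 sketch using the same sum-of-code-points conversion; `sum_word_to_int` and `pair_xor_lines` are hypothetical names:

```python
from itertools import combinations


def sum_word_to_int(word):
    # Same made-up conversion as in this answer: sum of code points.
    return sum(ord(c) for c in word.strip())


def pair_xor_lines(words):
    """Yield 'w1<TAB>w2<TAB>xor' for every unordered pair of words.

    combinations() produces the pairs lazily, so only the word list is
    held in memory; there are still len(words) * (len(words) - 1) // 2
    lines in total.
    """
    for w1, w2 in combinations(words, 2):
        yield "%s\t%s\t%d\n" % (w1, w2, sum_word_to_int(w1) ^ sum_word_to_int(w2))
```

With the dictionary and the output file opened in one with statement, `dst.writelines(pair_xor_lines(words))` streams the results to disk, where `words` is the list of stripped, non-empty lines.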
Russia Must Remove Putin
  • Will just this code run, or should I add it to my code? I don't understand. When I try to run your code alone, it says syntax error and points at "as" in the with (open("/usr/share/dict/words","rU") as dictionary, – kingmakerking Mar 02 '14 at 23:25
  • @user2888239 This should work as a drop-in replacement for your code above. Remember to use the context manager `with` to open files, it automatically handles closing them for you if you get an error. – Russia Must Remove Putin Mar 02 '14 at 23:26
  • This is what it looks like now, but no luck. I guess I don't know how to use it: import os def word_to_int(word): return sum(ord(i for i in word)) from contextlib import contextmanager @contextmanager with (open("/usr/share/dict/words","rU") as dictionary, open("word_integer2.txt", "a") as foo): while dictionary: try: w1, w2 = next(dictionary), next(dictionary) foo.write(word_to_int(w1) ^ word_to_int(w2)) except StopIteration: print('We've run out of words!') – kingmakerking Mar 02 '14 at 23:42
  • Let me see if I have what you want straight: open a file with a bunch of words, pull out two words at a time, process them to integers (what are the exact specifications here?), `xor` them, and then append that number to another file? I've made a couple of bug-fixes, but you need to try it again, and give me the error if you see one. – Russia Must Remove Putin Mar 02 '14 at 23:50
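The syntax error reported above comes from wrapping both `open(...) as name` clauses in one pair of parentheses. Python 2.7 does not accept that form (parenthesized context managers are only officially supported from Python 3.10); listing the context managers separated by commas, without the outer parentheses, works from 2.7 on. A small sketch of that form, using a hypothetical helper `copy_stripped_words` with caller-supplied paths:

```python
def copy_stripped_words(src_path, dst_path):
    """Append the non-empty, stripped lines of src_path to dst_path.

    Both files are opened by one with statement; the two context managers
    are separated by a comma and NOT wrapped in parentheses, the form that
    Python 2.7 accepts. Both files are closed even if an error occurs.
    Returns the number of words written.
    """
    count = 0
    with open(src_path, "r") as dictionary, open(dst_path, "a") as foo:
        for line in dictionary:
            word = line.strip()
            if word:
                foo.write(word + "\n")
                count += 1
    return count
```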

This code seems to work for me. You're likely running into efficiency issues because you are calling readlines() on the entire file, which pulls it all into memory at once.
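Reading lazily keeps memory flat, but the running time of an all-pairs XOR is governed by how many pairs there are: n words give n(n-1)/2 unordered pairs, so ten times more words means roughly a hundred times more XORs. A quick Python 3 calculation (the word counts are round illustrative numbers, not the size of any particular /usr/share/dict/words; `pair_count` is a hypothetical name):

```python
def pair_count(n):
    """Number of unordered pairs of n distinct words: n choose 2."""
    return n * (n - 1) // 2


for n in (1000, 10000, 100000):
    print(n, pair_count(n))
# 1000 499500
# 10000 49995000
# 100000 4999950000
```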

This solution loops through the file line by line and, for each line, computes the XOR with every line of the file.

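# Python 2.7: str.encode('hex') does not exist in Python 3
def str_to_int(w):
    return int(w.encode('hex'), 16)

# 'pairwise_xors.txt' is a placeholder name; results are streamed to it instead
# of being kept in a dict, which would grow with the square of the word count
with open('/usr/share/dict/words', 'r') as f, open('pairwise_xors.txt', 'a') as out:
    for line1 in f:
        line1 = line1.strip()
        if not line1:
            continue
        # reopen the word list so the inner loop starts again at the first word
        with open('/usr/share/dict/words', 'r') as g:
            for line2 in g:
                line2 = line2.strip()
                if line2:
                    xor = str_to_int(line1) ^ str_to_int(line2)
                    out.write('%s\t%s\t%d\n' % (line1, line2, xor))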
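Instead of opening a new handle for every outer word, the inner pass can reuse a single handle that is rewound with `seek(0)` before each pass. A Python 3 sketch under that approach; `pairwise_xor_lines` is a hypothetical name, `int.from_bytes` stands in for Python 2's `encode('hex')`, and the lines are yielded rather than stored so the caller decides where they go:

```python
def str_to_int(w):
    # Python 3 counterpart of int(w.encode('hex'), 16).
    return int.from_bytes(w.encode("utf-8"), "big")


def pairwise_xor_lines(path):
    """Yield 'w1<TAB>w2<TAB>xor' for every ordered pair of words in path.

    The outer handle reads the file once; the inner handle is rewound with
    seek(0) so each outer word is paired with every word, itself included.
    """
    with open(path, "r") as f, open(path, "r") as g:
        for line1 in f:
            w1 = line1.strip()
            if not w1:
                continue
            g.seek(0)  # rewind the inner handle to the first word
            for line2 in g:
                w2 = line2.strip()
                if w2:
                    yield "%s\t%s\t%d\n" % (w1, w2, str_to_int(w1) ^ str_to_int(w2))
```

For example, `out.writelines(pairwise_xor_lines('/usr/share/dict/words'))` with `out` an open output file. Because it enumerates ordered pairs including each word with itself, it produces n² lines for n words.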
jaynp
  • But here, if I'm right, isn't it writing back to /usr/share/dict/words in the 7th line, inside the while loop? I can change it to some other file name if I have understood the logic correctly. – kingmakerking Mar 02 '14 at 22:57
  • The code didn't work, maybe because it isn't writing anything anywhere. – kingmakerking Mar 02 '14 at 23:01
  • You should use the context manager, and you need to learn how to use next on iterators. – Russia Must Remove Putin Mar 02 '14 at 23:21
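On the point about `next`: a file object is an iterator over its lines, so `next(f)` returns the following line and raises `StopIteration` once the file is exhausted, while `next(f, default)` returns `default` instead of raising. A short Python 3 sketch on an in-memory list standing in for the file (the names are illustrative):

```python
words = iter(["apple\n", "banana\n", "cherry\n"])

print(next(words).strip())  # apple
print(next(words).strip())  # banana
print(next(words).strip())  # cherry
print(next(words, None))    # None: exhausted, so the default is returned

# zip(it, it) pulls two items per step from the same iterator, giving
# consecutive, non-overlapping pairs; an odd leftover item is dropped.
it = iter(["a", "b", "c", "d", "e"])
pairs = list(zip(it, it))
print(pairs)                # [('a', 'b'), ('c', 'd')]
```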