
I previously asked "How to find and replace multiple lines in text file?", but my question was unclear, so I'm asking another one to be more specific.

I have Python 2.7.

I have three text files: data.txt, find.txt, and replace.txt.

data.txt is a file of about 1MB with several thousand lines. find.txt contains X lines that I want to find in data.txt and replace with the Y lines in replace.txt. X and Y may or may not be equal.

For example:

data.txt

pumpkin
apple
banana
cherry
himalaya
skeleton
apple
banana
cherry
watermelon
fruit

find.txt

apple
banana
cherry

replace.txt

1
2
3
4
5

So, in the above example, I want to find every occurrence of the consecutive lines apple, banana, and cherry in the data and insert 1, 2, 3, 4, 5 in their place.

So, the resulting data.txt would look like:

pumpkin
1
2
3
4
5
himalaya
skeleton
1
2
3
4
5
watermelon
fruit

Or, if replace.txt had fewer lines than find.txt (here, just 1 and 2), the result would be:

pumpkin
1
2
himalaya
skeleton
1
2
watermelon
fruit

I'm having trouble finding the right approach because data.txt is about 1MB and I want to be as efficient as possible. One dumb way is to concatenate everything into one long string, use replace, and then write the result to a new text file so all the line breaks are preserved.

# Read each file into one string; read() avoids repeated += concatenation
with open("data.txt") as data:
    data_str = data.read()
with open("find.txt") as find:
    find_str = find.read()
with open("replace.txt") as replace:
    replace_str = replace.read()

# Replace the find block with the replace block (pass the strings, not the file objects)
new_data = data_str.replace(find_str, replace_str)

with open("new_data.txt", "w") as new_file:
    new_file.write(new_data)

But this seems so convoluted and inefficient for a large data file like mine.
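For reference, reading a 1MB file into one string and calling replace should be fast. A bigger problem with plain `str.replace` is that it matches substrings, not whole lines: a line `pineapple` followed by banana and cherry would become `pine1`. Below is a minimal sketch of the same whole-string idea that swaps `str.replace` for `re.sub` with a line-start anchor, so only complete lines can match. `replace_block_regex` is a hypothetical name, and the sketch assumes `\n` line endings and a non-empty find block.

```python
import re


def replace_block_regex(data_str, find_str, replace_str):
    """Replace each run of whole lines equal to find_str with replace_str.

    Assumes find_str holds at least one non-empty line and that all three
    strings use "\n" line endings.
    """
    # Normalize both blocks so they end with exactly one newline
    find_block = find_str.rstrip("\n") + "\n"
    replace_block = (replace_str.rstrip("\n") + "\n") if replace_str.strip("\n") else ""
    # Make sure the last data line also ends with "\n" so it can be matched
    missing_newline = not data_str.endswith("\n")
    if missing_newline:
        data_str += "\n"
    # With re.MULTILINE, "^" only matches at the start of a line, and
    # find_block ends with "\n", so every match covers whole lines
    pattern = re.compile("^" + re.escape(find_block), re.MULTILINE)
    # A function replacement inserts replace_block literally (no backslash escapes)
    result = pattern.sub(lambda match: replace_block, data_str)
    # Restore the original "no trailing newline" state if we added one
    if missing_newline and result.endswith("\n"):
        result = result[:-1]
    return result
```

Back-to-back copies of the find block are both replaced, because after a match the search resumes at the start of the next line.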

Here is pseudocode for roughly what I would like to see:

(x,y) = find_lines(data.txt, find.txt) # returns line numbers in data.txt that contains find.txt
replace_data_between(x, y, data.txt, replace.txt) # replaces the data between lines x and y with replace.txt

def find_lines(...):
    location = 0

    LOOP1: 
    for find_line in find:
        for i, data_line in enumerate(data).startingAtLine(location):
            if find_line == data_line:
                location = i # found possibility

    for idx in range(NUMBER_LINES_IN_FIND):
        if find_line[idx] != data_line[idx+location]:  # compare line by line
            #if the subsequent lines don't match, then go back and search again
            goto LOOP1

As you can see, I am having trouble with the logic of this all. Can someone point me in the right direction?
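The find_lines / replace_data_between idea in the pseudocode can be written without goto: at each position, compare the next len(find) data lines with the find lines as a single slice, and on a match emit the replacement and jump past the block. This is a minimal sketch assuming all three files fit in memory as lists of lines; `replace_line_block` and `read_lines` are hypothetical helper names.

```python
def read_lines(path):
    """Read a text file into a list of lines without trailing newlines."""
    with open(path) as f:
        return f.read().splitlines()


def replace_line_block(data_lines, find_lines, replace_lines):
    """Return a copy of data_lines (a list) in which every exact, consecutive
    occurrence of find_lines (also a list) is replaced by replace_lines."""
    n = len(find_lines)
    if n == 0:
        return list(data_lines)  # nothing to search for
    result = []
    i = 0
    while i < len(data_lines):
        # Compare the next n data lines with the whole find block at once
        if data_lines[i:i + n] == find_lines:
            result.extend(replace_lines)
            i += n  # resume searching right after the matched block
        else:
            result.append(data_lines[i])
            i += 1  # no match here: keep this line and try the next position
    return result
```

With the example files, `replace_line_block(read_lines("data.txt"), read_lines("find.txt"), read_lines("replace.txt"))` should give pumpkin, 1, 2, 3, 4, 5, himalaya, skeleton, 1, 2, 3, 4, 5, watermelon, fruit, which can be written back with `"\n".join(...) + "\n"`. A partial run such as `apple` followed by `foobar` is left alone, because the whole slice must match.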

Asked by noblerare
  • go through `find.txt` and `replace.txt` together, make a lookup `dict`. Then go through `data.txt` and replace each line with its value in the lookup dict, if it's there. You need to be more specific on the layout of `find` and `replace.txt` to get more specific help than that. – roippi Feb 07 '14 at 22:30
  • without a 1:1 mapping, what *is* the mapping of `find` to `replace`? instead of a text file, can you give us a python object? – mhlester Feb 07 '14 at 22:33
  • what do you want to happen when X and Y are not the same? What happens when X > Y and when X < Y? Also, if you have made attempts at this already, please share your code – Totem Feb 07 '14 at 22:40
  • @Totem: Thanks for your comments. I edited my question so I hope that gives more information. – noblerare Feb 10 '14 at 15:25
  • @roippi: Thanks for your comments. I edited my question so I hope that gives more information. – noblerare Feb 10 '14 at 16:18
  • @noblerare do the lines you want to find always come all together like that? – Totem Feb 10 '14 at 20:04
  • @Totem: yes, the `find.txt` file will contain line-by-line, the exact lines that I want to find and replace – noblerare Feb 10 '14 at 23:32
  • @noblerare and in the data.txt file? Do they always appear together just like in find.txt? – Totem Feb 11 '14 at 00:04
  • @Totem: Yes they do but they must match exactly. In other words, there may be a line of `apple` followed by `foobar` in which case it _doesn't_ match the lines in `find.txt` (using the example above). So, if I had `data.txt` printed out onto paper and I had a knife, I want to find all exact lines in `find.txt`, cut them out and replace them with `replace.txt`. Hope that makes sense. – noblerare Feb 11 '14 at 14:15

1 Answer


If the files are small enough to do this in RAM...

I would first map the find:replace relationship:

find_replace_dict = {find_string:replace_string}
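A minimal sketch of building that dict, assuming find.txt and replace.txt pair up line for line (the 1:1 case this answer targets); `build_lookup` is a hypothetical helper. Lines read from a file keep their trailing newline, so the keys compare equal to lines read from data_file.

```python
def build_lookup(find_lines, replace_lines):
    """Map each find line to the replace line at the same position."""
    # zip stops at the shorter input, so unpaired extra lines are dropped
    return dict(zip(find_lines, replace_lines))


# Usage with the question's files:
# with open("find.txt") as find_file, open("replace.txt") as replace_file:
#     find_replace_dict = build_lookup(find_file, replace_file)
```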

Then I would walk through the data file:

with open('output_file', 'wt') as of:
    for line in data_file:
        # Write the replacement if this line is a key, otherwise the line itself
        of.write(find_replace_dict.get(line, line))
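If data.txt were too large to hold in RAM, the multi-line block replacement the question asks for could still be done while streaming, buffering only len(find) lines in a `collections.deque`. This is a minimal sketch; `stream_replace` is a hypothetical name, and it assumes every line in all three files (including the last) ends with a newline so that lines compare equal.

```python
import collections


def stream_replace(lines, find_lines, replace_lines):
    """Yield lines from `lines`, replacing each exact consecutive run equal to
    find_lines (a list) with replace_lines. At most len(find_lines) lines are
    buffered at a time."""
    n = len(find_lines)
    if n == 0:
        # Nothing to search for: pass everything through unchanged
        for line in lines:
            yield line
        return
    window = collections.deque()
    for line in lines:
        window.append(line)
        if len(window) < n:
            continue  # not enough lines buffered to compare yet
        if list(window) == find_lines:
            # Full match: emit the replacement and drop the matched lines
            for new_line in replace_lines:
                yield new_line
            window.clear()
        else:
            # No match can start at the oldest buffered line, so emit it
            yield window.popleft()
    # Fewer than n lines remain; they cannot form a match
    for line in window:
        yield line


# Example usage with the question's files:
# with open("find.txt") as f:
#     find_lines = f.readlines()
# with open("replace.txt") as f:
#     replace_lines = f.readlines()
# with open("data.txt") as src, open("new_data.txt", "w") as dst:
#     dst.writelines(stream_replace(src, find_lines, replace_lines))
```

Because only a few lines are buffered, memory use stays small regardless of how large data.txt grows.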
Answered by Cam
  • Thanks for your answer but some of the comments that I got asked me to be more specific in my question so I edited the question. – noblerare Feb 10 '14 at 16:18