Pattern searching in a file and replacing the found results

Question

I'm trying to write a simple program that opens the text files in a given directory, searches for all strings that match a given pattern, and replaces them with the desired string while removing all other info. I have two .txt files:

User_321.txt which contains:

321_AliceKelly001.jpg [size_info] [date_info] [geo_location_info] ... [other info]
321_AliceKelly002.jpg [size_info] [date_info] [geo_location_info] ... [other info] 
321_AliceKelly003.jpg [size_info] [date_info] [geo_location_info] ... [other info]
 ...
321_AliceKelly125.jpg [size_info] [date_info] [geo_location_info] ... [other info]

and User_205.txt which contains:

 205_CarlCarlson001.jpg [size_info] [date_info] [geo_location_info] ... [other info]
 205_CarlCarlson002.jpg [size_info] [date_info] [geo_location_info] ... [other info]
 205_CarlCarlson_003.jpg [size_info] [date_info] [geo_location_info] ... [other info]
 205_CarlCarlson007.jpg [size_info] [date_info] [geo_location_info] ... [other info]

I want User_321.txt to contain:

321_AliceKelly_001.jpg
321_AliceKelly_002.jpg 
321_AliceKelly_003.jpg
 ...
321_AliceKelly_125.jpg

and User_205.txt to contain:

 205_CarlCarlson_001.jpg
 205_CarlCarlson_002.jpg
 205_CarlCarlson_003.jpg
 205_CarlCarlson_007.jpg

So I simply want to add "_" between the name and the last 3 digits. I'm able to handle the case where all the entries are uniform, that is, where the file only contains entries of the following form:

     \d\d\d_[a-zA-Z]+\d\d\d.jpg [size_info] [date_info] [geo_location_info] ... [other info]

with the following code:

import os, re

path = 'C:\\Users\\ME\\Desktop\\TEST'
text_files = [filename for filename in os.listdir(path)]

desired_text = re.compile(r'\w+.jpg')
#desired_ending = re.compile(r'$[a-zA-Z]\d\d\d.jpg')

for i in range(len(text_files)):
    working_file = path + '\\' + text_files[i]
    fin = open(working_file, 'r')
    match = ''

    for line in fin:
        mo1 = desired_text.search(line)
        if mo1:  # search() returns None when there is no match
            match += mo1.group()[:-7] + '_' + mo1.group()[-7:]+'\n'

    fin.close()

    fout = open(working_file, 'w')
    fout.write(match)
    fout.close()

I'm having a difficult time with the second case, that is, when some entries are already in the desired form, as in:

 205_CarlCarlson_003.jpg [size_info] [date_info] [geo_location_info] ... [other info]
 205_CarlCarlson007.jpg [size_info] [date_info] [geo_location_info] ... [other info].

I would like for it to skip renaming the entry that is already in the desired form and continue with the rest.

I've had a look at "How to search and replace text in a file using Python?", "Cheap way to search a large text file for a string", and "Search and replace a line in a file in Python". These questions seem to be concerned with searching for a specific string and replacing it with another using the fileinput module. I would like to do something similar, but with a more flexible search.

Replace the `desired_text` regex with `r'^\s*\d{3}_[^\W_]+\.jpg'`. If there is a match, add a `_`. If there is no match, the `_` must be there. — Wiktor Stribiżew, Feb 08 '16 at 23:03
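
One way the comment's suggestion might be applied is sketched below. The helper name `fix_line` is hypothetical, and the fallback assumes that any non-matching, non-empty line starts with a file name that is already in the desired form.

```python
import re

# Pattern from the comment: [^\W_]+ excludes '_', so it matches only
# names that are still missing the underscore before the digits.
needs_underscore = re.compile(r'^\s*\d{3}_[^\W_]+\.jpg')

def fix_line(line):
    """Return the file name from one line, with '_' before the last 3 digits."""
    mo = needs_underscore.search(line)
    if mo:
        name = mo.group().strip()
        # '001.jpg' is 7 characters long, so split 7 characters from the end.
        return name[:-7] + '_' + name[-7:]
    # No match: assume the first token already has the underscore.
    parts = line.split(None, 1)
    return parts[0] if parts else ''

print(fix_line('321_AliceKelly001.jpg [size_info] [date_info]'))
print(fix_line(' 205_CarlCarlson_003.jpg [size_info] [date_info]'))
```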

bobble bubble · Answer 1 · 2016-02-08T23:27:25.770

1

You can use parentheses for grouping and capturing:

\b(\d{3}_[a-zA-Z]+)(\d{3}\.jpg)

and replace with \1_\2 to add an underscore in between.

\b matches a word boundary.
The rest follows your sample form, separated into two groups.

See demo at regex101 (Python code generator)
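
In Python, this replacement could look roughly like the sketch below. The sample lines are taken from the question. Since `re.sub` only rewrites the matched part, taking the first whitespace-separated token afterwards is an added assumption about how to drop the remaining info.

```python
import re

# Group 1: prefix and name, group 2: the last three digits plus extension.
pattern = re.compile(r'\b(\d{3}_[a-zA-Z]+)(\d{3}\.jpg)')

lines = [
    '321_AliceKelly001.jpg [size_info] [date_info]',
    ' 205_CarlCarlson_003.jpg [size_info] [date_info]',
    ' 205_CarlCarlson007.jpg [size_info] [date_info]',
]

# Entries that already contain the underscore do not match and stay unchanged.
fixed = [pattern.sub(r'\1_\2', line) for line in lines]

# Keep only the file name, i.e. the first token of each line.
names = [line.split(None, 1)[0] for line in fixed]
print(names)
# ['321_AliceKelly_001.jpg', '205_CarlCarlson_003.jpg', '205_CarlCarlson_007.jpg']
```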

edited Feb 08 '16 at 23:27

answered Feb 08 '16 at 23:06


score 1 · Accepted Answer · answered Feb 08 '16 at 23:19

I have slightly modified your code to handle the two different cases, and it seems to work:

import os, re

path = 'C:\\Users\\ME\\Desktop\\TEST'
text_files = [filename for filename in os.listdir(path)]

# The dot is escaped so that it matches a literal '.' only
desired_text1 = re.compile(r'^\d{3}_[a-zA-Z]+\d{3}\.jpg')
desired_text2 = re.compile(r'^\d{3}_[a-zA-Z]+_\d{3}\.jpg')

for i in range(len(text_files)):
    working_file = path + '\\' + text_files[i]
    fin = open(working_file, 'r')
    match = ''

    for line in fin:
        mo1 = desired_text1.search(line)
        mo2 = desired_text2.search(line)
        if mo1:
            match += mo1.group()[:-7] + '_' + mo1.group()[-7:]+'\n'
        elif mo2:
            match += mo2.group() +'\n'

    fin.close()

    fout = open(working_file, 'w')
    fout.write(match)
    fout.close()
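
As a possible variation, the two cases can be folded into a single pattern by making the underscore optional. This is only a sketch: the function names `normalize_listing` and `rewrite_directory` are hypothetical, the `.txt` filter and the `\s*` for leading whitespace are assumptions, and, like the code above, it drops lines that do not match.

```python
import os
import re

# One pattern for both cases: the underscore before the digits is optional.
entry = re.compile(r'^\s*(\d{3}_[a-zA-Z]+)_?(\d{3}\.jpg)')

def normalize_listing(text):
    """Return one normalized file name per matching line of text."""
    out = []
    for line in text.splitlines():
        mo = entry.search(line)
        if mo:
            # Always rebuild the name with exactly one underscore.
            out.append(mo.group(1) + '_' + mo.group(2))
    return ('\n'.join(out) + '\n') if out else ''

def rewrite_directory(path):
    """Rewrite every .txt file in path in place."""
    for filename in os.listdir(path):
        if not filename.endswith('.txt'):
            continue
        working_file = os.path.join(path, filename)
        with open(working_file) as fin:
            new_text = normalize_listing(fin.read())
        with open(working_file, 'w') as fout:
            fout.write(new_text)
```

Calling `rewrite_directory` with the directory from the question would then process every listing file; it overwrites the files, so trying it on a copy first seems prudent.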

Casimir et Hippolyte · Answer 3 · 2016-02-09T00:58:26.530

0

You can do it like this:

with open('source.txt') as f:
    with open('destination.txt', 'w') as g:
        for line in f:
            parts = line.split(None, 1)
            if parts[0][-8:-7] == '_':
                g.write(parts[0] + '\n')
            else:
                g.write(parts[0][:-7] + '_' + parts[0][-7:] + '\n')

Feel free to change \n to \r\n if you want a Windows newline sequence.
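
A note on that, assuming Python 3: text mode translates each written '\n' to the platform's line separator by default, so writing '\r\n' literally on Windows would end up as '\r\r\n' on disk. One alternative is the `newline` parameter of `open`, sketched below. The temporary output path and the sample names are only there for illustration.

```python
import os
import tempfile

names = ['205_CarlCarlson_001.jpg', '205_CarlCarlson_002.jpg']

# Hypothetical output location, used only for this illustration.
out_path = os.path.join(tempfile.mkdtemp(), 'destination.txt')

# newline='\r\n' makes Python write every '\n' as '\r\n' on any platform.
with open(out_path, 'w', newline='\r\n') as g:
    for name in names:
        g.write(name + '\n')

# Read back in binary mode to see the exact bytes that were written.
with open(out_path, 'rb') as f:
    raw = f.read()
print(raw)
```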

edited Feb 09 '16 at 00:58

answered Feb 08 '16 at 23:39

