I'm trying to write a simple program that opens the text files in a given directory, searches for all strings that match a given pattern, and replaces them with the desired string while removing all other info. I have two .txt files:
User_321.txt which contains:
321_AliceKelly001.jpg [size_info] [date_info] [geo_location_info] ... [other info]
321_AliceKelly002.jpg [size_info] [date_info] [geo_location_info] ... [other info]
321_AliceKelly003.jpg [size_info] [date_info] [geo_location_info] ... [other info]
...
321_AliceKelly125.jpg [size_info] [date_info] [geo_location_info] ... [other info]
and User_205.txt which contains:
205_CarlCarlson001.jpg [size_info] [date_info] [geo_location_info] ... [other info]
205_CarlCarlson002.jpg [size_info] [date_info] [geo_location_info] ... [other info]
205_CarlCarlson_003.jpg [size_info] [date_info] [geo_location_info] ... [other info]
205_CarlCarlson007.jpg [size_info] [date_info] [geo_location_info] ... [other info]
I want User_321.txt to contain:
321_AliceKelly_001.jpg
321_AliceKelly_002.jpg
321_AliceKelly_003.jpg
...
321_AliceKelly_125.jpg
and User_205.txt to contain:
205_CarlCarlson_001.jpg
205_CarlCarlson_002.jpg
205_CarlCarlson_003.jpg
205_CarlCarlson_007.jpg
So I simply want to add "_" between the name and the last 3 digits. I'm able to handle the case where all the entries are uniform, that is, where the file only contains entries of the following form:
\d\d\d_[a-zA-Z]+\d\d\d\.jpg [size_info] [date_info] [geo_location_info] ... [other info]
with the following code:
import os
import re

path = 'C:\\Users\\ME\\Desktop\\TEST'
text_files = os.listdir(path)
desired_text = re.compile(r'\w+\.jpg')
#desired_ending = re.compile(r'$[a-zA-Z]\d\d\d.jpg')
for name in text_files:
    working_file = os.path.join(path, name)
    match = ''
    # Collect the renamed entries, dropping the rest of each line.
    with open(working_file, 'r') as fin:
        for line in fin:
            mo1 = desired_text.search(line)
            if mo1 is not None:  # search() returns None when nothing matches
                # Insert '_' before the trailing "NNN.jpg" (7 characters).
                match += mo1.group()[:-7] + '_' + mo1.group()[-7:] + '\n'
    with open(working_file, 'w') as fout:
        fout.write(match)
I'm having a difficult time with the second case, that is, when a file has an entry that is already in the desired form, as in:
205_CarlCarlson_003.jpg [size_info] [date_info] [geo_location_info] ... [other info]
205_CarlCarlson007.jpg [size_info] [date_info] [geo_location_info] ... [other info]
I would like it to skip renaming the entry that is already in the desired form and continue with the rest.
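To make the requirement concrete, here is a minimal sketch of the per-line behaviour I'm after. It assumes every entry starts with 3 digits, "_", a run of letters, an optional "_", 3 digits and ".jpg". The names ENTRY and normalize_entry are placeholders made up for this illustration.

```python
import re

# Assumed entry shape: 3 digits, "_", letters, optional "_", 3 digits, ".jpg".
# The optional "_?" sits outside both groups, so it is dropped if present.
ENTRY = re.compile(r'(\d{3}_[A-Za-z]+)_?(\d{3}\.jpg)\b')


def normalize_entry(line):
    """Return the file name with exactly one '_' before the last 3 digits.

    Returns None when the line does not start with an entry of the assumed shape.
    """
    mo = ENTRY.match(line)
    if mo is None:
        return None
    return mo.group(1) + '_' + mo.group(2)
```

Because the underscore is optional in the pattern and always re-added in the result, an entry that is already in the desired form comes out unchanged, and anything else on the line is discarded.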
I've had a look at "How to search and replace text in a file using Python?", "Cheap way to search a large text file for a string", and "Search and replace a line in a file in Python". These questions seem to be concerned with searching for a specific string and replacing it with another using the fileinput module. I would like to do something similar, but with a more flexible search.
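For reference, this is roughly what I imagine a fileinput-based version with a regular expression would look like. It is only a sketch under the same assumption about the entry shape (3 digits, "_", letters, optional "_", 3 digits, ".jpg"). The names ENTRY and rewrite_in_place are made up for this example.

```python
import fileinput
import re

# Assumed entry shape; the optional "_?" lets already-correct names match too.
ENTRY = re.compile(r'(\d{3}_[A-Za-z]+)_?(\d{3}\.jpg)')


def rewrite_in_place(paths):
    """Rewrite each file so it keeps only the normalized file names."""
    if not paths:
        return  # an empty list would make fileinput fall back to stdin
    # With inplace=True, anything printed inside the loop is written
    # to the file currently being read instead of to the console.
    with fileinput.input(files=paths, inplace=True) as lines:
        for line in lines:
            mo = ENTRY.match(line)
            if mo:
                print(mo.group(1) + '_' + mo.group(2))
```

Lines that do not match (such as a literal "...") are simply not printed, so they disappear from the rewritten file.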