I'm iterating through multiple files, and for each file I check whether a pattern exists in it. The pattern can occur once, multiple times, or not at all. When the pattern is found, I want to rewrite only the line that contains it. If the pattern is not there, I want to close the file without any modifications. My code:

import fileinput
import os
import re

for root, dirs, files in os.walk('C:/Users/Documents/'):
    for fname in files:
        for line in fileinput.input(os.path.join(root, fname), inplace=1):
            if re.search(r"([\w-d])", line):
                x = re.sub(r"([\w-d])", r"(\1).", line)
                print line.replace(line, x)

The problem is that it changes the pattern fine when it finds it, but for files that don't have the pattern it deletes their contents completely. And if the pattern exists in multiple lines, it keeps only one line and deletes the rest. What am I missing?

EDIT

I'm also open to using "open" or any other method that solves my problem. My main concern is that I don't want to rewrite the lines in files that don't have the pattern. For tracking purposes, I want to modify only the files that have the pattern. So far my research online [1] [2] [3] shows that I can either write to a temp file and later use it in place of the original file, or read all the lines and then write all of them back regardless of whether the file has the pattern. Is there a better way of solving this problem?

tkyass

1 Answer

but for the files that doesn't have the pattern it deletes their contents completely

Yes, because that's how fileinput.input works with inplace=1: standard output is redirected into the file, so any line you don't print is lost. You need to print every line, whether you changed it or not.

for line in fileinput.input(os.path.join(root, fname), inplace=1): 
    if re.search(r"([\w-d])", line):
        x=re.sub(r"([\w-d])", r"(\1).", line)
        print line.replace(line, x)
    else:
        print line

Also consider using sys.stdout.write(line) instead of print. The line read from the file already ends with a newline, and print adds another one.

Thus if we have a file called test.txt with the following contents:

a1
b2
c3

Then if we do this:

for line in fileinput.input("test.txt", inplace=1):
    if line.strip() == "b2":
        sys.stdout.write("-> ")
        sys.stdout.write(line)
    else:
        sys.stdout.write(line)

Then after that the file will look like this:

a1
-> b2
c3

So you do need to write unchanged lines as well.
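To make both points concrete, here is a small self-contained sketch. It is written for Python 3 (unlike the Python 2 snippets above), and the helper names `rewrite` and `run`, the `mode` values, and the scratch file are assumptions made purely for illustration. It passes the same a1/b2/c3 content through fileinput three ways: writing nothing, using print, and using sys.stdout.write.

```python
import fileinput
import os
import sys
import tempfile


def rewrite(path, mode):
    """Pass every line of `path` through fileinput's in-place mode.

    mode "skip"  writes nothing back,
    mode "print" writes each line with print(),
    mode "write" writes each line with sys.stdout.write().
    """
    with fileinput.input(path, inplace=True) as lines:
        for line in lines:
            if mode == "print":
                print(line)             # adds "\n" after the line's own "\n"
            elif mode == "write":
                sys.stdout.write(line)  # writes the line exactly as read
            # mode "skip": nothing is written, so the line is dropped


def run(mode):
    # Hypothetical scratch file holding the a1/b2/c3 sample content.
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as f:
        f.write("a1\nb2\nc3\n")
    try:
        rewrite(path, mode)
        with open(path) as f:
            return f.read()
    finally:
        os.remove(path)


results = {mode: run(mode) for mode in ("skip", "print", "write")}
```

With this sketch, `results["skip"]` is an empty string (the file was emptied), `results["print"]` has a blank line after every original line, and `results["write"]` is identical to the original content.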

Edit:

Doing it like that is probably the most flexible approach. However, you could read the file beforehand to check whether the pattern exists, and only then rewrite it in the same way:

f = open(os.path.join(root, fname), "r")
found = False
for line in f:
    if re.search(r"([\w-d])", line):
        found = True
        break
f.close()

if found:
    for line in fileinput.input(os.path.join(root, fname), inplace=1): 
        if re.search(r"([\w-d])", line):
            x=re.sub(r"([\w-d])", r"(\1).", line)
            print line.replace(line, x)
        else:
            print line
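The same check-then-rewrite idea could be wrapped in a function. The following is only a sketch for Python 3, not a drop-in replacement: the function name `rewrite_if_matched`, the helper `_make_file`, the scratch files, and the pattern and replacement are all hypothetical placeholders rather than the ones from the question. It writes with sys.stdout.write so that no extra newlines are added.

```python
import fileinput
import os
import re
import sys
import tempfile


def rewrite_if_matched(path, pattern, repl):
    """Rewrite `path` in place only if some line matches `pattern`.

    Returns True if the file was rewritten, False if it was left untouched.
    """
    regex = re.compile(pattern)

    # First pass: read-only scan that stops at the first matching line.
    with open(path) as f:
        if not any(regex.search(line) for line in f):
            return False

    # Second pass: stdout is redirected into the file, so every line must
    # be written back. sub() returns non-matching lines unchanged.
    with fileinput.input(path, inplace=True) as lines:
        for line in lines:
            sys.stdout.write(regex.sub(repl, line))
    return True


def _make_file(text):
    # Hypothetical scratch file for the demo below.
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as f:
        f.write(text)
    return path


with_match = _make_file("a1\nb2\nc3\n")
without_match = _make_file("foo\nbar\n")

# Placeholder pattern and replacement, chosen only for this demo.
changed = rewrite_if_matched(with_match, r"b(\d)", r"-> b\1")
untouched = rewrite_if_matched(without_match, r"b(\d)", r"-> b\1")

with open(with_match) as f:
    changed_text = f.read()
with open(without_match) as f:
    untouched_text = f.read()

os.remove(with_match)
os.remove(without_match)
```

Inside the os.walk loop from the question, the call would presumably be made once per file with os.path.join(root, fname) as the path. Files without a match are only opened for reading, so they are never rewritten.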
vallentin
  • thanks @Vallentin for your answer! Could you please review my edited question. – tkyass Mar 31 '17 at 14:35
  • Thanks @Vallentin for the updated answer, it's working great. The only issue I'm facing is that it keeps adding new lines, and the file has almost doubled in number of lines. Any idea what might be causing this and how to solve it? – tkyass Mar 31 '17 at 16:41
  • @tkyass As already mentioned in my answer, it's because you're using `print`; use `sys.stdout.write` instead. – vallentin Mar 31 '17 at 17:01