Started fiddling with Python for the first time a week or so ago and have been trying to create a script that will replace instances of a string in a file with a new string. The actual reading and creation of a new file with intended strings seems to be successful, but error checking at the end of the file displays output suggesting that there is an error. I checked a few other threads but couldn't find a solution or alternative that fit what I was looking for or was at a level I was comfortable working with.

Apologies for messy/odd code structure, I am very new to the language. Initial four variables are example values.

editElement = "Testvalue"
newElement = "Testvalue2"
readFile = "/Users/Euan/Desktop/Testfile.csv"
writeFile = "/Users/Euan/Desktop/ModifiedFile.csv"


editelementCount1 = 0
newelementCount1 = 0


editelementCount2 = 0
newelementCount2 = 0



#Reading from file
print("Reading file...")
file1 = open(readFile,'r')
fileHolder = file1.readlines()
file1.close()


#Creating modified data
fileHolder_replaced = [row.replace(editElement, newElement) for row in fileHolder]


#Writing to file
file2 = open(writeFile,'w')
file2.writelines(fileHolder_replaced)
file2.close()
print("Modified file generated!")


#Error checking

for row in fileHolder:
    if editElement in row:
        editelementCount1 +=1

for row in fileHolder:
    if newElement in row:
        newelementCount1 +=1

for row in fileHolder_replaced:
    if editElement in row:
        editelementCount2 +=1

for row in fileHolder_replaced:
    if newElement in row:
        newelementCount2 +=1

print(editelementCount1 + newelementCount1)
print(editelementCount2 + newelementCount2)

Expected output would be the last two instances of 'print' displaying the same value, however...

The first instance of print returns the value of A + B as expected.

The second line only returns the value of B (from fileHolder), and from what I can see, A has indeed been converted to B (In fileHolder_replaced).

Edit:

For example,

if the first two counts show A and B to be 2029 and 1619 respectively (fileHolder), the last two counts show A as 0 and B as 2029 (fileHolder_replaced). Obviously this is missing the original value of B.

Rohan Amrute
RFM_Euan
  • Maybe I am too stupid, but I don't see what should be wrong with the print statements. The two outputs should be different, since you replaced editElement and it should no longer occur... – Stefan Reinhardt Dec 22 '15 at 10:20
  • Hi Stefan. It was more or less my assumption that somewhere there is an incorrect algorithm or perhaps the wrong syntax for what I was trying to do. The end print statements are functioning as intended, but the values they output should /in theory/ be the same. i.e. If I am converting A to B, and there are 20 instances of A and 10 instances of B, the output of the first statement would be 30. For the second print statement it would pick up 30 instances of B since A has been converted, so it would also be 30. – RFM_Euan Dec 22 '15 at 10:23
  • Ah yes, I got it. Testvalue is a substring of Testvalue2. – Stefan Reinhardt Dec 22 '15 at 10:34

1 Answer

So, as a more extended version of my comment: if you look for "Testvalue" in the modified file, it will find the string even where the text is actually "Testvalue2". That's because the original value is a substring of the modified value, so the check for editElement matches the same lines as the check for newElement and the combined total is inflated. More precisely, each check counts the number of lines in which the string occurs, not the number of occurrences.

If you query

if newElement in row

it checks whether the string newElement is contained anywhere in the string row.
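A small sketch of both effects, using the example values from the question (the contents of `rows` are made up for illustration and are not taken from the real file):

```python
edit_element = "Testvalue"
new_element = "Testvalue2"

# Hypothetical rows: the first one holds the new value twice.
rows = ["Testvalue2,Testvalue2\n", "other\n"]

# `in` is a substring test, so the old value is "found" inside the new one.
old_found = edit_element in rows[0]  # True

# Counting with `in` counts lines, not occurrences.
lines_with_new = sum(1 for row in rows if new_element in row)  # 1

# str.count counts every non-overlapping occurrence within a row.
occurrences_of_new = sum(row.count(new_element) for row in rows)  # 2
```

So a row that contains the value twice still adds only 1 to a counter driven by `in`.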

Stefan Reinhardt
  • Ah, I think I see the issue with my code. Is there an alternative to 'row' that will pick up multiple instances of a string in each row? It's just occurred to me that when A is converted to B, if there is already an instance of B in that row, then only one instance will be counted. Is there a way to count multiple instances in one row? – RFM_Euan Dec 22 '15 at 10:59
  • The simplest solution for a Python newbie could be to split the row by blanks with "someThing".split(' ') and walk through that list. But I'd rather suggest a regex for doing that, like the one posted here: http://stackoverflow.com/questions/1374457/find-out-how-many-times-a-regex-matches-in-a-string-in-python – Stefan Reinhardt Dec 22 '15 at 11:10
  • In that thread I noticed the count() function, but having tried it out could only get it to return data from a string. Could I perform an action like str(fileHolder).count()? Or is this the wrong way to use the function? – RFM_Euan Dec 22 '15 at 11:28
  • I implemented print(str(fileHolder).count('A')) and it returned the correct value, I'll test it and let you know if it works. Thanks! – RFM_Euan Dec 22 '15 at 11:36
  • Oh yes, of course, this would be the simplest solution. – Stefan Reinhardt Dec 22 '15 at 12:21
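To tie the comments together, here is a minimal sketch of counting occurrences rather than lines. The helper names `count_occurrences` and `count_exact` and the sample rows are hypothetical, not from the original script. Summing `str.count` over the rows gives the same kind of total as `str(fileHolder).count(...)` without converting the whole list to a string first. The regex variant is one possible way to handle values where one is a prefix of the other (such as Testvalue and Testvalue2), assuming a match should not be followed by another word character.

```python
import re


def count_occurrences(rows, target):
    """Total occurrences of target across all rows (not lines containing it)."""
    return sum(row.count(target) for row in rows)


def count_exact(rows, target):
    """Occurrences of target that are not followed by another word character."""
    pattern = re.compile(re.escape(target) + r"(?!\w)")
    return sum(len(pattern.findall(row)) for row in rows)


# Non-overlapping example values: the before/after totals agree.
rows = ["apple,pear,apple\n", "pear\n"]
replaced = [row.replace("apple", "pear") for row in rows]

before_total = count_occurrences(rows, "apple") + count_occurrences(rows, "pear")
after_total = count_occurrences(replaced, "apple") + count_occurrences(replaced, "pear")

# Overlapping example values: plain counting also hits "Testvalue2".
overlap_rows = ["Testvalue,Testvalue2,Testvalue\n", "Testvalue2\n"]
plain = count_occurrences(overlap_rows, "Testvalue")
exact = count_exact(overlap_rows, "Testvalue")

print(before_total, after_total)  # 4 4
print(plain, exact)               # 4 2
```

With values that do not overlap, the two totals match as the question expected. With Testvalue and Testvalue2, the plain count still includes the longer value, which is why the second helper excludes matches followed by a word character.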