
I have a text file in which every line currently ends with the same character (N), which is used to track the progress the system has made. I want to change the end character to "Y" as each line is processed, so that if the program ends because of an error or other interruption, on restart it will search until it finds a line ending in "N" and resume working from there. Below is my code as well as a sample from the text file.

UPDATED CODE:

def GeoCode():
    f = open("geocodeLongLat.txt", "a")
    with open("CstoGC.txt",'r') as file:
        print("Geocoding...")
        new_lines = []
        for line in file.readlines():
            check = line.split('~')
            print(check)
            if 'N' in check[-1]:
                geolocator = Nominatim()
                dot_number, entry_name, PHY_STREET,PHY_CITY,PHY_STATE,PHY_ZIP = check[0],check[1],check[2],check[3],check[4],check[5] 
                address = PHY_STREET + " " + PHY_CITY + " " + PHY_STATE + " " + PHY_ZIP
                f.write(dot_number + '\n')
                try:
                    location = geolocator.geocode(address)
                    f.write(dot_number + "," + entry_name + "," + str(location.longitude) + "," + str(location.latitude) + "\n")
                except AttributeError:
                    try:
                        address = PHY_CITY + " " + PHY_STATE + " " + PHY_ZIP
                        location = geolocator.geocode(address)
                        f.write(dot_number + "," + entry_name + "," + str(location.longitude) + "," + str(location.latitude) + "\n")
                    except AttributeError:
                        print("Cannot Geocode")
            check[-1] = check[-1].replace('N','Y')
        new_lines.append('~'.join(check))

    with open('CstoGC.txt','r+') as file: # IMPORTANT to open as 'r+' mode as 'w/w+' will truncate your file!
        for line in new_lines:
            file.writelines(line)        

    f.close()

Output:

2967377~DARIN COLE~22112 TWP RD 209~ALVADA~OH~44802~Y
WAY 64 SUITE 100~EADS~TN~38028~N
384767~MILLER FARMS TRANS LLC~1103 COURT ST~BEDFORD~IA~50833~N
986150~R G S TRUCKING LTD~1765 LOMBARDIE DRIVE~QUESNEL~BC~V2J 4A8~N
1012987~DONALD LARRY KIVETT~4509 LANSBURY RD~GREENSBORO~NC~27406-4509~N
735308~ALZEY EXPRESS INC~2244  SOUTH GREEN STREET~HENDERSON~KY~42420~N
870337~RIES FARMS~1613 255TH AVENUE~EARLVILLE~IA~52057~N
148428~P R MASON & SON LLC~HWY 70 EAST~WILLISTON~NC~28589~N
220940~TEXAS MOVING CO INC~908 N BOWSER RD~RICHARDSON~TX~75081-2869~N
854042~ARMANDO ORTEGA~6590 CHERIMOYA AVENUE~FONTANA~CA~92337~N
940587~DIAMOND A TRUCKING INC~192285 E COUNTY ROAD 55~HARMON~OK~73832~N
1032455~INTEGRITY EXPRESS LLC~380 OLMSTEAD AVENUE~DEPEW~NY~14043~N
889931~DUNSON INC~33 CR 3581~FLORA VISTA~NM~87415~N
143608~LARRY A PETERSON & DONNA M PETERSON~W6359 450TH AVE~ELLSWORTH~WI~54011~N
635528~JAMES E WEBB~3926 GREEN ROAD~SPRINGFIELD~TN~37172~N
805496~WAYNE MLADY~22272 135TH ST~CRESCO~IA~52136~N
704996~SAVINA C MUNIZ~814 W LA QUINTA DR~PHARR~TX~78577~N
893169~BINDEWALD MAINTENANCE INC~213 CAMDEN DR~SLIDELL~LA~70459~N
948130~LOGISTICIZE LTD~861 E PERRY ST~PAULDING~OH~45879~N
438760~SMOOTH OPERATORS INC~W8861 CREEK ROAD~DARIEN~WI~53114~N
518872~A B C RELOCATION SERVICES INC~12 BOCKES ROAD~HUDSON~NH~03051~N
576143~E B D ENTERPRISES INC~29 ROY ROCHE DRIVE~WINNIPEG~MB~R3C 2E6~N
968264~BRIAN REDDEMANN~706 WESTGOR STREET~STORDEN~MN~56174-0220~N
721468~QUALITY LOGISTICS INC~645 LEONARD RD~DUNCAN~SC~29334~N

As you can see I am already keeping track of which line I am at just by using x. Should I use something like file.readlines()?

Sample of text document:

570772~CORPORATE BANK TRANSIT OF KENTUCKY INC~3157 HIGHWAY 64 SUITE 100~EADS~TN~38028~N
384767~MILLER FARMS TRANS LLC~1103 COURT ST~BEDFORD~IA~50833~N
986150~R G S TRUCKING LTD~1765 LOMBARDIE DRIVE~QUESNEL~BC~V2J 4A8~N
1012987~DONALD LARRY KIVETT~4509 LANSBURY RD~GREENSBORO~NC~27406-4509~N
735308~ALZEY EXPRESS INC~2244  SOUTH GREEN STREET~HENDERSON~KY~42420~N
870337~RIES FARMS~1613 255TH AVENUE~EARLVILLE~IA~52057~N
148428~P R MASON & SON LLC~HWY 70 EAST~WILLISTON~NC~28589~N
220940~TEXAS MOVING CO INC~908 N BOWSER RD~RICHARDSON~TX~75081-2869~N
854042~ARMANDO ORTEGA~6590 CHERIMOYA AVENUE~FONTANA~CA~92337~N
940587~DIAMOND A TRUCKING INC~192285 E COUNTY ROAD 55~HARMON~OK~73832~N
1032455~INTEGRITY EXPRESS LLC~380 OLMSTEAD AVENUE~DEPEW~NY~14043~N
889931~DUNSON INC~33 CR 3581~FLORA VISTA~NM~87415~N

Thank you!

Edit: updated code thanks to @idlehands

  • You could gain a lot of readability if you use str.format(). Read up on it, it's worth it :) – Anton vBR Jan 30 '18 at 20:21
  • This is a pretty broad question. It would probably be best to read up on [file methods](https://docs.python.org/3/tutorial/inputoutput.html#methods-of-file-objects) and try some things out. You might be looking for something like a [zip](https://docs.python.org/3.5/library/functions.html#zip) of `f.readlines()` and a range object. What you have now should work fine too. – Jakob Jan 30 '18 at 20:31
  • You cannot replace anything in a file open for appending. You cannot read it either. – Stop harming Monica Jan 30 '18 at 20:39
  • Also you are talking about one file but your code uses two. – Stop harming Monica Jan 30 '18 at 20:41
  • @Goyo, the one file just adds longitude and latitude – Jordan Murray Jan 30 '18 at 21:12

2 Answers

There are a few ways to do this.

Option #1

My original thought was to use the tell() and seek() methods to step back a few bytes, but it quickly became clear that you cannot do this conveniently unless the file is opened in binary mode, and definitely not inside a for loop over readlines(). You can see the reference threads here:

Is it possible to modify lines in a file in-place?
How to solve "OSError: telling position disabled by next() call"

The investigation led to this piece of code:

with open('file.txt','rb+') as file:
    line = file.readline() # initiate the loop
    while line: # readline() returns b'' at end of file, which ends the loop
        print(line)
        check = line.split(b'~')[-1] # last field, including the line ending
        if check.startswith(b'N'): # the line ending follows the flag, so test the first byte only

            # ... do stuff ... #

            file.seek(-len(check), 1) # step back to the start of the last field
            file.write(check.replace(b'N', b'Y')) # replace "N" with "Y"
        line = file.readline() # read next line

In the first referenced thread one of the answers mentioned this could lead you to potential problems, and directly modifying the bytes on the buffer while reading it is probably considered a bad idea™. A lot of pros probably will scold me for even suggesting it.
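As a variation on the same idea, here is a minimal sketch that records explicit byte offsets with tell() instead of seeking relative to the current position, and flushes after each flag so that progress reaches the disk before the next line is processed. The function name `mark_processed` and the `process` callback are hypothetical, and the sketch assumes every unprocessed line ends in `~N` followed by a line ending.

```python
def mark_processed(path, process):
    """Call process(fields) for each line flagged N, then flip the flag to Y on disk.

    `process` is a hypothetical callback; if it raises, the current line keeps
    its N flag and the exception propagates to the caller.
    """
    with open(path, "rb+") as f:
        while True:
            start = f.tell()              # byte offset where this line begins
            line = f.readline()
            if not line:                  # b'' means end of file
                break
            body = line.rstrip(b"\r\n")   # drop the line ending, keep the fields
            if body.endswith(b"~N"):
                process(body.split(b"~"))
                f.seek(start + len(body) - 1)  # offset of the flag byte
                f.write(b"Y")                  # same length, so nothing shifts
                f.flush()
                f.seek(start + len(line))      # resume at the next line
```

Because exactly one byte is overwritten with one byte, the rest of the file is never moved. The fields passed to `process` are bytes, so they would need decoding before being used as text.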

Option #2a

(if file size is not horrendously huge)

with open('file.txt','r') as file:
    new_lines = []
    for line in file.readlines():
        check = line.split('~')
        if 'N' in check[-1]:

            # ... do stuff ... #

            check[-1] = check[-1].replace('N','Y')
        new_lines.append('~'.join(check))

with open('file.txt','r+') as file: # 'r+' overwrites in place without truncating; safe here because every line keeps its original length
    file.writelines(new_lines) # write all lines back in one call

This approach loads all the lines into memory first, so you make the modifications in memory and leave the file buffer alone. You then reopen the file and write every line back. The caveat is that you are technically rewriting the entire file line by line, not just the N, even though that is the only thing that changed.

Option #2b

Technically you could open the file in r+ mode from the outset and then, after the iterations have completed, do this (still within the with block but outside of the loop):

# ... new_lines.append('~'.join(check)) #
    file.seek(0) # rewind to the start of the file
    file.writelines(new_lines) # overwrite every line in place

I'm not sure what distinguishes this from Option #1 since you're still reading and modifying the file in the same go. If someone more proficient in IO/buffer/memory management wants to chime in please do.

The disadvantage of Option 2a/b is that you always end up storing and rewriting every line in the file, even if only a few lines are left that need to be updated from 'N' to 'Y'.

Results (for all solutions):

570772~CORPORATE BANK TRANSIT OF KENTUCKY INC~3157 HIGHWAY 64 SUITE 100~EADS~TN~38028~Y
384767~MILLER FARMS TRANS LLC~1103 COURT ST~BEDFORD~IA~50833~Y
986150~R G S TRUCKING LTD~1765 LOMBARDIE DRIVE~QUESNEL~BC~V2J 4A8~Y
1012987~DONALD LARRY KIVETT~4509 LANSBURY RD~GREENSBORO~NC~27406-4509~Y
735308~ALZEY EXPRESS INC~2244  SOUTH GREEN STREET~HENDERSON~KY~42420~Y
870337~RIES FARMS~1613 255TH AVENUE~EARLVILLE~IA~52057~Y
148428~P R MASON & SON LLC~HWY 70 EAST~WILLISTON~NC~28589~Y
220940~TEXAS MOVING CO INC~908 N BOWSER RD~RICHARDSON~TX~75081-2869~Y
854042~ARMANDO ORTEGA~6590 CHERIMOYA AVENUE~FONTANA~CA~92337~Y
940587~DIAMOND A TRUCKING INC~192285 E COUNTY ROAD 55~HARMON~OK~73832~Y
1032455~INTEGRITY EXPRESS LLC~380 OLMSTEAD AVENUE~DEPEW~NY~14043~Y
889931~DUNSON INC~33 CR 3581~FLORA VISTA~NM~87415~Y

And if, say, you encountered a break at the line starting with 220940, the file would become:

570772~CORPORATE BANK TRANSIT OF KENTUCKY INC~3157 HIGHWAY 64 SUITE 100~EADS~TN~38028~Y
384767~MILLER FARMS TRANS LLC~1103 COURT ST~BEDFORD~IA~50833~Y
986150~R G S TRUCKING LTD~1765 LOMBARDIE DRIVE~QUESNEL~BC~V2J 4A8~Y
1012987~DONALD LARRY KIVETT~4509 LANSBURY RD~GREENSBORO~NC~27406-4509~Y
735308~ALZEY EXPRESS INC~2244  SOUTH GREEN STREET~HENDERSON~KY~42420~Y
870337~RIES FARMS~1613 255TH AVENUE~EARLVILLE~IA~52057~Y
148428~P R MASON & SON LLC~HWY 70 EAST~WILLISTON~NC~28589~Y
220940~TEXAS MOVING CO INC~908 N BOWSER RD~RICHARDSON~TX~75081-2869~N
854042~ARMANDO ORTEGA~6590 CHERIMOYA AVENUE~FONTANA~CA~92337~N
940587~DIAMOND A TRUCKING INC~192285 E COUNTY ROAD 55~HARMON~OK~73832~N
1032455~INTEGRITY EXPRESS LLC~380 OLMSTEAD AVENUE~DEPEW~NY~14043~N
889931~DUNSON INC~33 CR 3581~FLORA VISTA~NM~87415~N
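
For Option #2a, that partially updated state only reaches the disk if the write-back still runs after the failure. A minimal sketch of one way to arrange this, assuming a hypothetical `process` callback that stands in for the geocoding step and a hypothetical function name `process_with_flags`: the write-back sits in a `finally` block, so the flags set so far are saved even when `process` raises.

```python
def process_with_flags(path, process):
    """Option #2a with a guaranteed write-back.

    `process` is a hypothetical callback taking the list of fields for one
    line; any exception it raises still propagates after the file is saved.
    """
    with open(path, "r") as f:
        lines = f.readlines()          # whole file in memory

    try:
        for i, line in enumerate(lines):
            fields = line.rstrip("\n").split("~")
            if fields[-1] == "N":
                process(fields)        # may raise; this line then stays N
                fields[-1] = "Y"
                lines[i] = "~".join(fields) + "\n"
    finally:
        # Runs on success and on error alike; 'w' is fine here because
        # every line, changed or not, is written back.
        with open(path, "w") as f:
            f.writelines(lines)
```

On restart, lines already marked Y are skipped by the `if`, so work resumes at the first line still flagged N.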

There are pros and cons to these approaches. Try and see which one fits your use case the best.

r.ook
  • I tried doing your option 2a and it did not edit the text document at all – Jordan Murray Jan 30 '18 at 22:24
  • How are you implementing it? It works fine on my end. – r.ook Jan 30 '18 at 22:27
  • unindent your `new_lines.append('~'.join(check))` by one level. It should be outside your `if 'N' in check[-1]:` condition but within your `for` loop so that it appends the line regardless if the condition was met. – r.ook Jan 30 '18 at 22:35
  • Not sure what's going on. I get a time out error for my GeoCode (the interruption I was talking about) and the text file still is wrong. I'm thinking it might be easier to append a character to the end of each entry after it is processed/geocoded and just check for that end character each time. – Jordan Murray Jan 31 '18 at 02:06
  • This might sound like a no-brainer, but you did write `new_lines` back to your file after `f.close()`... right? Also, if your code is throwing an error (that is not handled), of course it wouldn't work, since the file hasn't been closed. Double-check my code to see whether you missed copying anything. – r.ook Jan 31 '18 at 02:37
  • Well ive gotten further, check the new code and output file. thank you so much for your help – Jordan Murray Jan 31 '18 at 03:08
  • I did see it, but I do not see you added any code to write `new_lines` back to the file after reading it which is why I asked. And you mentioned `TimeOutError` which wasn't handled, so that will break your code and interrupt the writing back, hence, nothing gets updated. – r.ook Jan 31 '18 at 03:09
  • Sorry, I just updated it a second ago. Also, I can handle the service timeout with a try/except block. Currently I'm having it not geocode but just write the first set of numbers to the geocodeLongLat.txt file. – Jordan Murray Jan 31 '18 at 03:10
  • Your `check[-1] = check[-1].replace('N','Y')` needs to be indented one more level (inside the `if` block) and the `new_lines.append('~'.join(check))` needs to be indented one more level as well (inside the `for` loop). **Indentation is crucial to Python syntax**, it tells the interpreter whether a line is to be executed within a code block or not. If you're still having problems that are not unhandled errors, *please* check to make sure you have the *exact indentation levels* per my answers. – r.ook Jan 31 '18 at 03:17
  • Ah yes sorry, it was hard to tell with how far away it was from the loops and if statement. Promise I've been using python for a while and definitely no pro but seem to keep making silly mistakes. It worked now thank you very much – Jordan Murray Jan 31 '18 at 03:33
  • Not a problem - the next thing you might want to do is check out [`str.format()`](https://docs.python.org/3/library/stdtypes.html#str.format) as @AntonvBR mentioned, it's worthwhile to help de-clutter your code and makes it look beautiful. Keep coding! – r.ook Jan 31 '18 at 03:38
  • I will! Not sure I fully understand it yet but I'll keep trying lol. Cheers – Jordan Murray Jan 31 '18 at 03:54

I would read the entire input file into a list and .pop() the lines off one at a time. In case of an error, append the popped item back to the list and overwrite the input file. This way it will always be up to date and you won't need any other logic.
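
One possible reading of this approach, as a rough sketch: the name `drain` and the `process` callback are hypothetical, and the sketch assumes that processed lines can simply be dropped from the input file, so the file only ever contains the work that is left.

```python
def drain(path, process):
    """Pop lines off a list and rewrite the file with whatever is left.

    `process` is a hypothetical callback taking the list of fields for one
    line. On an error the failed line is put back before the file is saved.
    """
    with open(path) as f:
        pending = f.readlines()
    pending.reverse()                  # so pop() takes lines in file order

    try:
        while pending:
            line = pending.pop()
            try:
                process(line.rstrip("\n").split("~"))
            except Exception:
                pending.append(line)   # not done: put the line back
                raise
    finally:
        pending.reverse()              # restore the original order
        with open(path, "w") as f:
            f.writelines(pending)      # only unprocessed lines remain
```

With this layout the trailing N/Y flag is no longer needed, since presence in the file already means "not yet processed".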

Chris