I would like to read many data files in a folder, delete every line that contains "DT=(SINGLE SINGLE SINGLE)", and write the result out as new data. The Data folder contains 300 data files!

My code is

import os, sys
path = "/Users/xxx/Data/"

allFiles = os.listdir(path)

for fname in allFiles:
    print(fname)

    with open(fname, "r") as f:
        with open(fname, "w") as w:
            for line in f:
                if "DT=(SINGLE SINGLE SINGLE)" not in line:
                    w.write(line)

The error is:

FileNotFoundError: [Errno 2] No such file or directory: '1147.dat'

I want to do this for a whole set of data files. How can I automatically read each file, delete those lines, and write the result? And is there a way to save the result under a different name, e.g. 1147.dat -> 1147_new.dat?

Massifox
mario119
  • You might need to read all the contents of the file, before doing `open(x, 'w')` because write mode will wipe the file. so, `data = f.read(); for line in data: ...`. – Torxed Sep 23 '19 at 22:07
  • You just want to delete the line that contains `"DT=(SINGLE SINGLE SINGLE)"` and leave everything else the same, correct? – OverLordGoldDragon Sep 23 '19 at 22:43
  • @Torxed Not working. The error message comes from `with open(fname,"r") as f:`, not just from `with open(fname,'w') as w:` – mario119 Sep 23 '19 at 22:45
  • @OverLordGoldDragon Correct, but I have hundreds of files that have the same data format but different values. How can I handle that many data files? `with open("1174.dat", "r") as f: with open("1174_new.dat", "w") as w: for line in f: if "DT=(SINGLE SINGLE SINGLE)" not in line: w.write(line)` This code worked for a single data file, though. – mario119 Sep 23 '19 at 22:48
  • Yea you need the full path, since you're not standing in `/Users/xxx/Data/` you need to do `with open(path + fname, 'r')`. The error message is quite self explanatory. – Torxed Sep 23 '19 at 22:50
  • `listdir` gives only filename without path so you have to join `path` with `fname` to have correct path to file - ie. `fullpath = os.path.join(path, fname)` – furas Sep 23 '19 at 23:04
  • @furas That helped! I think os.path.join is good to use! – mario119 Sep 23 '19 at 23:19
  • @OverLordGoldDragon Fair point, I suggest you use the "Flag" feature to gain attention from moderators with the ability to oppose / re-open questions you feel aren't properly marked as duplicate. I don't dare to re-open based on your comment with the risk of getting backlash from re-opening something that shouldn't. The main problem here is that it's a XY problem, the solution is sort of there - and instead of asking why the error occurs OP asks how to do the whole thing. So me personally still think this is a duplicate. And your answer below is still valid and good as it stands :) – Torxed Sep 24 '19 at 18:18
  • @Torxed Didn't know flags can be used to this end - thanks for the suggestion – OverLordGoldDragon Sep 24 '19 at 20:45

1 Answer

The code below should work; notes on what each annotated line does follow it:

import os

path = "/Users/xxx/Data/"
allFiles = [os.path.join(path, filename) for filename in os.listdir(path)] # [1]
del_keystring = "DT=(SINGLE SINGLE SINGLE)" # general case

for filepath in allFiles: # longer variable names for clarity
    print(filepath)

    with open(filepath, 'r') as f_read: # [2]
        loaded_txt = f_read.readlines() # [3]
    new_txt = []
    for line in loaded_txt:
        if del_keystring not in line:
            new_txt.append(line)
    with open(filepath, 'w') as f_write: # [2]
        f_write.write(''.join(new_txt)) # [4]

    with open(filepath, 'r') as f_read: # [5]
        assert len(f_read.readlines()) <= len(loaded_txt)
  • [1] os.listdir returns only the filenames, not the filepaths; os.path.join joins its inputs into a full path using the OS's separator (/ on macOS and Linux, \ on Windows), e.g. folderpath + '/' + filename
  • [2] NOT the same as nesting with open(X,'r') as ..: and with open(X,'w') as ..: on the same file; opening with 'w' empties the file at once, leaving nothing for the 'r' handle to read
  • [3] f_read.readlines() returns a list of lines, each keeping its trailing newline: if the file contains "Abc\nDe\n12\n", the result is ["Abc\n","De\n","12\n"]
  • [4] Undoes [3]: ''.join(...) concatenates the lines back into one string, with no separator needed because each line already ends in \n; if _ls==["a\n","bc\n","12\n"], then ''.join(_ls)=="a\nbc\n12\n"
  • [5] Optional code to verify that the saved file's number of lines is <= the original file's
  • NOTE: you may see the saved file size come out slightly bigger than the original's; this may come from how the original was written (for example its line endings or encoding) rather than from its content; [5] ensures it isn't due to more lines
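
The behaviour described in notes [1] to [4] can be checked in a throwaway directory. This is only a minimal sketch: the temporary directory and the file name demo.dat are made up for illustration and are not part of the original data.

```python
import os
import tempfile

# Throwaway directory and file name, both made up for this demo.
with tempfile.TemporaryDirectory() as folder:
    filepath = os.path.join(folder, "demo.dat")
    with open(filepath, "w") as f:
        f.write("Abc\nDT=(SINGLE SINGLE SINGLE)\n12\n")

    # [1] listdir gives bare names; join each with the folder to get a usable path.
    names = os.listdir(folder)
    full_paths = [os.path.join(folder, name) for name in names]

    # [3] readlines keeps the trailing "\n" on every line.
    with open(filepath, "r") as f:
        lines = f.readlines()

    # [4] Joining with an empty string therefore rebuilds the original text.
    rebuilt = "".join(lines)

    # [2] Opening in "w" mode empties the file at once, before any write happens.
    with open(filepath, "w"):
        pass
    size_after_truncate = os.path.getsize(filepath)

print(names)                # bare file names, no folder prefix
print(lines)                # each element still ends in "\n"
print(size_after_truncate)  # size in bytes after the "w" open
```

Running this should show why the loop in the question fails twice over: the bare name is not a valid path outside the folder, and the nested 'w' open leaves nothing to read.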

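# bonus code to explicitly verify intended lines were deleted
# original_file_path and processed_file_path are placeholders; since the loop
# above overwrites each file, keep a copy of the original to compare against
with open(original_file_path,'r') as txt:
    print(''.join(txt.readlines()[:80])) # first 80 lines as a small excerpt
with open(processed_file_path,'r') as txt:
    print(''.join(txt.readlines()[:80])) # first 80 lines as a small excerpt
# ''.join() because .readlines() returns a list of lines, each ending in \n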
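
The question also asks about saving under a different name (1147.dat -> 1147_new.dat) instead of overwriting. One possible sketch follows; the helper names strip_lines_to_new_file and process_folder and the suffix parameter are hypothetical, introduced here for illustration only. Because the input and output are different files, both can be open at once and the lines streamed across.

```python
import os


def strip_lines_to_new_file(filepath, del_keystring, suffix="_new"):
    """Copy filepath to <name><suffix><ext>, skipping lines that contain del_keystring.

    Hypothetical helper for illustration; returns the path of the new file.
    """
    root, ext = os.path.splitext(filepath)  # ".../1147.dat" -> (".../1147", ".dat")
    new_path = root + suffix + ext
    with open(filepath, "r") as f_read, open(new_path, "w") as f_write:
        for line in f_read:
            if del_keystring not in line:
                f_write.write(line)
    return new_path


def process_folder(path, del_keystring, suffix="_new"):
    """Apply strip_lines_to_new_file to every regular file directly inside path."""
    new_paths = []
    # listdir is evaluated once, so files created in the loop are not revisited.
    for filename in sorted(os.listdir(path)):
        filepath = os.path.join(path, filename)
        root, _ = os.path.splitext(filename)
        # Skip subdirectories and the outputs of an earlier run.
        if not os.path.isfile(filepath) or root.endswith(suffix):
            continue
        new_paths.append(strip_lines_to_new_file(filepath, del_keystring, suffix))
    return new_paths
```

With the folder from the question, the call would be process_folder("/Users/xxx/Data/", "DT=(SINGLE SINGLE SINGLE)"). The originals stay untouched, so the comparison in the bonus code can use an original and its _new counterpart directly.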


NOTE: for more advanced caveats, see comments below answer; for a more compact alternative, see Torxed's version
OverLordGoldDragon
  • Comments are not for extended discussion; this conversation has been [moved to chat](https://chat.stackoverflow.com/rooms/199937/discussion-on-answer-by-overlordgolddragon-how-can-i-automatize-reading-and-writ). – Samuel Liew Sep 25 '19 at 04:37