Looping through n number of CSV files and deleting columns in Python

Question

I have a program to which I need to add a feature: stripping out the second column of each Event CSV file it processes. I've tried the solutions at this thread, but I haven't been able to get any of them to work.

My CSV files look like this:

Time/Date,Event #,Event Desc
05/19/2020 20:12:30,29,Advance Drive ON
05/19/2020 20:32:23,29,Advance Drive ON
05/19/2020 20:35:13,29,Advance Drive ON
05/19/2020 20:39:50,37,Discharge 1 Plug Chute Fault
05/19/2020 20:47:40,68,LMI is in OFF Mode

And here is my function:

# A function to clean the Event Files of raw data
def CleanEventFiles(EF_files, eventHeader, EFmachineID):
    logging.debug(f'Cleaning Event files...')                       # Write to program logger
    for f in EF_files:                                              # FOR ALL FILES IN EVENT FILES
        IsFileReadOnly(f)                                           # check to see if the file is READ ONLY
        print(f'\nCleaning file: {f}')                              # tell user which file is being cleaned
        print('\tReplacing new MachineIDs & File Headers...')       # print stuff to the user
        logging.debug(f'\tReplacing headers for file {f}')          # write to program logger
        with open(f, newline='', encoding='latin-1') as g:          # open file as read
            r = csv.reader((line.replace('\0', '') for line in g))  # declare read variable while removing NULLs
            next(r)                                                 # remove old machineID
            data = [line for line in r]                             # set list to all data in file
            data[0] = eventHeader                                   # replace first line with new header
            data.insert(0, EFmachineID)                             # add line before header for machine ID
        WriteData(f, data)                                          # write data to the file

I know it's got to be something as simple as putting del r[1] into a loop somewhere, but I can't figure it out. The best I've managed is to remove the Event # header from each file; the Event # values in the data rows are still there after the file is processed.

What would be the best way to go about removing the second column of data from these files?

Does this answer your question? [Delete or remove last column in CSV file using Python](https://stackoverflow.com/questions/7245738/delete-or-remove-last-column-in-csv-file-using-python) — Woodford, Feb 23 '23 at 16:40

score 0 · Accepted Answer · answered Feb 23 '23 at 17:03

If you can read all the rows into a list via csv.DictReader, the solution is fairly straightforward. Note that this function handles one file at a time, so you will want to call it once per file. Note as well that it is destructive: the original file is overwritten.

import csv

def clean_event_file(filename, column_to_remove):

    ##--------------------
    ## read in all the rows at once.
    ## note that this will also get us the headers.
    ##--------------------
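    with open(filename, "r", newline="") as file_in:
        reader = csv.DictReader(file_in)
        rows = list(reader)
        ## take the headers from the reader rather than rows[0],
        ## so a header-only file does not raise an IndexError
        headers = [col for col in (reader.fieldnames or []) if col != column_to_remove]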
    ##--------------------

    ##--------------------
    ## Write out the results again absent the given header
    ##--------------------
    with open(filename, "w", newline="") as file_out:
        writer = csv.DictWriter(file_out, fieldnames=headers, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
    ##--------------------

clean_event_file("in.csv", "Event #")
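
Since the question's function already holds each file as a list of rows, here is a minimal sketch of an index-based alternative that loops over several files and writes copies instead of overwriting the originals. The names drop_column and clean_event_files and the ".cleaned.csv" suffix are hypothetical, chosen only for illustration; the latin-1 encoding is taken from the question's code and may not suit other files.

```python
import csv
from pathlib import Path


def drop_column(rows, index):
    """Return a copy of rows (lists of fields) without the field at index."""
    # Slicing never raises, so rows shorter than index are copied unchanged.
    return [row[:index] + row[index + 1:] for row in rows]


def clean_event_files(filenames, index=1, suffix=".cleaned.csv"):
    """Write a column-stripped copy of each file and return the new paths."""
    written = []
    for name in filenames:
        source = Path(name)
        # newline="" lets the csv module handle line endings itself.
        with source.open(newline="", encoding="latin-1") as file_in:
            rows = list(csv.reader(file_in))
        # Hypothetical naming scheme: in.csv -> in.cleaned.csv
        target = source.with_name(source.stem + suffix)
        with target.open("w", newline="", encoding="latin-1") as file_out:
            csv.writer(file_out).writerows(drop_column(rows, index))
        written.append(target)
    return written
```

If the rest of the question's function is kept as it is, something like data = drop_column(data, 1) could probably be applied right after data is built, assuming eventHeader already leaves out the Event # column.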
