
I'm trying to write a Python script that reads the last element (bottom right) of each .csv file (File001-..-File010) inside N folders (Folder001-..-Folder006) and performs some operations on those values (10*6 = 60 .csv files in total). The number of rows varies from one .csv file to another.

My idea for the script:

  • N is the number of folders and P is the number of .csv files inside each folder;
  • Enter folder 1, open each of its P .csv files only to read the last element (bottom right), and store these values in a list (of P elements);
  • Sum all the elements in this list and append the result to the list output (of N elements);
  • Do the same for folder 2, and so on.

I need some help reading each .csv file and extracting its last element inside the loop. I have read many posts, but unfortunately I have not been able to apply them.

N = 6
P = 10

def calculate_output(N, P):
    output = []    
    for i in range(N):        
        for j in range(P):    
            prob = []    
            if FILE NAMES ENDS WITH (".csv") in "./Folder00"+str(i+1):    
                prob.append(BOTTOM RIGHT ELEMENT OF THE FILE)    
        output.append(sum(prob[p] for p in range(P)))    
    return output 
tripleee
  • 175,061
  • 34
  • 275
  • 318

1 Answer


I'm afraid your question isn't very clear, but I guess you want something like

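import os

N = 6
# P = 10  # ???? (unused below)

def calculate_output(N, P=None):
    """Return one sum per folder: the last field of the last line of each .csv file."""
    output = []
    for i in range(N):
        dirname = f"./Folder{i + 1:03d}"  # Folder001, Folder002, ...
        probsum = 0  # reset once per folder, not once per file
        for filename in os.listdir(dirname):
            if filename.endswith(".csv"):
                with open(os.path.join(dirname, filename)) as handle:
                    for line in handle:
                        pass
                # line now contains the last line of the file
                # float() rather than int() so that decimal values also parse
                probsum += float(line.rstrip('\n').split(',')[-1])
        output.append(probsum)
    return output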

If you have 10 CSV files in each folder then you don't really need the parameter P for anything; but I'm not entirely sure I guessed correctly what your code is supposed to do here. The above simply takes the last comma-separated field from the last line in each file, and converts from a string to a number. The function returns a list of the sums of the numbers from each folder.

If the files are huge, consider optimizing how the last line is fetched. If you know, or can reasonably guess, an upper bound on the length of the last line, seek back that many bytes from the end of the file and read only from there; see e.g. "Get last n lines of a file with Python, similar to tail".
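A minimal sketch of that seek-from-the-end idea follows. The helper name `last_line` and the default bound `max_line_bytes=4096` are assumptions made for illustration, not something from the question; if the real last line is longer than the bound, the result will be truncated.

```python
import os

def last_line(path, max_line_bytes=4096):
    """Return the last non-empty line of a file by reading only its tail.

    max_line_bytes is an assumed upper bound on the length of the last line.
    """
    with open(path, "rb") as handle:
        handle.seek(0, os.SEEK_END)
        size = handle.tell()
        # Jump back at most max_line_bytes from the end (or to the start).
        handle.seek(max(0, size - max_line_bytes))
        tail = handle.read()
    lines = tail.splitlines()
    # Ignore blank lines at the very end of the file.
    while lines and not lines[-1].strip():
        lines.pop()
    if not lines:
        raise ValueError(f"{path} has no non-empty lines")
    # Assumes the file is UTF-8 (or plain ASCII) text.
    return lines[-1].decode("utf-8")
```

Inside the loop, the `for line in handle: pass` step could then be replaced by `line = last_line(os.path.join(dirname, filename))`.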

If the CSV format has complications like quoted fields, use `csv.reader` from the standard `csv` module instead of attempting to simply split on commas.
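As a rough sketch of the `csv.reader` variant, the helper below returns the bottom-right field as a string; the name `last_field` is hypothetical. It reads the whole file row by row, so it trades the seek optimization for correct handling of quoted commas.

```python
import csv

def last_field(path):
    """Return the last field of the last non-empty row, honoring CSV quoting."""
    last_row = None
    # newline="" is what the csv module documentation recommends for file objects.
    with open(path, newline="") as handle:
        for row in csv.reader(handle):
            if row:  # csv.reader yields an empty list for a blank line
                last_row = row
    if last_row is None:
        raise ValueError(f"{path} contains no rows")
    return last_row[-1]
```

The caller still has to convert the result, for example with `float(last_field(path))`.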

tripleee