
I have a script that parses all the Excel files in one directory and concatenates them into a single file.

Right now I build the CSV by starting with an empty list and appending to it the data that the function cutpaste extracts from each file as a new dataframe. The list is then concatenated and written out as one final CSV file.

files is the variable that calls all the Excel files from a given directory.

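import pandas as pd

# files, sheet and cutpaste are defined elsewhere in the script.
# Parse every Excel file and collect the extracted dataframes.
df_list = []

for file in files:
    df = pd.read_excel(io=file, sheet_name=sheet)
    new_file = cutpaste(df)
    df_list.append(new_file)

# Concatenate everything and write a single CSV file.
df_final = pd.concat(df_list)
df_final.to_csv('Energy.csv', header=True, index=False)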

What I need now is a way to change my code so that only the data from new Excel files, which is not already in Energy.csv, gets written to Energy.csv.

HelloToEarth
  • What is it you are asking? Besides, your proposed solution is just nonsense code. If you just want to append data to a CSV file, you can open that file in append mode and add the necessary data to it (see https://stackoverflow.com/a/17531025/826983). Further: correct your code indentation, or people might downvote your question to penalize laziness. – Stefan Falk Nov 02 '18 at 13:53
  • I have changed the nonsensical code and the description to what I'm looking for. Does this help in understanding what I need? – HelloToEarth Nov 02 '18 at 14:08
  • Yes, it is clearer now except things like "*`files` is the variable that calls ..*" even though it's just a `list` - `files` is not calling anything. Anyway: The problem with your question now is that it seems you want us to provide you a solution for your problem. Besides the fact that the problem is underspecified, and by that I mean nobody knows how to determine what data you got and how we could possibly detect whether new data has to be added or not, you are also not showing a concrete code example which shows your effort and what you've tried so far. – Stefan Falk Nov 02 '18 at 14:12
  • From what I got: You'll have to load a new file (`df_new`), after detecting it somehow. Then you'll have to load that file as well as `Energy.csv` (`df`). You might want to chunk-read `Energy.csv` and check whether any data from that chunk can be found in `df_new`. For each chunk, remove any matching data from `df_new`. Whatever remains in `df_new` is *actually* new data and has to be appended to `df`. – Stefan Falk Nov 02 '18 at 14:20
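The approach described in the comments (chunk-read `Energy.csv`, drop every row of `df_new` that already appears in it, then append what is left in append mode) could be sketched roughly as follows. This is only an illustration under assumptions, not a tested solution for the asker's data: the helper name `append_new_rows` and the `chunksize` default are made up here, "already exists" is taken to mean "an identical row is already in the CSV", and `df_new` is assumed to have the same columns in the same order as the existing file.

```python
import io
import os

import pandas as pd


def append_new_rows(df_new, csv_path, chunksize=50_000):
    """Append to csv_path only the rows of df_new that are not in it yet.

    Hypothetical helper: returns the number of rows written.
    """
    if not os.path.exists(csv_path):
        # No existing file to compare against: write everything with a header.
        df_new.to_csv(csv_path, header=True, index=False)
        return len(df_new)

    # Render df_new exactly as it would appear in the CSV and read it back
    # as plain strings, so both sides are compared in the same text form.
    as_text = pd.read_csv(
        io.StringIO(df_new.to_csv(index=False)),
        dtype=str,
        keep_default_na=False,
    )
    row_keys = list(as_text.itertuples(index=False, name=None))
    is_new = [True] * len(row_keys)

    # Read the existing CSV in chunks and mark every matching row as known.
    for chunk in pd.read_csv(
        csv_path, dtype=str, keep_default_na=False, chunksize=chunksize
    ):
        seen = set(chunk[as_text.columns].itertuples(index=False, name=None))
        is_new = [new and key not in seen for new, key in zip(is_new, row_keys)]

    # Whatever is still flagged is new data: append it without a header.
    positions = [i for i, new in enumerate(is_new) if new]
    remaining = df_new.iloc[positions]
    remaining.to_csv(csv_path, mode="a", header=False, index=False)
    return len(remaining)
```

In the script from the question, this would replace the final `to_csv` call, for example `append_new_rows(pd.concat(df_list), 'Energy.csv')`. Rows are compared as text because values parsed from Excel and values read back from a CSV do not always have the same dtype. The Excel files are still parsed on every run; skipping files that were already processed would need a separate record of file names, which the question does not describe.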

0 Answers