
I've been experimenting with some data management techniques using pandas and the csv module. What I'm trying to do is read a CSV file, append some extra rows to it, and save it again in an identical format.

I've created a DataFrame of shape (250, 20) with random values, dates as the index, and letters as columns, then saved it as a CSV file. Ultimately, what I've tried is to append the same DataFrame below the existing CSV file.

import csv

def _writeBulk(savefile, data):
    df = data.reset_index()
    with open(savefile, 'w', newline='') as outfile:
        writer = csv.writer(outfile)
        writer.writerows(df.to_numpy().tolist())

def _writeData(savefile, data):
    df = data.reset_index()
    with open(savefile, 'w', newline='') as outfile:
        writer = csv.writer(outfile)
        for row in range(df.shape[0]):
            writer.writerow(df.iloc[row].tolist())

When reading the file back in after the edit, the result I'm expecting is a DataFrame of shape (500, 20). But it seems the file no longer has headers (columns), giving a shape of (499, 200).


I've searched for solutions or explanations, and skipping the header while writing rows is the closest I've come to the actual issue:

Skip the headers when editing a csv file using Python

Any explanations or solutions would be appreciated.

Ji-Mu Kang
    1. Headers: Either add writer.writeheaders or use csv.DictWriter. 2. "I've tried is to append the same dataframe below an existing csv file." - 'w' mode will overwrite your whole file, not append. Any existing headers and all existing rows will be lost – h4z3 Oct 06 '21 at 13:43
  • Can't believe I didn't notice that... thank you very much. I'll try 'a' mode instead. – Ji-Mu Kang Oct 06 '21 at 13:48

1 Answer


Firstly, if your .csv has the date as its first column (also the index), you don't have to use .reset_index() when you read that file into a DataFrame.

For example, your CSV might look like this in Excel:

When you read that into a DataFrame, it becomes:


If you simply want to append a new DataFrame with the same columns (Date, A, B, C, ...), you can do so:

import pandas as pd

source_df = pd.read_csv('initial.csv')

# copy of the same dataframe
# it could be a different df, as per your requirement
new_df = source_df.copy()

# appending the 2nd dataframe to the 1st
# (DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent)
final_df = pd.concat([source_df, new_df])

# writing to .csv
final_df.to_csv('final.csv', index=False)

We're setting index=False to keep the DataFrame's default integer index (0, 1, 2, 3...) from being written to the final .csv file.

Your final df is going to look like:


And when viewed in Excel:


Having said all this, if you want your DataFrame to keep the date as its index, but the written .csv should still have the columns date, A, B, C, ...

Do this:

import pandas as pd

source_df = pd.read_csv('test.csv')

# copy of the same dataframe
new_df = source_df.copy()

# appending the 2nd dataframe to the 1st
# (pd.concat replaces the removed DataFrame.append)
final_df = pd.concat([source_df, new_df])

final_df.set_index(['date'], inplace=True)
final_df.to_csv('final.csv')

And your df will look like:


And when viewed in Excel:


Shine J
  • Thanks for your contribution. The data parameter is the dataframe that is already in memory, with dates as index. So before writing rows, I had to reset the index for list conversion. I've chosen this method since the data I'm supposed to actually work with has 1e8 rows & I don't think reading in the whole thing and concatenating it will be a good idea. Do you have any sort of suggestions that effectively concatenates such large datasets? – Ji-Mu Kang Oct 06 '21 at 16:35
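For the follow-up question about very large data, one common pattern (an assumption on my part, not something from the answer above) is to avoid reading the existing file at all and append new rows directly with DataFrame.to_csv(mode='a', header=False), optionally in chunks to bound memory use:

```python
import pandas as pd

def append_in_chunks(savefile, data, chunk_size=1_000_000):
    """Append `data` (dates as index) to an existing CSV without re-reading it.

    Assumes the file already exists and its header matches `data`'s columns.
    """
    for start in range(0, len(data), chunk_size):
        # mode='a' appends; header=False avoids duplicating the header row
        data.iloc[start:start + chunk_size].to_csv(
            savefile, mode='a', header=False)
```

Since nothing is read back, this scales to files far larger than memory; only one chunk of the new data is serialized at a time.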