1

I have two questions regarding writing dataframe data to a file:

My program produces summary statistics on many grouped rows of a dataframe and save those to a StringIO buffer which writes to my output.csv file at completion. I have a feeling the pd.concat would be better suited but I couldn't get that to work. I can try adding a snippet of code when I get a chance and hopefully someone can explain how to properly concat and I assume that will solve my issue.

That being said, my program works and that's more than I can ask for. What is bugging me though is how the CSV file ends up repeating the same column labels for every summary statistic dataframe that was written to the buffer and incidentally into my CSV file as well. Is there a way to only write the column labels once and avoid multiple duplicate label rows?

My second question is in regards to writing to Excel to skip an unnecessary copy and paste. Like my previous issue, this is only a minor hindrance but still bugs me as I would like to do things the right way. The issue is that i want all the frames written to the same sheet. In order to avoid overwriting the same data it is necessary to use a buffer to store the data until the end. None of the docs seemed to helpful in my particular situation. I devised a workaround: xlwt to buffer -> output.write(buffer.getvalue()) -> pd.to_csv(output) and then reimport that same file via pd.read_csv and finally add another writer that writes the dataframe to Excel. After all that work I ended just sticking with the simplicity of CSV since the Excel writer actually magnified the ugliness of the duplicating rows. Any suggestions on how my buffer issue can be handled better as I would prefer the streamline and control of Excel writer to a CSV output.

Sorry for not having any code for context. I tired my best to explain without it. If necessary I can add the code when I get a free chance.

Maxim
  • 725
  • 1
  • 8
  • 24
  • For your second question please see: http://stackoverflow.com/questions/32957441/putting-many-python-pandas-dataframes-to-one-excel-worksheet – mechanical_meat Apr 01 '17 at 22:49
  • That won't work for me since I'd need to initialize a blank dataframe first, something that I can't do. – Maxim Apr 01 '17 at 22:57
  • @Maksim Your question is quite general, please include code so that people can help you out. – splinter Apr 02 '17 at 09:23

1 Answers1

3

I'd agree that concatenating the dataframes is probably a better solution. You should probably ask a question specifically for that with some sample codes/dataframes.

For your second question you can position a dataframe in an Excel worksheet using the startrow and startcol parameters. You can skip the repeated header using the header boolean parameter, and you can skip the index using the index boolean parameter.

For example:

import pandas as pd


# Create some Pandas dataframes from some data.
df1 = pd.DataFrame({'Data': [11, 12, 13, 14]})
df2 = pd.DataFrame({'Data': [21, 22, 23, 24]})
df3 = pd.DataFrame({'Data': [31, 32, 33, 34]})
df4 = pd.DataFrame({'Data': [41, 42, 43, 44]})

# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('pandas_test.xlsx', engine='xlsxwriter')

# Add the first dataframe to the worksheet.
df1.to_excel(writer, sheet_name='Sheet1', index=False)

offset = len(df1) + 1  # Add extra row for column header.

# Add the other dataframes.
for df in (df2, df3, df4):
    # Write the datafram without a column header or index.
    df.to_excel(writer, sheet_name='Sheet1', startrow=offset,
                header=False, index=False)

    offset += len(df)

# Close the Pandas Excel writer and output the Excel file.
writer.save()

Output:

enter image description here

jmcnamara
  • 38,196
  • 6
  • 90
  • 108
  • Thank you! The offset line is simple but brilliant. – Maxim Apr 02 '17 at 15:33
  • Here is a my question to the first part with code example: http://stackoverflow.com/questions/43170601/how-to-use-pd-concat-with-an-un-initiated-dataframe – Maxim Apr 02 '17 at 16:22