1

I have one giant csv file that contains a year worth intraday data of a stock, as it is too big to analyse, I would like to process the data beforehand.

I want to break it down into 12 excel files (12 months), each one would have 28-31 sheets/tabs/pages that contains that intraday data for that date. Hence I have prepared these excel files with names Stock-01, Stock-02 until Stock-12, they are empty at the moment, waiting for data to append by my python code.

I am using the code below after I use read_csv:

for index in indexs:
    name = "Stock_" + index.strftime("%y-%m-%d")

    work = data[data.index.date == index].copy()
    columnsTitles = ['Volume','Open','High','Low','Close']
    work = work.reindex(columns= columnsTitles)

    filepath = "Stock-{}.xlsx".format(index.strftime("%m"))
    writer = ExcelWriter(filepath, engine='openpyxl')
    work.to_excel(writer, index = True, sheet_name=name)
    writer.save()
    writer.close()

    del work

My approach is, indexs contains all the dates, hence running through using index, i select the prepared excel file, append a sheet with the name using the date (index), and copy the dataframe directly using the simple to_excel function.

However, the result is, it has only one sheet, and this sheet has the last date data. (One possible reason may be the program is creating a new workbook everytime rather than reading the existing workbook)

A similar question is asked, but that is working on another approach. I would like to write a dataframe directly to excel, but in this question, using xlsxwriter, I don't seem to be able to do it, as it accepts str value only.

Victor
  • 659
  • 3
  • 8
  • 19
  • can you edit your code with correct tabs? – Biarys Mar 07 '19 at 23:16
  • done, thanks for correcting me – Victor Mar 07 '19 at 23:17
  • the reason you have only one tab is because thru each iteration you create a new file called "Stock-{}.xlsx" with one tab in it. Depending on your code, you will either end up with multiple excel sheets, or one sheet that gets rewritten multiple times. But I think there's a deeper problem with your code – Biarys Mar 07 '19 at 23:20
  • also, looks like a duplicate of https://stackoverflow.com/questions/37522394/python-xlsxwriter-write-to-many-sheets – Biarys Mar 07 '19 at 23:25
  • Possible duplicate of [Python XlsxWriter - Write to many sheets](https://stackoverflow.com/questions/37522394/python-xlsxwriter-write-to-many-sheets) – Biarys Mar 07 '19 at 23:26
  • i see, that makes sense, so how can I read my existing file and use pandas to write the dataframe into it? and what is the deeper problem of my code please? – Victor Mar 07 '19 at 23:27
  • I think it is different, at least is different approach, cant write dataframe using xlsxwriter – Victor Mar 07 '19 at 23:39
  • Have a look at my answer and let me know if it works – Biarys Mar 07 '19 at 23:40

2 Answers2

1

The reason you have only one tab is because thru each iteration you create a new file called "Stock-{}.xlsx" with one tab in it. Depending on your code, you will either end up with multiple excel sheets, or one sheet that gets rewritten multiple times.

Edited after our discussion:

df = pd.read_csv("D:/AmiBackupeSignal/AMGN.txt", index_col="Date/Time")

df.index = pd.to_datetime(df.index)

# get list of all unique days/months 
days = df.index.day.unique().sort_values()
months = df.index.month.unique().sort_values()

for month in months:
    filepath = "Stock-{}.xlsx".format(month)
    with pd.ExcelWriter(filepath) as writer:
        for day in days:
            name = "Stock_" + "-".join([str(df.index.year.unique()[0]), str(month), str(day)])

            # get data for that month/day
            # this will create 31 tab
            temp = df.loc[(df.index.month == month) & (df.index.day == day)]
            temp.to_excel(writer, index = True, sheet_name=name)

        writer.save()
Biarys
  • 1,065
  • 1
  • 10
  • 22
  • filepath = "Stock-{}.xlsx".format(index.strftime("%m")) in this statement, index is generated from the for loop, so it can not be written outside. I have tried using with Excelwritter within the for loop, and it doesnt work, still creating a new file rather than reading my existing file – Victor Mar 07 '19 at 23:45
  • right, didnt notice it. so for each month you want to create a new file? – Biarys Mar 07 '19 at 23:48
  • basically, I have already created the empty spreadsheet as a container, then when I run through my historical data, i extract the date, find out what month it belongs to, open the corresponding spreadsheet, wanna write the entire dataframe into a new sheet in this workbook (this is where I failed to do so, save and close the workbook, be ready for the next iteration – Victor Mar 07 '19 at 23:51
  • so if you have 2 months of january, in the second iteration it needs to open up an existing sheet and modify it? That's quite problematic as far as I know – Biarys Mar 07 '19 at 23:53
  • yes, so let say i have written 1st Jan in the Jan book, then when I run into 2nd Jan, it should open the Jan book which has 1st Jan page created, then create the page of 2nd Jan in the book. So just keep appending the excel with sheet – Victor Mar 07 '19 at 23:56
  • so 1 excel sheet with 6 tabs? – Biarys Mar 08 '19 at 00:00
  • yes, i may not be using the same terms as you, but i get what you mean. Workbook = excel file, Sheet=tab=page – Victor Mar 08 '19 at 00:05
  • Then why excel file name changes month to month? ```filepath = "Stock-{}.xlsx".format(index.strftime("%m"))``` will result in a new file each time. If you wanna keep one file, I recommend you use ```filepath = "Stock.xlsx"``` – Biarys Mar 08 '19 at 00:09
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/189634/discussion-between-biarys-and-victor). – Biarys Mar 08 '19 at 00:10
0

What you can do is create blank excel file before for loop. Once that is excel book is there you can load it using

from openpyxl import load_workbook

book = load_workbook('CHANGE_THE_NAME.xlsx')

Then for each writer object in loop attach this book to it like this

writer = pandas.ExcelWriter('CHANGE_THE_NAME.xlsx', engine='openpyxl') 
writer.book = book

After that, you can use to_excel to write new sheets in same file. I hope this answers your question.

Aditya
  • 352
  • 1
  • 12
  • why? if I have a year worth data, does it mean i have to create 365 df and be ready to write into a sheet? – Victor Mar 07 '19 at 23:20