1

I've searched through the pandas docs and unfortunately, I could not find the answer.

Essentially, after some data wrangling, I have the dataframe

    ticker_id          close_date           sector  sector_index
0           1 2014-02-28 00:00:00   Consumer Goods     31.106653
1           1 2014-02-27 00:00:00   Consumer Goods     30.951213
2           2 2014-02-28 00:00:00   Consumer Goods     19.846387
3           2 2014-02-27 00:00:00   Consumer Goods     19.671747
4           3 2014-02-28 00:00:00   Consumer Goods   1208.552000
5           3 2014-02-27 00:00:00   Consumer Goods   1193.352000
6           4 2014-02-28 00:00:00   Consumer Goods      9.893989
7           4 2014-02-27 00:00:00   Consumer Goods      9.857385
8           5 2014-02-28 00:00:00   Consumer Goods     52.196757
9           5 2014-02-27 00:00:00   Consumer Goods     53.101520
10          6 2014-02-28 00:00:00         Services      5.449554
11          6 2014-02-27 00:00:00         Services      5.440019
12          7 2014-02-28 00:00:00  Basic Materials   4149.237000
13          7 2014-02-27 00:00:00  Basic Materials   4130.704000

And I ran groupby

df_all2 = df_all.groupby(['close_date','sector']).sum()
print df_all2

And the outcome is this

                            ticker_id  sector_index
close_date sector                                  
2014-02-27 Basic Materials          7   4130.704000
           Consumer Goods          15   1306.933865
           Services                 6      5.440019
2014-02-28 Basic Materials          7   4149.237000
           Consumer Goods          15   1321.595786
           Services                 6      5.449554

But in this form, I can't upload to mysql properly. So in order to upload to mysql properly, I need to do this and a few other things.

data2 = list(tuple(x) for x in df_all2.values)

but data2 has meaningless garbage.

To make a long story short, how can I get groupby to give me the following outcome (where the close_date are all filled in properly and the column headings are tabular).

close_date sector           ticker_id  sector_index
2014-02-27 Basic Materials          7   4130.704000
2014-02-27 Consumer Goods          15   1306.933865
2014-02-27 Services                 6      5.440019
2014-02-28 Basic Materials          7   4149.237000
2014-02-28 Consumer Goods          15   1321.595786
2014-02-28 Services                 6      5.449554

Also, to help the community, how should I modify the title so that other pandas users facing this issue can find your solution, too? I really appreciate your help.

vt2424253
  • 1,387
  • 4
  • 25
  • 39

1 Answers1

2

You have to reset_index on a MultiIndex before using to_sql*:

In [11]: df.groupby(['close_date','sector']).sum().reset_index()
Out[11]:
  close_date           sector  ticker_id  sector_index
0 2014-02-27  Basic Materials          7   4130.704000
1 2014-02-27   Consumer Goods         15   1306.933865
2 2014-02-27         Services          6      5.440019
3 2014-02-28  Basic Materials          7   4149.237000
4 2014-02-28   Consumer Goods         15   1321.595786
5 2014-02-28         Services          6      5.449554

Alternatively you can use as_index=False in the groupby:

In [12]: df.groupby(['close_date','sector'], as_index=False).sum()
Out[12]:
  close_date           sector  ticker_id  sector_index
0 2014-02-27  Basic Materials          7   4130.704000
1 2014-02-27   Consumer Goods         15   1306.933865
2 2014-02-27         Services          6      5.440019
3 2014-02-28  Basic Materials          7   4149.237000
4 2014-02-28   Consumer Goods         15   1321.595786
5 2014-02-28         Services          6      5.449554

*Note: this should be fixed from 0.14 onwards i.e. you should be able to save a MultiIndex to sql.

See How to insert pandas dataframe via mysqldb into database?.

Community
  • 1
  • 1
Andy Hayden
  • 359,921
  • 101
  • 625
  • 535
  • Thanks so much for the answer and the link to "how to insert pandas to mysqldb". I wasn't able to get it to work so I am using pymysql package. Do you know if pandas work with pymysql? – vt2424253 Mar 13 '14 at 01:09
  • @vt2424253 I think some users here have said that it does. – Andy Hayden Mar 13 '14 at 01:09