1

I am novice in python. I have below dataframe.I want to pivot\transpose some of the columns and saved the dataframe into individual dataframes based on the column 2.

data={'col1':[1,101,201,301,2,102,202,302,3,103,203,303], 
                    'col2':[1,1,1,1,2,2,2,2,3,3,3,3],
                    'col3':["2015-01-15","2015-01-15","2015-01-15","2015-01-15","2015-01-15",
                                "2015-01-15","2015-01-15","2015-01-15","2015-01-15","2015-01-15","2015-01-15","2015-01-15"],
                    'col4':["2015-01-15","2015-01-16","2015-01-17","2015-01-18","2015-01-15","2015-01-16",
                                "2015-01-17","2015-01-18","2015-01-15","2015-01-16","2015-01-17",
                                "2015-01-18"], 
                    'col5':[0,1,2,3,0,1,2,3,0,1,2,3],
                    'col6':[273.2,275.9,343,235,273.2,275.9,343,235,273.2,275.9,343,235],
                    'col7':[2.8,3.2,7.9,7.2,2.8,3.2,7.9,7.2,2.8,3.2,7.9,7.2]}
df= pd.DataFrame(data)
print(df)

    col1  col2        col3        col4  col5   col6  col7
0      1     1  2015-01-15  2015-01-15     0  273.2   2.8
1    101     1  2015-01-15  2015-01-16     1  275.9   3.2
2    201     1  2015-01-15  2015-01-17     2  343.0   7.9
3    301     1  2015-01-15  2015-01-18     3  235.0   7.2
4      2     2  2015-01-15  2015-01-15     0  273.2   2.8
5    102     2  2015-01-15  2015-01-16     1  275.9   3.2
6    202     2  2015-01-15  2015-01-17     2  343.0   7.9
7    302     2  2015-01-15  2015-01-18     3  235.0   7.2
8      3     3  2015-01-15  2015-01-15     0  273.2   2.8
9    103     3  2015-01-15  2015-01-16     1  275.9   3.2
10   203     3  2015-01-15  2015-01-17     2  343.0   7.9
11   303     3  2015-01-15  2015-01-18     3  235.0   7.2

So,I want to break the above dataframe in the below format. Saved the dataframe name based on column2 name. For example, col2 =1 so dataframe name df[1], col=2: df[2]

df['1']
col2 col3            1_0_col6  101_1_col6  201_1_col6 301_1_col6 1_0_col7  101_1_col7  201_1_col7 301_1_col7
1    2015-01-15        273.2        275.9       343.0     235.0      2.8      3.2          7.9         7.2       



    df['2']
    col2 col3        1_0_col6  101_1_col6  201_1_col6 301_1_col6 1_0_col7  101_1_col7  201_1_col7 301_1_col7
    2    2015-01-15      273.2        275.9       343.0     235.0      2.8      3.2          7.9         7.2    

    df['3']
    col2 col3          1_0_col6  101_1_col6  201_1_col6 301_1_col6 1_0_col7  101_1_col7  201_1_col7 301_1_col7
    2    2015-01-15    273.2        275.9       343.0     235.0      2.8      3.2          7.9         7.2       
rnvs1116
  • 39
  • 2
  • 12

1 Answers1

1

IIUC, I think you want something like this. However, I don't quite understand your column naming in your results.

df_out = df.rename_axis('colnum',1).set_index(['col1','col2','col3','col4','col5'])\
           .stack()\
           .groupby(['col1','col5','colnum','col2','col3'])\
           .apply(lambda x: pd.Series(x.values.tolist())).unstack([0,1,2])\
           .reset_index(level=-1, drop=True)

df_out.columns = df_out.columns.map('{0[0]}_{0[1]}_{0[2]}'.format)

Print df[1] where col2 = 1

df_out.loc[[1]].dropna(1).reset_index()

Output:

   col2        col3  1_0_col6  1_0_col7  101_1_col6  101_1_col7  201_2_col6  201_2_col7  301_3_col6  301_3_col7
0     1  2015-01-15     273.2       2.8       275.9         3.2       343.0         7.9       235.0         7.2

Print df[2] where col2 = 2

df_out.loc[[2]].dropna(1).reset_index()

Output:

   col2        col3  2_0_col6  2_0_col7  102_1_col6  102_1_col7  202_2_col6  202_2_col7  302_3_col6  302_3_col7
0     2  2015-01-15     273.2       2.8       275.9         3.2       343.0         7.9       235.0         7.2

Print df[2] where col2 = 3

df_out.loc[[3]].dropna(1).reset_index()

Output:

   col2        col3  3_0_col6  3_0_col7  103_1_col6  103_1_col7  203_2_col6  203_2_col7  303_3_col6  303_3_col7
0     3  2015-01-15     273.2       2.8       275.9         3.2       343.0         7.9       235.0         7.2
Scott Boston
  • 147,308
  • 15
  • 139
  • 187
  • Sorry my bad I didn't give column properly. It should be `col2_col5_col6`. And one more thing, col2 contains values 1 through 100. So, passing into map function all the 100 values in order to store into dataframe is not feasible. Can we write function to store all the data into separate dataframe df[1],df[2],....,df[100]. – rnvs1116 Jan 15 '18 at 16:46
  • I think there is `melt` function in pandas lib which can perform the same. – rnvs1116 Jan 15 '18 at 16:50
  • `df_dict = {}` `for i in df_out.index.get_level_values(0): df_dict[i] = df_out.loc[[i]].dropna(1).reset_index()` – Scott Boston Jan 15 '18 at 16:52
  • Store output in a dictionary of dataframes, so you can access each dataframe using df_dict[1], df_dict[2], etc... – Scott Boston Jan 15 '18 at 16:53
  • You are trying to melt two separate columns, you can look at the [`pd.wide_to_long`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.wide_to_long.html#pandas-wide-to-long) funciton. – Scott Boston Jan 15 '18 at 16:54
  • Thanks @ScottBoston, Could you please explain what does `df_out.columns.map('{0[0]}_{0[1]}_{0[2]}'.format)` doing? – rnvs1116 Jan 15 '18 at 17:32
  • We are talking a three level multiindex column and flattening it into a single level with '_' as separators. See this https://stackoverflow.com/q/46163098/6361531 – Scott Boston Jan 15 '18 at 17:33
  • one quick question, I have one more column (col7-time col in the below link) I want to transpose as well. I posted that question https://stackoverflow.com/questions/48307447/transpose-one-row-to-column-keeping-other-as-it-is-in-python/48307687?noredirect=1#comment83601268_48307687 but not getting desired output. – rnvs1116 Jan 17 '18 at 18:50
  • @mvs1116 I see you deleted the question, did you get what you needed? – Scott Boston Jan 17 '18 at 22:47
  • 1
    yes, I was doing the silly mistake. Your code worked for the same. Thanks Scott!! – rnvs1116 Jan 18 '18 at 14:04
  • I have one quick question, this code is working fine when I am applying this on small data but if it is two month data it is containing multiple rows for a single date. For example, in my real example, from 2015-01-25 onwards, result contains multiple rows. And I am not able to find it why it is so. Any idea\comments on this? – rnvs1116 Jan 18 '18 at 19:36