Pandas - How to convert row data to columns

Question

I want to group my data by a column (No.) and spread the values of the columns date1 and results over separate columns, so that each group ends up as a single row.

Here is an example input with the corresponding expected output:

[Image: example input and the corresponding expected output]

I've added a little more data. The real dataset is much larger.
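For comparison, here is a different technique from the answers below, sketched under the assumption of a recent pandas version: number the rows inside each group with `groupby().cumcount()`, then move that number into the columns with `unstack()`. The sample rows mirror the answers; the fixed timestamps, the `n` helper column and the `date_1`/`results_2` naming are illustrative assumptions, not the asker's data.

```python
import pandas as pd

# Sample rows modeled on the answers; the fixed timestamps are illustrative assumptions
df = pd.DataFrame({
    'No.': ['s1', 's2', 's2'],
    'date': pd.to_datetime(['2019-07-29 08:00', '2019-07-30 08:00', '2019-07-31 08:00']),
    'results': [1.2, 9.73, 3.71],
})

# Position of each row within its 'No.' group: 1, 2, ... (hypothetical helper column 'n')
positions = df.groupby('No.').cumcount() + 1

# Index by (No., position), then pivot the position level into the columns
wide = df.assign(n=positions).set_index(['No.', 'n']).unstack('n')

# Flatten the (column, position) MultiIndex into names like 'date_2' or 'results_1'
wide.columns = [f'{name}_{pos}' for name, pos in wide.columns]
print(wide)
```

Groups with fewer rows than the largest one are padded with missing values (NaT for dates, NaN for numbers) rather than with 0.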

Please present your data as data and not as an image. See: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples — Itamar Mushkin, Jul 29 '19 at 08:01
There is more data available, and here is the fixed code: df = pd.DataFrame({'No.' : df['date_1'], 'date_1' : [datetime.now() for x in range(3)], 'results' : df['results']}) — tomy, Jul 29 '19 at 09:20
Then add the extra data and the fixed code to the question body, not to a comment — Itamar Mushkin, Jul 29 '19 at 09:26
...Why would a question be put on hold as unclear after two users already understood and answered it? — Itamar Mushkin, Jul 29 '19 at 10:06
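One way to follow the "data, not an image" advice, sketched here with the sample values the answers use: `DataFrame.to_dict(orient='list')` prints a column-to-values mapping that can be pasted into the question as text and passed straight back to `pd.DataFrame`.

```python
import pandas as pd

df = pd.DataFrame({'No.': ['s1', 's2', 's2'], 'results': [1.2, 9.73, 3.71]})

# Column -> list-of-values mapping, suitable for pasting into a question as text
snippet = df.to_dict(orient='list')
print(snippet)

# A reader can rebuild the same frame from the pasted dict
rebuilt = pd.DataFrame(snippet)
print(rebuilt.equals(df))  # True
```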

vlemaistre · Accepted Answer · 2019-07-29T09:15:10.057


Here is a way to do it:

import numpy as np
import pandas as pd
from datetime import datetime

df = pd.DataFrame({'No.' : ['s1', 's2', 's2'], 'date_1' : [datetime.now() for x in range(3)],
                  'results' : [1.2, 9.73, 3.71]})

# Use groupby to collect, for each No., the list of dates and the list of results
result = df.groupby('No.')[['date_1', 'results']].agg({'date_1' : list, 'results' : list})
# If you are running a pandas version < 0.24.2, uncomment the following line and comment out the one above
#result = df.groupby('No.')[['date_1', 'results']].agg({'date_1' : lambda x: list(x), 'results' : lambda x: list(x)})

# The longest list tells us how many column pairs we will have to create
len_max = np.max([len(x) for x in result['results']])

# Create the extra columns (date_2/results_2, ...); groups with fewer rows are filled with 0
for i in range(1, len_max):
    result['date_{}'.format(i+1)] = [x[i] if len(x)>i else 0 for x in result['date_1']]
    result['results_{}'.format(i+1)] = [x[i] if len(x)>i else 0 for x in result['results']]

# Replace the first two columns, which still hold the groupby lists, with their first element
result['date_1'] = [x[0] for x in result['date_1']]
result['results'] = [x[0] for x in result['results']]

Output:

                        date_1  results                      date_2  results_2
No.                                                                           
s1  2019-07-29 08:00:45.878494     1.20                           0       0.00
s2  2019-07-29 08:00:45.878499     9.73  2019-07-29 08:00:45.878500       3.71
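In the answer above, `len_max` is found by measuring the aggregated lists. As a small alternative that is not part of the original answer, the same count can be read from the group sizes of the input frame, since every row of a group becomes one list element:

```python
import pandas as pd

df = pd.DataFrame({'No.': ['s1', 's2', 's2'], 'results': [1.2, 9.73, 3.71]})

# Largest group size = number of (date, results) column pairs needed
len_max = df.groupby('No.').size().max()
print(len_max)  # 2
```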

answered Jul 29 '19 at 08:13 by vlemaistre (edited Jul 29 '19 at 09:15)

This doesn't return an error when you run it? – Itamar Mushkin Jul 29 '19 at 08:42
It doesn't on my end. What error is raised when you run the code? – vlemaistre Jul 29 '19 at 08:47
I replaced `list` with the more explicit `lambda x: list(x)` and it worked. – Itamar Mushkin Jul 29 '19 at 08:50
Hmm, you don't need a lambda function for the list aggregation with `groupby()`; it's heavier syntax than just passing the builtin `list`. What version of pandas are you running? – vlemaistre Jul 29 '19 at 08:57
I'm using pandas version 0.23.4. – tomy Jul 29 '19 at 09:07
I'm running 0.24.2, and a lot of work on `groupby` was done in 0.24. Try upgrading pandas to at least 0.24 and the code should work fine. If you can't upgrade, I've added @ItamarMushkin's fix. – vlemaistre Jul 29 '19 at 09:13
I'm on 0.21.1. Of course it's better to pass `list` directly; this is just a heads-up for readers using an older version. – Itamar Mushkin Jul 29 '19 at 09:16
Thanks for helping me. – tomy Jul 30 '19 at 01:47
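On a current pandas version, the two spellings discussed in these comments produce the same lists. A minimal check on the results column alone; it assumes a recent pandas and does not reproduce the older-version error reported above:

```python
import pandas as pd

df = pd.DataFrame({'No.': ['s1', 's2', 's2'], 'results': [1.2, 9.73, 3.71]})

# Builtin list passed directly, as in the accepted answer
direct = df.groupby('No.')['results'].agg(list)
# Lambda wrapper, the workaround suggested for older versions
wrapped = df.groupby('No.')['results'].agg(lambda x: list(x))

print(direct.tolist() == wrapped.tolist())  # True
```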

Itamar Mushkin · Answer 2 · 2019-07-29T10:00:53.567

Building on vlemaistre's answer, you can do it in a more compact way:

import pandas as pd
import numpy as np
from datetime import datetime, timedelta
df = pd.DataFrame({'No.' : ['s1', 's2', 's2'], 'date' : [datetime.now()+timedelta(days=x) for x in range(3)],
                  'results' : [1.2, 9.73, 3.71]})

# One row per No., each cell holding the list of that group's values
joint_df = df.groupby('No.')[['date', 'results']].agg(lambda x: list(x))
result = pd.DataFrame(index=joint_df.index)
for column in df.columns.difference({'No.'}):
    # Expand each list into its own columns (date1, date2, ...) and join them on No.
    result = result.join(pd.DataFrame.from_records(
        list(joint_df[column]), index=joint_df.index).rename(lambda x: column+str(x+1), axis=1), how='outer')

Output is:

                          date1                       date2  results1  results2
No.
s1   2019-07-29 12:58:28.627950                         NaT      1.20       NaN
s2   2019-07-30 12:58:28.627957  2019-07-31 12:58:28.627960      9.73      3.71

Be careful, OP doesn't want a duplicate s2 row. – vlemaistre Jul 29 '19 at 09:45
Thanks! I've changed the answer so there's no duplication. – Itamar Mushkin Jul 29 '19 at 10:01
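The no-duplicate requirement can be checked directly with `index.is_unique`. Below is a sketch that repackages the compact approach as a helper, swapping the repeated `join` for a single `pd.concat`. `spread_lists` is a hypothetical name, the fixed timestamps are illustrative, and passing `list` to `agg` assumes pandas 0.24 or later (per the comments on the accepted answer).

```python
import pandas as pd


def spread_lists(df, key):
    """Hypothetical helper: one row per `key` value, one column per (column, position)."""
    # One list per group and column, like joint_df in the answer above
    joint = df.groupby(key).agg(list)
    parts = []
    for column in joint.columns:
        # Expand the lists into positional columns 0, 1, ...; shorter lists are padded with missing values
        expanded = pd.DataFrame(joint[column].tolist(), index=joint.index)
        # Rename 0, 1, ... to e.g. results1, results2, matching the answer's naming
        expanded.columns = [f'{column}{i + 1}' for i in expanded.columns]
        parts.append(expanded)
    # Place the expanded blocks side by side on the shared key index
    return pd.concat(parts, axis=1)


df = pd.DataFrame({
    'No.': ['s1', 's2', 's2'],
    # Illustrative fixed timestamps instead of datetime.now()
    'date': pd.to_datetime(['2019-07-29 12:00', '2019-07-30 12:00', '2019-07-31 12:00']),
    'results': [1.2, 9.73, 3.71],
})

wide = spread_lists(df, 'No.')
print(wide.index.is_unique)  # True: s2 appears only once
print(wide)
```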
