Grouping a dataframe by date and also creating new column based on column values

Question

I'm working with a dataframe using Pandas. The dataframe has this structure:

  VIN       readDate                   Type   description
------    ---------------------       -----    ----------      
C92320      2013-02-05 15:17:25.000     SV     text1          
C92320      2013-02-05 15:17:25.000     FA     text2  
C92320      2013-02-10 18:30:25.000     FA     text3       
F45910      2013-01-22 06:17:55.000     SV     text4          
F45910      2013-01-23 07:20:55.000     SV     text5          
F45910      2013-01-23 07:20:55.000     DK     text6          
J20920      2015-10-24 12:48:38.000     SV     text7         
P20824      2014-01-24 11:58:38.000     FA     text8

For every VIN I'd like to group by the date, and then create new columns from the values in Type with the description included (removing the two old columns Type and Description). It should look like this:

  VIN       readDate                  SV     FA     DK     
------    ---------------------      ----   ----   -----    
C92320      2013-02-05 15:17:25.000  text1   text2    
C92320      2013-02-10 18:30:25.000          text3   
F45910      2013-01-22 06:17:55.000  text4    
F45910      2013-01-23 07:20:55.000  text5          text6
J20920      2015-10-24 12:48:38.000  text7             
P20824      2014-01-24 11:58:38.000          text8

Could anyone assist me with help for this issue? Have not been able to find a solution to this, appreciate help.

Does this answer your question? [Pandas long to wide reshape, by two variables](https://stackoverflow.com/questions/22798934/pandas-long-to-wide-reshape-by-two-variables) — Juan C, Jan 20 '22 at 19:36

norie · Answer 1 · 2022-01-20T20:19:12.420

If there are no duplicate rows you could do this using pivot.

import pandas as pd

df = pd.read_csv('test.csv')

df = (
    df.pivot(index=["VIN", "readDate"], columns="Type", values="description")
    .reset_index()
    .fillna("")
)

print(df)

Type     VIN                 readDate     DK     FA     SV
0     C92320  2013-02-05 15:17:25.000         text2  text1
1     C92320  2013-02-10 18:30:25.000         text3
2     F45910  2013-01-22 06:17:55.000                text4
3     F45910  2013-01-23 07:20:55.000  text6         text5
4     J20920  2015-10-24 12:48:38.000                text7
5     P20824  2014-01-24 11:58:38.000         text8

If there are duplicate rows you could either use drop_duplicates to remove them first,

df = (
    df.drop_duplicates()
    .pivot(index=["VIN", "readDate"], columns="Type", values="description")
    .reset_index()
    .fillna("")
)

Type         VIN      readDate     DK     FA     SV
0     2013-01-22  06:17:55.000                text4
1     2013-01-23  07:20:55.000  text6         text5
2     2013-02-05  15:17:25.000         text2  text1
3     2013-02-10  18:30:25.000         text3
4     2014-01-24  11:58:38.000         text8
5     2015-10-24  12:48:38.000                text7

or use pivot_table.

df = (
    pd.pivot_table(
        df,
        values="description",
        index=["VIN", "readDate"],
        columns="Type",
        aggfunc=list,
    )
    .reset_index()
    .fillna("")
)

Type     VIN                 readDate       DK       FA              SV
0     C92320  2013-02-05 15:17:25.000           [text2]  [text1, text1]
1     C92320  2013-02-10 18:30:25.000           [text3]
2     F45910  2013-01-22 06:17:55.000                           [text4]
3     F45910  2013-01-23 07:20:55.000  [text6]                  [text5]
4     J20920  2015-10-24 12:48:38.000                           [text7]
5     P20824  2014-01-24 11:58:38.000           [text8]

I get this error: ValueError: Index contains duplicate entries, cannot reshape — fejz1234, Jan 20 '22 at 19:44
In the actual data do you have any rows that are exactly the same? If you do, what do you want to do with them? — norie, Jan 20 '22 at 19:59
After viewing the data, yes it seems to occur some rows that are identical. I'd like to filter away those rows into only one unique row then. — fejz1234, Jan 20 '22 at 20:09
I forgot to add that I do not want the new column names from Type to be unique, a new row can have several 'SV' with different values. — fejz1234, Jan 20 '22 at 20:16
Instead of SV / [textA,textB] I'd like to have two SV columns with the values. Like this: SV/ textA --- SV/textB — fejz1234, Jan 20 '22 at 20:17

Grouping a dataframe by date and also creating new column based on column values

1 Answers1