2

i am not sure if, and how i can do the following transformation:

I have a DataFrame looking like this:

Index   Name    detail1 detail1_value   detail2  detail2_value   detail3    detail3_value
1     Albert    Age      30             Group       A            Hometown   beautifulplace
2     Bea       Age      28             Hometown    anotherplace None       None
3     Celin     Age      45             Group       B            None       None
4     Dave      Group    A              None        None         None       None

But as you can imagine, my aim is:

Index   Name    Age Group   Hometown
1     Albert    30  A   beautifulplace
2     Bea       28      anotherplace
3     Celin     45  B   
4     Dave          A   

I am pretty sure that ech detail just appears once. To keep things complex: i am not sure if every detail is total identical (in some cases for example Hometowns instead of Hometown).

The only solution i can see so far is to produce singel pivot-tables out of each pair of columns (like detail1 and detail1_value). In a second step an new dataset is created and each of these pivot-tables is searched for example on the Information on the Age. But my trust in python tells me, that there must be a better way...

Thanks!

PS: May help:

dataset = pd.DataFrame({'Name': ['Albert', 'Bea', 'Celine', 'Dave'],
                        'detail1': ['Age', 'Age', 'Age', 'Group'],
                        'detail1_value': ['30', '28', '45', 'A'],
                        'detail2': ['Group', 'Hometown', 'Group', None],
                        'detail2_value': ['A', 'anotherplace', 'B', None],
                        'detail3': ['Hometown', None, None, None],
                        'detail3_value': ['beautifulplace', None, None, None]})
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
Big.Joe
  • 85
  • 8
  • I'm a bit puzzled by your dataset. As an example: For Index 2 it seems to me that the value Hometown belongs in column detail3, and that anotherplace belongs in column detail3_value. – vestland Aug 09 '17 at 14:07
  • You are right. That's the reason for my question. You will find the same situaiton with Index 4, where "Group" is in detail1 instead of detail2 like the other "groups" are. Otherwise the solution would be a simple pivot. – Big.Joe Aug 09 '17 at 14:52
  • In that case I think you should reconsider how your initial data frame is built. Shifting the values in question two steps to the right and replacing them with None would make for an easier pivot operation. Is it by any chance imported from Excel? – vestland Aug 09 '17 at 15:26

2 Answers2

1

You can melt the dataframe two times - once for variables and once for their values. Then merge them back using Name and which detail this variable came from. The merged dataframe should be ready for pivoting, see example below:

id_cols  = ['Name']
var_cols = ['detail1','detail2','detail3']
val_cols = ['detail1_value','detail2_value','detail3_value']
val_var_mapping = {k:v for k,v in zip(val_cols, var_cols)}

# extract variables
variables = dataset[id_cols+var_cols].melt(id_vars=['Name'],
                                           value_name='variable',var_name='detail')
# print(variables.head())
#      Name   detail variable
# 0  Albert  detail1      Age
# 1     Bea  detail1      Age
# 2  Celine  detail1      Age
# 3    Dave  detail1    Group
# 4  Albert  detail2    Group

# extract values
values = dataset[id_cols+val_cols].melt(id_vars=['Name'], var_name='detail')
values['detail'] = values['detail'].replace(val_var_mapping)
# print(values.head())
#      Name   detail value
# 0  Albert  detail1    30
# 1     Bea  detail1    28
# 2  Celine  detail1    45
# 3    Dave  detail1     A
# 4  Albert  detail2     A

# merge and pivot
res = (variables.dropna()
                .merge(values, on=id_cols+['detail'])
                .pivot(index='Name',columns='variable',values='value')
      )
# print(res)
# variable   Age Group        Hometown
# Name                                
# Albert      30     A  beautifulplace
# Bea         28  None    anotherplace
# Celine      45     B            None
# Dave      None     A            None

As for the Hometown versus Hometowns you can inspect the unique values of the variable column and maybe replace some of them with a standardized version.

gereleth
  • 2,452
  • 12
  • 21
1

You can use lreshape with pivot:

#get columns names dynamically 
a = dataset.columns[dataset.columns.str.endswith('_value')]
b = dataset.columns[dataset.columns.str.startswith('detail')].difference(a)

df = pd.lreshape(dataset, {'detail':b, 'value':a})
print (df)
     Name           value    detail
0  Albert              30       Age
1     Bea              28       Age
2  Celine              45       Age
3    Dave               A     Group
4  Albert               A     Group
5     Bea    anotherplace  Hometown
6  Celine               B     Group
7  Albert  beautifulplace  Hometown


df = df.pivot(index='Name', columns='detail', values='value')
print (df)
detail   Age Group        Hometown
Name                              
Albert    30     A  beautifulplace
Bea       28  None    anotherplace
Celine    45     B            None
Dave    None     A            None

Some data cleaning last:

df = df.reset_index().rename_axis(None, axis=1)
print (df)
     Name   Age Group        Hometown
0  Albert    30     A  beautifulplace
1     Bea    28  None    anotherplace
2  Celine    45     B            None
3    Dave  None     A            None
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Thank You, this solution worked well for me. I Would be amazed to learn more on lreshape (nerver heard before). Whats the difference to reshape and why cant i find Information on lreshape? Anyway, your solution was helpfull. – Big.Joe Aug 29 '17 at 08:45
  • Glad can help, nice day! – jezrael Aug 29 '17 at 08:45