1
people1 trait1 YES
people1 trait2 YES
people1 trait3 NO
people1 trait4 RED
people2 trait1 NO
people2 trait2 YES
people2 trait4 BLACK

etc..

It's possible to create from that table something like this?

        trait1, trait2, trait3, trait4 ...
people1  YES     YES     NO      RED
people2  NO      YES     -       BLACK
people3  -        -      YES     BLUE

The file is too big to do that in excel, I tried in pandas, but I can't find help in this case. I found pd.pivot_table funcion but I can't build working code. I tried and got various erors (99% my fault).

Can someone explain me how to use it in my case? Or maybe is better option than pandas.pivot?+

EDIT

I rebuild my frame:
1      'interpretation'     'trait'
p1           YES               t1
p1           BLACK             t2
p1           NO                t3
p2           NO                t1
p2           RED               t2
p2           NO                t3

And I use suggestion:

data1.pivot_table(index=1, columns="name", values='trait', aggfunc=','.join, fill_value='-').

And I got:

TypeError: sequence item 0: expected str instance, float found

If I change

data1.pivot_table(index=1, columns="trait", values='value', aggfunc=','.join, fill_value='-').

I got bad order table but without error:

     p1      p2    p3    p4
YES  trait1  t1
YES  t1      t2 etc.
NO
RED
No
...

So i think, the first option is correct, but I cant repair that error. When I dtype df it return (O) for all cols.

martin
  • 1,145
  • 1
  • 7
  • 24
  • Can you add your code, errors? – jezrael Mar 06 '20 at 11:55
  • usually : ValueError: Index contains duplicate entries, cannot reshape even I drop duplicates from df. I think I just do something wrong logical. And I saw that post that u marked, but I can resolve my problem based on it. – martin Mar 06 '20 at 12:00
  • Now my code looks that: `data1.pivot(index=1, columns=2, values=3).drop_duplicates()` I mean that pivot part, but I still keep trying, and that is simplest verion. I try do anything with it. – martin Mar 06 '20 at 12:01
  • I suggest use `df = df.pivot_table(index='col1', columns='col2', values='col3', aggfunc=','.join, fill_value='-')` – jezrael Mar 06 '20 at 12:01
  • Thank U for answer. It's creating table. It's totally different that what I need but as I understand that I'm mismatching the columns to the arguments. I'll try. – martin Mar 06 '20 at 12:08
  • @jezrael Can u look on my EDIT? Coz I don't want to write a new post but that is locked. – martin Mar 06 '20 at 12:43

1 Answers1

1

I think problem is missing values in column trait, so join function failed. So possible solution is replace missing values to empty strings:

print (data1)
    1   name trait
0  p1    YES   NaN <- missing value
1  p1  BLACK    t2
2  p1     NO    t3
3  p2     NO    t1
4  p2    RED    t2
5  p2     NO    t3

data1['trait'] = data1['trait'].fillna('')
df = data1.pivot_table(index=1, 
                       columns="name", 
                       values='trait', 
                       aggfunc=','.join, 
                       fill_value='-')
print (df)
1      p1     p2
name            
BLACK  t2      -
NO     t3  t1,t3
RED     -     t2
YES            -

Also if want convert index to column:

data1['trait'] = data1['trait'].fillna('')
df = (data1.pivot_table(index=1, 
                       columns="name", 
                       values='trait', 
                       aggfunc=','.join, 
                       fill_value='-')
           .reset_index()
           .rename_axis(None, axis=1))
print (df)
    name  p1     p2
0  BLACK  t2      -
1     NO  t3  t1,t3
2    RED   -     t2
3    YES          -
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252