
I need to reshape the following data frame, in which one column contains a list of tuples:

df = pd.DataFrame({'columns1':list('AB'),'columns2':[1,2], 
                   'columns3':[[(122,0.5), (104, 0)], [(104, 0.6)]]})

print (df)
  columns1  columns2                columns3
0        A         1  [(122, 0.5), (104, 0)]
1        B         2            [(104, 0.6)]

into this, where the first element of each tuple becomes a column header and the second element becomes the value:

  columns1  columns2  104  122
0        A         1  0.0  0.5
1        B         2  0.6  NaN

How can I do this using pandas in a Jupyter notebook?

RZLJ
  • Please read "[How to make good reproducible pandas examples](https://stackoverflow.com/q/20109391/15239951)". – Corralien May 05 '21 at 05:53

1 Answer


Use a list comprehension to convert each row's list of tuples to a dictionary, build a DataFrame from those dictionaries, sort its columns, and add it to the original with DataFrame.join:

import pandas as pd

df = pd.read_csv('Sample - Sample.csv.csv')
print (df)
  column1 column2                                            column3
0       A      U1                       [(187, 0.674), (111, 0.738)]
1       B      U2                                        [(54, 1.0)]
2       C      U3  [(169, 0.474), (107, 0.424), (88, 0.519), (57,...
                                                              
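import ast

# column3 is read from the CSV as strings, so parse each one with ast.literal_eval,
# turn the resulting list of tuples into a dict, and build a DataFrame from the dicts
df1 = pd.DataFrame([dict(ast.literal_eval(x)) for x in df.pop('column3')],
                   index=df.index).sort_index(axis=1)
# attach the new columns to the remaining original columns
df = df.join(df1)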
print (df)
  column1 column2   54     57     64     88    107    111    169    187
0       A      U1  NaN    NaN    NaN    NaN    NaN  0.738    NaN  0.674
1       B      U2  1.0    NaN    NaN    NaN    NaN    NaN    NaN    NaN
2       C      U3  NaN  0.526  0.217  0.519  0.424    NaN  0.474    NaN
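
For the sample frame in the question, where columns3 already holds real Python lists of tuples rather than strings, the same idea should work without `ast.literal_eval`. This is a minimal sketch under that assumption; the names `wide` and `result` are only illustrative.

```python
import pandas as pd

# Sample data from the question: columns3 holds lists of (header, value) tuples.
df = pd.DataFrame({'columns1': list('AB'), 'columns2': [1, 2],
                   'columns3': [[(122, 0.5), (104, 0)], [(104, 0.6)]]})

# Each list of tuples becomes a dict; headers missing in a row are filled with NaN.
wide = pd.DataFrame([dict(pairs) for pairs in df.pop('columns3')],
                    index=df.index).sort_index(axis=1)

# Attach the new columns to the remaining original columns.
result = df.join(wide)
print(result)
```

Passing `index=df.index` keeps the rows aligned in the join even when the original index is not a simple range.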
jezrael
  • I tried this and got the following error: `ValueError: dictionary update sequence element #0 has length 1; 2 is required` – RZLJ May 05 '21 at 06:19
  • @RZLJ I tried this code with your sample data without problem. – SeaBean May 05 '21 at 06:24
  • Upvoted. This solution is better than the other one because 1) it uses `sort_index()` to sort the column names; 2) it passes the `index=` parameter to `pd.DataFrame()` to retain the index in case the original index is not a simple range index; 3) using `.join()` together with `pop` gives more succinct code than `pd.concat()`. – SeaBean May 05 '21 at 06:29
  • @RZLJ - Is the data confidential? This looks like a data-related problem. – jezrael May 05 '21 at 09:11
  • Is there a way I can attach my sample.csv, which gives me the ValueError? – RZLJ May 05 '21 at 09:12
  • @RZLJ - Is it possible to share it via WeTransfer, Dropbox, or Google Docs? – jezrael May 05 '21 at 09:16
  • Link to the .csv file: https://docs.google.com/spreadsheets/d/1pHsKMC3oYsM-2iaxQCn5EqENvYq8o1SCBLwXbYKyOhc/edit?usp=sharing – RZLJ May 05 '21 at 09:25
  • @jezrael Perfect, it works fine. What is the difference in the data? Could you explain, please? – RZLJ May 05 '21 at 09:59
  • @RZLJ - The problem was that the data were not lists of tuples but strings. So the first step is to use `ast.literal_eval` to convert the strings to Python data structures. – jezrael May 05 '21 at 10:00
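
The string-versus-list difference discussed in these comments can be reproduced with a small sketch. The helper `to_pairs` is hypothetical (it is not part of pandas or of the answer), and the two-row frame is an assumed stand-in for what `read_csv` returns: each cell of column3 is a string.

```python
import ast
import pandas as pd

def to_pairs(cell):
    """Return a list of (header, value) tuples from a cell that is a string or already a list."""
    return ast.literal_eval(cell) if isinstance(cell, str) else cell

# Stand-in for CSV input: every cell of column3 is a string, not a list.
df = pd.DataFrame({'column1': ['A', 'B'],
                   'column3': ['[(187, 0.674), (111, 0.738)]', '[(54, 1.0)]']})

# dict() on a string iterates it character by character, which raises the ValueError.
try:
    dict(df.loc[0, 'column3'])
    error = None
except ValueError as exc:
    error = str(exc)
print(error)

# Parsing each cell first makes the dictionary conversion work.
wide = pd.DataFrame([dict(to_pairs(x)) for x in df.pop('column3')],
                    index=df.index).sort_index(axis=1)
result = df.join(wide)
print(result)
```

Because `to_pairs` passes non-string cells through unchanged, the same code should also accept a frame built in memory with real lists.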