Hi, I have a DataFrame in which one column contains lists.

import pandas as pd

d = {'col1': {0: 'A', 1: 'A', 2: 'B'},
 'col2': {0: ['a', 'b', 'c', 'a'], 1: ['b', 'c'], 2: ['a', 'd', 'e']}}
df = pd.DataFrame(d)
print(df)

  col1          col2
0    A  [a, b, c, a]
1    A        [b, c]
2    B     [a, d, e]

How can I count each element of the lists and turn the rows into columns? Note that some rows share the same name (here, A).

Expected output:

  col2  A  A1  B
0    a  2   0  1
1    b  1   1  0
2    c  1   1  0
3    d  0   0  1
4    e  0   0  1
anky

1 Answer

Assuming col2 holds lists, you can use groupby + cumcount to append an occurrence number to each repeated col1 value (so the second A becomes A1), then explode the lists and count the elements with crosstab:

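# Suffix repeated col1 values with their occurrence number (A, A1, ...), then put one list element per row
u = df.assign(col1=df['col1'] + df.groupby('col1').cumcount()
              .replace(0, '').astype(str)).explode('col2')
# Count each (col2, col1) pair; drop the column-axis name
out = pd.crosstab(u['col2'], u['col1']).rename_axis(None, axis=1)  # .reset_index()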

print(out)

      A  A1  B
col2          
a     2   0  1
b     1   1  0
c     1   1  0
d     0   0  1
e     0   0  1
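
If col1 is not a string column (for example a Period or timestamp column), adding a string suffix to it will raise a TypeError. Here is a minimal sketch of one way around that, assuming col1 can be cast to str first. The dummy data, the monthly Period values and the "_" separator are my own assumptions for illustration, not something taken from the question:

```python
import pandas as pd

# Hypothetical dummy data: col1 holds monthly Periods instead of strings.
df = pd.DataFrame({
    "col1": pd.PeriodIndex(["2021-01", "2021-01", "2021-02"], freq="M"),
    "col2": [["a", "b", "c", "a"], ["b", "c"], ["a", "d", "e"]],
})

# Cast to str first so a suffix can be concatenated.
labels = df["col1"].astype(str)

# Occurrence number per label: 0 for the first occurrence, 1 for the second, ...
n = labels.groupby(labels).cumcount()

# Keep "_<n>" only for repeats; first occurrences get an empty suffix.
# The "_" separator is an assumption, used to keep the new labels readable.
suffix = ("_" + n.astype(str)).where(n > 0, "")

# One list element per row, then count each (col2, col1) pair.
u = df.assign(col1=labels + suffix).explode("col2")
out = pd.crosstab(u["col2"], u["col1"]).rename_axis(None, axis=1)
print(out)
```

The counting logic is the same as above; the only changes are the astype(str) cast before concatenation and the separator.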
anky
  • TypeError: unsupported operand type(s) for +: 'Period' and 'str' –  Apr 15 '21 at 20:21
  • @user15649753 I can't replicate your issue; it would help if you could provide code that reproduces the dataframe, along the lines of how I edited your question to include code for the current dataframe. – anky Apr 16 '21 at 03:27
  • My `col2` is huge, so I can't print it (about 5000 elements in each list). But I should tell you that my `col1` is a timestamp –  Apr 16 '21 at 15:41
  • @user15649753 I am not asking you to post the entire dataframe. You can make a dummy df (5-6 lines) that resembles your actual df and post the expected output for that dummy df. See this: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – anky Apr 16 '21 at 15:43
  • Even one row contains a list of 5000 items –  Apr 16 '21 at 16:11
  • @user15649753 you can create a dummy example having 5-6 items in each line though, it is impossible to guess how your data is otherwise – anky Apr 16 '21 at 16:12