I have some goofy data where one column contains multiple values slammed together with commas:
In [62]: df = pd.DataFrame({'U': ['foo', 'bar', 'baz'], 'V': ['a,b,a,c,d', 'a,b,c', 'd,e']})
In [63]: df
Out[63]:
     U          V
0  foo  a,b,a,c,d
1  bar      a,b,c
2  baz        d,e
Now I want to split column V, drop it, and add columns a through e. Columns a through e should contain the count of occurrences of that letter in that row:
     U  a  b  c  d  e
0  foo  2  1  1  1  0
1  bar  1  1  1  0  0
2  baz  0  0  0  1  1
Maybe some combination of df['V'].str.split(',') and pandas.get_dummies would work, but I can't quite work it out.
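A minimal sketch of one way those two pieces might fit together, offered as a possibility rather than a definitive answer. It assumes a pandas version that provides Series.explode (0.25 or later), and the names letters, counts, and result are only illustrative. The idea is to split each string, put every letter on its own row, one-hot encode the letters, and then sum the indicators back up per original row so that repeated letters are counted:

```python
import pandas as pd

df = pd.DataFrame({'U': ['foo', 'bar', 'baz'],
                   'V': ['a,b,a,c,d', 'a,b,c', 'd,e']})

# Split each string into a list, then give every letter its own row.
# explode() repeats the original index label for each list element.
letters = df['V'].str.split(',').explode()

# One-hot encode the letters, then sum per original row label so that
# a letter appearing twice in a row yields a count of 2.
counts = pd.get_dummies(letters, dtype=int).groupby(level=0).sum()

# Drop V and attach the per-letter count columns by index.
result = df.drop(columns='V').join(counts)
print(result)
```

Summing after get_dummies is what distinguishes this from a plain indicator encoding, which would report 1 for a in the first row instead of 2.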
Edit: apparently I have to justify why my question is not a duplicate. I think the reason is intuitively obvious to the most casual observer.