Is there a nice way to do the below?
This is exactly the same question as here: Split pandas dataframe string entry to separate rows
That post is fairly old, though, and I am wondering whether there is a better method using newer pandas features.
I have managed to reproduce the approach with my data, as shown below, but I am not sure how to incorporate more than two columns. In other words, my var3 should be treated the same way as var2: replicated across the split rows.
I more or less follow the logic of row[val]; for example:
row['var2'], row['var3'], row['var1'].split(',')
produces:
(99999, 1403298300, [u'08241', u'08215', u'08217'])
But I am still not sure how to extend this to more than two columns.
Out[104]:
                                  var1   var2        var3
0                          47429,47404  10700  1403298300
1  23030,23831,23147,23836,23860,23875  99999  1403297100
2  72930,72951,72832,72820,72949,72821  10200  1403298300
3              56522,58030,56583,56565  99999  1403295900
4        59824,59831,59821,59863,59865  99999  1403294700
pd.concat([pd.Series(row['var2'], row['var1'].split(','))
           for _, row in testdf.iterrows()]).reset_index()[:5]
   index      0
0  47429  10700
1  47404  10700
2  23030  99999
3  23831  99999
4  23147  99999
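One possible way to extend the pattern above to more columns, as a rough sketch: build a small DataFrame per row instead of a Series, letting the scalar columns broadcast against the split list. The `testdf` below is a hypothetical two-row subset of the data shown above, and `pieces` and `result` are names made up for illustration.

```python
import pandas as pd

# Hypothetical subset of testdf (first two rows of the data shown above).
testdf = pd.DataFrame({
    "var1": ["47429,47404", "23030,23831,23147,23836,23860,23875"],
    "var2": [10700, 99999],
    "var3": [1403298300, 1403297100],
})

# One small frame per original row: the split values become rows,
# and the scalar var2/var3 values are broadcast to match their length.
pieces = [
    pd.DataFrame({
        "var1": row["var1"].split(","),
        "var2": row["var2"],
        "var3": row["var3"],
    })
    for _, row in testdf.iterrows()
]

# Stack the per-row frames and renumber the index from 0.
result = pd.concat(pieces, ignore_index=True)
print(result)
```

Any further column would just be one more scalar entry in the dict. Since this still goes through iterrows, it is likely to be slow on large frames.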
Example provided by the older post:
In [7]: a
Out[7]:
    var1  var2
0  a,b,c     1
1  d,e,f     2
In [8]: b
Out[8]:
  var1  var2
0    a     1
1    b     1
2    c     1
3    d     2
4    e     2
5    f     2
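As for newer pandas features, one candidate is `DataFrame.explode`, which I believe was added in pandas 0.25. The sketch below assumes that version or later and rebuilds `a` from the older post's example. It splits var1 into lists and then gives each list element its own row, repeating all other columns automatically, so it should not matter how many extra columns there are.

```python
import pandas as pd

# The input frame from the older post's example.
a = pd.DataFrame({"var1": ["a,b,c", "d,e,f"], "var2": [1, 2]})

# Split each var1 string into a list, then expand each list element
# into its own row; var2 (and any other column) is repeated as needed.
b = (
    a.assign(var1=a["var1"].str.split(","))
     .explode("var1")
     .reset_index(drop=True)  # explode keeps the original index labels
)
print(b)
```

Without the final reset_index, the exploded rows would keep the index of the row they came from (0, 0, 0, 1, 1, 1).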