I am trying to process a data frame whose values appear in a random order within each row, and I would like to get a new column that collects all the values starting with 'that'.
Input:
        0       1       2       3        4
0  this=1  that=2   who=2   was=3  where=5
1  that=4   who=5  this=1   was=3  where=2
2   was=2   who=7  this=7  that=3  where=7
3   was=3   who=4  this=7  that=1  where=8
4  that=1   who=3  this=4   was=1  where=3
Output:
        0
0  that=2
1  that=4
2  that=3
3  that=1
4  that=1
I have been able to get the correct result with the following code, but with larger data frames it takes a long time to complete:
import pandas as pd

df1 = pd.DataFrame([['this=1', 'that=2', 'who=2', 'was=3', 'where=5'],
                    ['that=4', 'who=5', 'this=1', 'was=3', 'where=2'],
                    ['was=2', 'who=7', 'this=7', 'that=3', 'where=7'],
                    ['was=3', 'who=4', 'this=7', 'that=1', 'where=8'],
                    ['that=1', 'who=3', 'this=4', 'was=1', 'where=3']],
                   columns=[0, 1, 2, 3, 4])
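df2 = pd.DataFrame()
for i in df1.index:
    # df1.loc[i] selects row i (df1[i] would select column i instead)
    data = [name for name in df1.loc[i] if name[0:4] == 'that']
    # DataFrame.append was removed in pandas 2.0, so pd.concat is used here
    df2 = pd.concat([df2, pd.DataFrame(data)], ignore_index=True)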
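
For reference, here is a rough sketch of the kind of loop-free version I am hoping for. It is only one possible approach, not necessarily the best one, and it assumes every cell is a string and that each row contains exactly one value starting with 'that'. The names `values`, `mask` and `result` are just placeholders I made up for this example.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame([['this=1', 'that=2', 'who=2', 'was=3', 'where=5'],
                    ['that=4', 'who=5', 'this=1', 'was=3', 'where=2'],
                    ['was=2', 'who=7', 'this=7', 'that=3', 'where=7'],
                    ['was=3', 'who=4', 'this=7', 'that=1', 'where=8'],
                    ['that=1', 'who=3', 'this=4', 'was=1', 'where=3']],
                   columns=[0, 1, 2, 3, 4])

# Work on the underlying 2D array instead of looping over rows.
values = df1.to_numpy()

# Element-wise test: True where a cell starts with 'that'.
mask = np.char.startswith(values.astype(str), 'that')

# Boolean indexing on a 2D array returns the matches in row-major order,
# so the row order is kept (assumption: one match per row).
result = pd.DataFrame(values[mask])
print(result)
```

If a row could have zero or several matches, the number of result rows would no longer line up with the original index, so this sketch would need adjusting for that case.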