I have a dataframe with a column (name
) that contains recurring strings and I want to concatenate values from another column (reports
) with identicical values in 'name' into one cell.
I thought I'll be getting there with a while-if-elif loop but especially the concatenation only works in the first loop. I just can't figure out why!?
#example dataframe
data = {'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy', 'Amy', 'Jake'],
'year': [2012, 2012, 2013, 2014, 2014, 2014, 2014],
'reports': ['Hallo', 'how are', 'you', 'not so', 'foo', 'damn it', 'bar']}
df = pd.DataFrame(data)
df['index'] = df.index
#sorting to make the loop work (kept the original index just in case...)
dfSort = df.sort_values(['name'], ascending = True)
dfSort.astype('str')
dfSort.reset_index(inplace = True, drop = True)
#setting thins up
dfSorted = dfSort.copy()
num = df['name'].count()
i = 0
j = 1
dfNew = dfSorted.loc[:0, :]
#loop
while j < num:
x = dfSorted.iloc[i:i+1, :1].values
y = dfSorted.iloc[j:j+1, :1].values
firstValue = dfSorted.iloc[i:i+1, 2:3].values
nextValue = dfSorted.iloc[j:j+1, 2:3].values
print(x, y, firstValue, nextValue, i, j) #just to see what's going on
if x != y:
NewRow = dfSorted.iloc[j:j+1, :]
dfNew = pd.concat([dfNew, NewRow])
i = j
j += 1
elif x == y:
dfNew.iloc[i:i+1, :] = dfSorted.iloc[i:i+1, :]
dfNew.iloc[i:i+1, 2:3] = firstValue + ', ' + nextValue
j += 1
print('done')#just to see it actually finished the elif-statement
dfNew.head()
dfNew
should have Amy
with "foo, damn it"
and Jake
with "not so, bar"
and delete the duplicate rows and keep everything else.
I am sure this is an easy error, but I just don't see it.