1

I have a dataframe with a column (name) that contains recurring strings and I want to concatenate values from another column (reports) with identicical values in 'name' into one cell.

I thought I'll be getting there with a while-if-elif loop but especially the concatenation only works in the first loop. I just can't figure out why!?

#example dataframe
data = {'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy', 'Amy', 'Jake'], 
        'year': [2012, 2012, 2013, 2014, 2014, 2014, 2014], 
        'reports': ['Hallo', 'how are', 'you', 'not so', 'foo', 'damn it', 'bar']}
df = pd.DataFrame(data)
df['index'] = df.index

#sorting to make the loop work (kept the original index just in case...)
dfSort = df.sort_values(['name'], ascending = True)
dfSort.astype('str')
dfSort.reset_index(inplace = True, drop = True)

#setting thins up
dfSorted = dfSort.copy()
num = df['name'].count()
i = 0
j = 1
dfNew = dfSorted.loc[:0, :]

#loop
while j < num:
    x = dfSorted.iloc[i:i+1, :1].values
    y = dfSorted.iloc[j:j+1, :1].values
    firstValue = dfSorted.iloc[i:i+1, 2:3].values
    nextValue = dfSorted.iloc[j:j+1, 2:3].values
    print(x, y, firstValue, nextValue, i, j) #just to see what's going on

    if x != y:
        NewRow = dfSorted.iloc[j:j+1, :]
        dfNew = pd.concat([dfNew, NewRow])
        i = j
        j += 1

    elif x == y:
        dfNew.iloc[i:i+1, :] = dfSorted.iloc[i:i+1, :]
        dfNew.iloc[i:i+1, 2:3] = firstValue + ', ' + nextValue

        j += 1
        print('done')#just to see it actually finished the elif-statement
dfNew.head()

dfNew should have Amy with "foo, damn it" and Jake with "not so, bar" and delete the duplicate rows and keep everything else. I am sure this is an easy error, but I just don't see it.

user2653663
  • 2,818
  • 1
  • 18
  • 22
  • Provide your expected output as well. This most likely doesn''t need a while if else loop. Pandas provides many functions. – Erfan Aug 08 '19 at 12:09
  • You want `df.groupby(['name', 'year'], as_index=False).agg(', '.join)` probably. – Erfan Aug 08 '19 at 12:16

1 Answers1

0

Is this what you're looking for?

dfNew = df.groupby(['name', 'year'], as_index=False)['reports'].agg(list)
dfNew['reports'] = dfNew['reports'].str.join(', ')
    name    year    reports
0   Amy     2014    foo, damn it
1   Jake    2014    not so, bar
2   Jason   2012    Hallo
3   Molly   2012    how are
4   Tina    2013    you
help-ukraine-now
  • 3,850
  • 4
  • 19
  • 36