Python - Merge Column B where Column A is duplicate

Question

I have some data in a dataframe (df):

    Reference   Description
0   C11621/2    The findings have been used
1   B01026/1    Findings from this research
2   D01469/1    The PopChange web resource 
3   AM0156/1    The whole project was designed 
4   AM0156/1    The data set has been used 
5   AM0156/1    This project has become one

There might be duplicates in the 'Reference' column, and if there are, I want to merge their data into a single row. For example, in the dataframe above, the following 3 rows share the same Reference number:

    Reference   Description
3   AM0156/1    The whole project was designed  ...
4   AM0156/1    The data set has been used ...
5   AM0156/1    This project has become one ...

I want to turn that into:

    Reference   Description
3   AM0156/1    The whole project was designed The data set has been used This project has become one

How would one go about that?

So use `df.groupby('Reference')['Description'].apply(' '.join).reset_index()` — jezrael, Jun 26 '18 at 08:38
Thank you Jezrael. I did search for an answer before asking, but I didn't find that post. Thank you very much :) — Nicholas, Jun 26 '18 at 08:39
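
For anyone landing here, the one-liner from the comment above can be sketched end to end as follows. The sample frame is rebuilt from the rows shown in the question, and it assumes every Description is a plain string (a missing value such as NaN would make `' '.join` fail). The `sort=False` argument is an addition not present in the comment; it only keeps the groups in their order of first appearance. Note that the result gets a fresh 0..n-1 index, so the merged row will not keep the original index 3.

```python
import pandas as pd

# Sample data rebuilt from the question (trailing text of each description omitted there too).
df = pd.DataFrame({
    "Reference": ["C11621/2", "B01026/1", "D01469/1",
                  "AM0156/1", "AM0156/1", "AM0156/1"],
    "Description": [
        "The findings have been used",
        "Findings from this research",
        "The PopChange web resource",
        "The whole project was designed",
        "The data set has been used",
        "This project has become one",
    ],
})

# Group rows by Reference and join each group's descriptions with a space.
# sort=False (an assumption, not in the original comment) preserves first-seen order.
merged = (
    df.groupby("Reference", sort=False)["Description"]
      .apply(" ".join)
      .reset_index()
)

print(merged)
```

If other columns need to be carried along as well, `groupby(...).agg(...)` with a per-column function is probably the more natural fit, but that goes beyond what the question asks.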
