Extract String from duplicate row, remove duplicate, give count of strings

Question

I'm relatively new to Python/panda. Here is my problem: I have a df looking like this:

df = pd.DataFrame({
    'ZIP Code': ['1234','1234', '5678', '9101'],
    'City Name': ['City A', 'City A', 'City B', 'City C'],
    'Newspaper': ['City A News', 'City A Newspaper', 'News for City B', 'C News'],
})

As you can see, for City A, there are two Newspapers, but they are listed in two different rows. In my real df, I have multiple cases of cities, listed in separate rows, completely identical to each other, except for the listed newspaper.

So I want to create a df that looks somehow like this:

df_wanted = pd.DataFrame({
'ZIP Code': ['1234', '5678', '9101'],
'City Name': ['City A', 'City B', 'City C'],
'Newspaper': ['City A News, City A Newspaper', 'News for City B', 'C News'],
'Number of Newspapers': [2, 1, 0]

})

So basically, I want to remove the duplicate rows and add the string of the not- duplicate newspaper to the first entry for the city.

Additionally, I want to create a column that counts the number of newspapers so I can use the numbers for further analysis.

I hope you can help me with my problem! Thanks in advance :)

So far, I didnt find a solution to my problem.

Use `groupby.agg` with `','.join` and `'count'`: `df.groupby(['ZIP Code', 'City Name'], as_index=False).agg(**{'Newspaper': ('Newspaper', ', '.join), 'Number of Newspapers': ('Newspaper', 'size')})` — mozway, Aug 24 '23 at 15:42
Thank you, wasnt looking at the console, your solution works for me! How can I save this to my df? — Bantha Hunter, Aug 24 '23 at 16:10

Quang Hoang · Accepted Answer · 2023-08-24T15:58:14.943

You can use groupby named agg:

df.groupby('ZIP Code').agg(**{
    'City Name': ('City Name', 'first'),
    'Newspaper': ('Newspaper', ', '.join),
    'No. Newspaper': ('Newspaper', 'size')
})

Output:

         City Name                      Newspaper  No. Newspaper
ZIP Code                                                        
1234        City A  City A News, City A Newspaper              2
5678        City B                News for City B              1
9101        City C                         C News              1

Extract String from duplicate row, remove duplicate, give count of strings

1 Answers1