I'm relatively new to Python/panda. Here is my problem: I have a df looking like this:
df = pd.DataFrame({
'ZIP Code': ['1234','1234', '5678', '9101'],
'City Name': ['City A', 'City A', 'City B', 'City C'],
'Newspaper': ['City A News', 'City A Newspaper', 'News for City B', 'C News'],
})
As you can see, for City A, there are two Newspapers, but they are listed in two different rows. In my real df, I have multiple cases of cities, listed in separate rows, completely identical to each other, except for the listed newspaper.
So I want to create a df that looks somehow like this:
df_wanted = pd.DataFrame({
'ZIP Code': ['1234', '5678', '9101'],
'City Name': ['City A', 'City B', 'City C'],
'Newspaper': ['City A News, City A Newspaper', 'News for City B', 'C News'],
'Number of Newspapers': [2, 1, 0]
})
So basically, I want to remove the duplicate rows and add the string of the not- duplicate newspaper to the first entry for the city.
Additionally, I want to create a column that counts the number of newspapers so I can use the numbers for further analysis.
I hope you can help me with my problem! Thanks in advance :)
So far, I didnt find a solution to my problem.