I have a DataFrame of books from which I removed and reworked some information. However, some rows share the same value in the column "bookISBN", and I want to merge each group of duplicate rows into a single row.

I plan to make a new DataFrame that keeps the first value of the url, the ISBN, the title and the genre for each book, but sums the values of the column "genreVotes" across the duplicates. How can I do this?

Original dataframe:

In [23]: network = data[["bookTitle", "bookISBN", "highestVotedGenre", "genreVotes"]]
         network.head().to_dict("list")
Out [23]: 
{'bookTitle': ['The Hunger Games',
  'Twilight',
  'The Book Thief',
  'Animal Farm',
  'The Chronicles of Narnia'],
 'bookISBN': ['9780439023481',
  '9780316015844',
  '9780375831003',
  '9780452284241',
  '9780066238500'],
 'highestVotedGenre': ['Young Adult',
  'Young Adult',
  'Historical-Historical Fiction',
  'Classics',
  'Fantasy'],
 'genreVotes': [103407, 80856, 59070, 73590, 26376]}

Duplicates:

In [24]: duplicates = network[network.duplicated(subset=["bookISBN"], keep=False)]
         duplicates.loc[duplicates["bookISBN"].isin(["9780439023481", "9780375831003"])].to_dict("list")
Out [24]:
{'bookTitle': ['The Hunger Games',
  'The Book Thief',
  'The Hunger Games',
  'The Book Thief',
  'The Book Thief'],
 'bookISBN': ['9780439023481',
  '9780375831003',
  '9780439023481',
  '9780375831003',
  '9780375831003'],
 'highestVotedGenre': ['Young Adult',
  'Historical-Historical Fiction',
  'Young Adult',
  'Historical-Historical Fiction',
  'Historical-Historical Fiction'],
 'genreVotes': [103407, 59070, 103407, 59070, 59070]}

(In this example the duplicated rows all have the same vote count, but in other cases the values differ.)

Expected output:

{'bookTitle': ['The Hunger Games',
  'Twilight',
  'The Book Thief',
  'Animal Farm',
  'The Chronicles of Narnia'],
 'bookISBN': ['9780439023481',
  '9780316015844',
  '9780375831003',
  '9780452284241',
  '9780066238500'],
 'highestVotedGenre': ['Young Adult',
  'Young Adult',
  'Historical-Historical Fiction',
  'Classics',
  'Fantasy'],
 'genreVotes': [206814, 80856, 177210, 73590, 26376]}
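
One possible way to get this result is a `groupby` on "bookISBN" with a per-column aggregation. The following is only a minimal sketch, not a confirmed solution: the sample frame is an assumption, rebuilt from the five rows of `network.head()` plus the extra duplicate rows listed above (their position at the end of the frame is a guess), and the name `merged` is hypothetical.

```python
import pandas as pd

# Assumed sample: the five head() rows, followed by the extra duplicate rows
# for "The Hunger Games" and "The Book Thief" (row order is an assumption).
network = pd.DataFrame({
    "bookTitle": ["The Hunger Games", "Twilight", "The Book Thief",
                  "Animal Farm", "The Chronicles of Narnia",
                  "The Hunger Games", "The Book Thief", "The Book Thief"],
    "bookISBN": ["9780439023481", "9780316015844", "9780375831003",
                 "9780452284241", "9780066238500",
                 "9780439023481", "9780375831003", "9780375831003"],
    "highestVotedGenre": ["Young Adult", "Young Adult",
                          "Historical-Historical Fiction", "Classics", "Fantasy",
                          "Young Adult", "Historical-Historical Fiction",
                          "Historical-Historical Fiction"],
    "genreVotes": [103407, 80856, 59070, 73590, 26376, 103407, 59070, 59070],
})

# Group rows by ISBN, keep the first title and genre, and sum the votes.
# sort=False keeps the groups in order of first appearance.
merged = (
    network.groupby("bookISBN", as_index=False, sort=False)
    .agg({"bookTitle": "first", "highestVotedGenre": "first", "genreVotes": "sum"})
)

# Restore the original column order.
merged = merged[["bookTitle", "bookISBN", "highestVotedGenre", "genreVotes"]]

print(merged.to_dict("list")["genreVotes"])
# [206814, 80856, 177210, 73590, 26376]
```

With these assumed rows, the two "The Hunger Games" entries add up to 206814 and the three "The Book Thief" entries to 177210. A url column, if present, could be added to the aggregation dictionary with "first" in the same way.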

  • Please include any relevant information [as text directly into your question](https://stackoverflow.com/editing-help), do not link or embed external images of source code or data. Images make it difficult to efficiently assist you as they cannot be copied and offer poor usability as they cannot be searched. See: [Why not upload images of code/errors when asking a question?](https://meta.stackoverflow.com/q/285551/15497888) – Henry Ecker May 21 '21 at 20:19
  • Please include a _small_ subset of your data as a __copyable__ piece of code that can be used for testing as well as your expected output for the __provided__ data. See [MRE - Minimal, Reproducible, Example](https://stackoverflow.com/help/minimal-reproducible-example), and [How to make good reproducible pandas examples](https://stackoverflow.com/q/20109391/15497888). – Henry Ecker May 21 '21 at 20:19
  • @HenryEcker is it ok if I add a link to a git repo? The dataset is a bit big, and I feel like if I add all the adjustments I made it's gonna be a long post... – Bruno Signorelli Domingues May 21 '21 at 20:22
  • So the point of a MRE is that it's minimal. That link to "How to make a good pandas example" says 6 lines or less is optimal. Try to find the smallest set of rows and cols possible to demonstrate the desired behaviour of your solution. – Henry Ecker May 21 '21 at 20:25
  • You could always do `data.head()` to get the first few rows as text – AmphotericLewisAcid May 21 '21 at 20:25
  • Bonus points for `df.head().to_dict()` that can be copied straight into a `pd.DataFrame()` constructor. – Henry Ecker May 21 '21 at 20:26
  • @HenryEcker does it look better now? – Bruno Signorelli Domingues May 21 '21 at 20:48
  • It's almost there. The problem currently is that there's only one instance of "The Hunger Games" and "The Book Thief" in your sample so there's nothing to sum. – Henry Ecker May 21 '21 at 20:51
  • @HenryEcker ok, I managed to get all instances of Hunger Games and of Book Thief so you can sum. Thanks for helping me make my post more comprehensible! – Bruno Signorelli Domingues May 21 '21 at 20:59