I'm cleaning up some data and found that the person who entered it made some mistakes: some rows are duplicated in every column except one. For those rows, I need to add the values in that one column together and then remove the duplicate rows. My data set has over 1 million rows, so I've provided a fictitious example. I'm still assessing, but it looks like there are about 300 instances of this.

Example

data <- data.frame(City = c("Portland", "Portland", "Seattle", "Seattle", "Los Angeles", "Las Vegas", "Salt Lake City"),
                   Country = c("USA", "USA", "USA", "USA", "USA", "USA", "USA"),
                   Year = c("2020", "2020", "2020", "2020", "2020", "2020", "2020"),
                   Population = c(25, 5, 30, 8, 10, 15, 15))

Expected

expected <- data.frame(City = c("Portland", "Seattle", "Los Angeles", "Las Vegas", "Salt Lake City"),
                       Country = c("USA", "USA", "USA", "USA", "USA"),
                       Year = c("2020", "2020", "2020", "2020", "2020"),
                       Population = c(30, 38, 10, 15, 15))
pkpto39
  • `result <- aggregate(Population ~ ., data, sum)` – Ronak Shah Dec 10 '20 at 03:35
  • @RonakShah Could you please explain what `~.` means after `Population`? – Daman deep Dec 10 '20 at 03:52
  • The `.` stands for all the other columns, i.e. every column except `Population`. @Damandeep – Ronak Shah Dec 10 '20 at 03:57
  • @RonakShah When I enter `data %>% select(City~.)`, an error occurs. Why is that? – Daman deep Dec 10 '20 at 04:01
  • @Damandeep, I just tried running your code and I also got an error. Looking at Ronak's code and then at the error, my guess is that with `select()` you are literally telling R which columns you want to keep (and in what order), whereas with `aggregate(Population~., ...)` you are telling R which column you want to apply a function to. – pkpto39 Dec 10 '20 at 04:33
  • @pkpto39 I didn't give you any code; I was asking him some questions. – Daman deep Dec 10 '20 at 04:58
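
For anyone following the comments, here is a minimal sketch of the `aggregate()` suggestion applied to the example data from the question. It assumes the duplicated rows match exactly on every column other than `Population`; the name `result` is taken from the comment, and the `rep()` calls are just a shorter way of writing the same `Country` and `Year` columns. Note that the rows come back sorted by the grouping columns rather than in their original order, so the output matches `expected` in content but not in row order.

```r
# Example data from the question
data <- data.frame(
  City = c("Portland", "Portland", "Seattle", "Seattle",
           "Los Angeles", "Las Vegas", "Salt Lake City"),
  Country = rep("USA", 7),
  Year = rep("2020", 7),
  Population = c(25, 5, 30, 8, 10, 15, 15)
)

# Left of `~`: the column to summarise.
# Right of `~`: the grouping columns; `.` means "every other column in data".
# Rows that share City, Country and Year are collapsed into one row,
# with their Population values summed.
result <- aggregate(Population ~ ., data = data, FUN = sum)

# Caution: the formula interface drops rows containing NA in any of these
# columns by default (na.action = na.omit), so check for NAs first on real data.
print(result)
```

Because a formula like `Population ~ .` is something `aggregate()` interprets itself, it is not a general column-selection syntax, which is consistent with `select(City~.)` failing.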
