I am trying to group a dataset on a certain value and then sum a column based on this grouped value.

UN.surface.area.share <- left_join(countries, UN.surface.area, by = 'country') %>%
  drop_na() %>%
  rename('surface.area' = 'Surface.area..km2.') %>%
  group_by(region) %>%
  summarise(total.area = sum(surface.area))

When I run this I get this error:

Error: Problem with `summarise()` input `total.area`.
x invalid 'type' (character) of argument
i Input `total.area` is `sum(surface.area)`.
i The error occurred in group 1: region = "Africa".

I think the problem is that the 'surface.area' column is of the character type and therefore the sum function doesn't work. I tried adding %>% as.numeric('surface.area') to the previous code:

UN.surface.area.share <- left_join(countries, UN.surface.area, by = 'country') %>%
  drop_na() %>%
  rename('surface.area' = 'Surface.area..km2.') %>%
  as.numeric('surface.area') %>%
  group_by(region) %>%
  summarise(total.area = sum(surface.area))

But this gives the following error:

Error in group_by(., region) : 
  'list' object cannot be coerced to type 'double'

I think this problem can be solved by changing the 'surface.area' column to a numeric datatype but I am not sure how to do this. I checked the column and it only consists of numbers.

Guello
  • Can you give a few examples of the numbers in `'surface.area'`? – Chris Ruehlemann Apr 14 '21 at 14:00
  • Instead of `rename('surface.area' = 'Surface.area..km2.') %>% as.numeric('surface.area')`, try: `rename('surface.area' = as.numeric('Surface.area..km2.')` – Chris Ruehlemann Apr 14 '21 at 14:03
  • All entries consist of 1 to 6 integers – Guello Apr 14 '21 at 14:09
  • When I try rename('surface.area' = as.numeric('Surface.area..km2.') it gives the error Error: Selections can't have missing values. I double-checked and there are really no values other than numbers in the column. – Guello Apr 14 '21 at 14:11
  • Can you give us a reproducible example of the dataset? – neuroandstats Apr 14 '21 at 14:22
  • Please consider creating a good example of your data so others can help you find an answer easily. https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Sinh Nguyen Apr 14 '21 at 14:24
  • What happens when you do `as.numeric(UN.surface.area$surface.area)` (assuming this is the starting vector)? If they really are just integers, I can't see why as.numeric could give an error. Perhaps you would find this answer useful (in case there are spaces in your characters): https://stackoverflow.com/questions/49294602/cant-convert-character-to-numeric-in-r – Dylan_Gomes Apr 14 '21 at 15:59

1 Answer

Use dplyr::mutate()

So instead of:

... %>% as.numeric('surface.area') %>%...

do:

...%>% mutate(surface.area = as.numeric(surface.area)) %>%...

mutate() changes one or more variables within a data frame. When you pipe to as.numeric(), as you're currently doing, you're effectively asking R to run

as.numeric(data.frame.you.piped.in, 'surface.area')

as.numeric() then tries to convert the whole data frame into a numeric vector, which it can't do because a data frame is a list. Hence your error. The extra 'surface.area' argument does not select the column either; as.numeric() simply ignores it.
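
As a minimal sketch of the mutate() approach, here is a self-contained example. The `areas` tibble and its values are made up for illustration and only stand in for the joined and renamed data from the question; the last step is an optional check, not part of the fix, for finding entries that fail to convert.

```r
library(dplyr)

# Hypothetical stand-in for the joined, renamed data: surface.area is character
areas <- tibble(
  region = c("Africa", "Africa", "Europe"),
  surface.area = c("2382", "1247", "84")
)

total.areas <- areas %>%
  # Convert the column inside the data frame, rather than the data frame itself
  mutate(surface.area = as.numeric(surface.area)) %>%
  group_by(region) %>%
  summarise(total.area = sum(surface.area))

# Strings that do not parse as numbers (for example "1,234" or "n/a") become NA.
# This collects the raw values that would fail to convert, if there are any.
bad <- areas$surface.area[is.na(suppressWarnings(as.numeric(areas$surface.area)))]
```

If `bad` is not empty, those entries need cleaning (stripping spaces or thousands separators, say) before the sum will be meaningful.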

Captain Hat