
I am trying to get the sum of the X3.23.20 column, grouped by country.

I tried this code using the aggregate() function:

covid <- read.csv("time_series_covid_19_confirmed.csv") %>%
  select(Province.State, Country.Region, X3.23.20) %>%
  aggregate(
    covid$X3.23.20,
    by = list(Country.Region = covid$Country.Region),
    FUN = sum
  )

View(covid)

It always returns this error: Error in Summary.factor(1L, c(599L, 1086L, 455L, 2L, 1306L, 424L, 533L, : ‘sum’ not meaningful for factors

The CSV file (time_series_covid_19_confirmed.csv) is available at https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset

Ramiro Magno
  • Since your error says that a column is of type `factor`, have you tried calling `read.csv` with the option `stringsAsFactors = FALSE`? – Ramiro Magno Apr 12 '20 at 01:38
  • 1) `covid <- read.csv("time_series_covid_19_confirmed.csv", stringsAsFactors = FALSE)` 2) `covid$X3.23.20 <- as.numeric(covid$X3.23.20)` 3) `aggregate(X3.23.20~Country.Region, covid, sum)` – Ronak Shah Apr 12 '20 at 01:49
  • Error in get(as.character(FUN), mode = "function", envir = envir) : object 'covid' of mode 'function' was not found – Geetu Mol Babu Apr 12 '20 at 03:27

2 Answers


I think that when you select your columns of interest, R is treating the column X3.23.20 as a factor, not an integer. Either that, or it was designated as a factor when you loaded the CSV.

Either way, this runs without an issue on my machine:

covid <- read.csv("~/Desktop/time_series_covid_19_confirmed.csv", stringsAsFactors = FALSE)
aggregate(covid$X3.23.20, by = list(Country.Region = covid$Country.Region), FUN = sum)

You can always check the class of the column with:

class(covid$X3.23.20)

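This should be integer in this case. If it is anything else, you can convert it with the line below. Going through as.character() first matters for a factor, because as.integer() alone would return the internal level codes rather than the values:

covid$X3.23.20 <- as.integer(as.character(covid$X3.23.20))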
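
As a small illustration of why the column type matters, here is a minimal sketch on a toy vector (the values are made up and do not come from the Kaggle file). It shows sum() failing on a factor and the difference between the level codes and the actual values:

```r
# Toy factor with made-up counts; this only mimics a numeric column
# that was read in as a factor.
counts <- factor(c("10", "200", "35"))

# sum() is not defined for factors, so capture the error message.
res <- tryCatch(sum(counts), error = function(e) conditionMessage(e))
print(res)

# as.integer() on a factor returns the internal level codes ...
codes <- as.integer(counts)
print(codes)

# ... so convert to character first to recover the real values.
values <- as.integer(as.character(counts))
print(sum(values))
```

If the column is already integer or character, the as.character() step is harmless, so the same conversion can be used in either case.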
JForsythe
  • `class(covid$X3.23.20)` is integer, but the code now returns a different error: Error in FUN(X[[i]], ...) : invalid 'type' (character) of argument – Geetu Mol Babu Apr 12 '20 at 03:20

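The combination of tidyverse and non-tidyverse functions causes the code to fail, in addition to the problem with stringsAsFactors in read.csv(). The pipe passes the selected data frame to aggregate() as its first argument, so sum() is also applied to the non-numeric Province.State and Country.Region columns. Here is a tidyverse version of the code in the original post.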

data <- read.csv('time_series_covid_19_confirmed.csv',
                 stringsAsFactors=FALSE)

library(dplyr)
covid <- data %>% select(Country.Region, X3.23.20) %>%
     group_by(Country.Region) %>% 
     summarise(sum_X3.23.20 = sum(X3.23.20))

View(covid)

[Image: first few rows of the table viewer]

To aggregate the most recent data by country and province, we can use the following code.

# aggregate most recent data by Country then Province 
library(dplyr)
covid <- data %>% select(Country.Region, Province.State, X4.10.20) %>%
     group_by(Country.Region, Province.State) %>% 
     summarise(count = sum(X4.10.20))
View(covid)

[Image: first few rows of output]

However, since the underlying data is already organized by Country and Province, no aggregation is actually necessary. We can produce exactly the same result just by extracting the right columns from the original data.

covid <- data[,c(2,1,84)]
View(covid)

[Image: first few rows of the resulting data frame]
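
If the read.csv(), select(), aggregate() sequence is required, one possible sketch is to pipe the selected columns into the formula interface of aggregate() through the magrittr `.` placeholder, so that the data frame lands in the data argument rather than the first argument. The toy data frame below is an assumption standing in for the CSV, with made-up country names and counts:

```r
library(dplyr)

# Toy stand-in for read.csv("time_series_covid_19_confirmed.csv",
# stringsAsFactors = FALSE); names and values are made up.
data <- data.frame(
  Province.State = c("", "A", "B", "C"),
  Country.Region = c("Xland", "Yland", "Yland", "Yland"),
  X3.23.20       = c(5L, 10L, 20L, 30L),
  stringsAsFactors = FALSE
)

# `data = .` tells the pipe where to put the selected data frame,
# so the formula stays the first argument of aggregate().
covid <- data %>%
  select(Province.State, Country.Region, X3.23.20) %>%
  aggregate(X3.23.20 ~ Country.Region, data = ., FUN = sum)

print(covid)
```

Because the formula names only X3.23.20 and Country.Region, the Province.State column is ignored by aggregate() and sum() is never applied to a non-numeric column.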

Len Greski
  • Thank you, but can you help me follow these steps: (1) import the data using read.csv(); (2) select Province.State, Country.Region and the last column using select(); (3) use the aggregate() function with cbind(last column) ~ Country.Region, data = ?, FUN = sum – Geetu Mol Babu Apr 12 '20 at 03:33
  • @GeetuMolBabu - see my updated answer. – Len Greski Apr 12 '20 at 13:56
  • @GeetuMolBabu - note that incoming data is already at country / province unit of analysis, so aggregation is unnecessary. – Len Greski Apr 12 '20 at 14:18