The following is a sample of my data:

Month        Year     Tornado    Location
January      1998     3         Illinois
February     1998     2         Illinois
March        1998     5         Illinois
January      1998     1         Florida
January      2010     3         Illinois

Here is essentially what I want it to look like:

Date      Tornado
1998-01   4
1998-02   2
1998-03   5
2010-01   3

So, I want to combine Year and Month into a single new column. The locations do not matter; I want the total number of tornadoes for each month of each year (January 1998, February 1998, and so on). I have the following code, but I do not know how to change it to use both variables, or whether it is even the right approach for what I am trying to do.

mydata$Date <- format(as.Date(mydata$month), "%m-%Y")

The real dataset is far too large to fix manually. I am basically trying to turn this data into a time series.

Fire

3 Answers


You need to build the Date column first, and then apply the approach from "How to sum a variable by group":

aggregate(Tornado~Date, transform(df, Date = format(as.Date(paste(Month,Year,"01"),
                        "%B %Y %d"), "%Y-%m")), sum)

#     Date Tornado
#1 1998-01       4
#2 1998-02       2
#3 1998-03       5
#4 2010-01       3

data

df <- structure(list(Month = structure(c(2L, 1L, 3L, 2L, 2L), 
.Label = c("February", "January", "March"), class = "factor"), 
Year = c(1998L, 1998L,1998L, 1998L, 2010L), 
Tornado = c(3L, 2L, 5L, 1L, 3L), Location = structure(c(2L, 
2L, 2L, 1L, 2L), .Label = c("Florida", "Illinois"), class = "factor")), 
class = "data.frame", row.names = c(NA, -5L))
Ronak Shah
  • I am getting an error that says "no rows to aggregate", any idea why that might be? – Fire Dec 01 '19 at 08:21
  • @CynicalF Not sure. Can you share data using `dput` ? `dput(df)` ? – Ronak Shah Dec 01 '19 at 08:34
  • Unfortunately, this is not feasible for what I'm trying to do, as the data has over 6,000 data points. I did change everything to factors, but I am still receiving the same error. – Fire Dec 01 '19 at 18:54
  • @CynicalF You can always share only sample of the data by doing `dput(head(df))`. Also can you run this on the data which I have in my answer? Does it give you the expected output ? – Ronak Shah Dec 01 '19 at 22:32
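
The thread never establishes what caused the "no rows to aggregate" error. One possible explanation is that `as.Date()` returned `NA` for every row, so `aggregate()` dropped them all; this can happen when the month names do not match what `%B` expects in the current locale. The sketch below is only a diagnostic under that assumption, and it also assumes `Month` holds full English month names. It swaps `%B` for the built-in `month.name` constant, which does not depend on the locale.

```r
# Sample data from the question
df <- data.frame(
  Month = c("January", "February", "March", "January", "January"),
  Year = c(1998L, 1998L, 1998L, 1998L, 2010L),
  Tornado = c(3L, 2L, 5L, 1L, 3L),
  Location = c("Illinois", "Illinois", "Illinois", "Florida", "Illinois"),
  stringsAsFactors = FALSE
)

# Diagnostic: count rows that fail to parse with the locale-dependent %B.
# If this equals nrow(df), aggregate() has nothing left to work with.
parsed <- as.Date(paste(df$Month, df$Year, "01"), "%B %Y %d")
sum(is.na(parsed))

# Locale-independent alternative: map month names to 1..12
month_num <- match(as.character(df$Month), month.name)

# Any NA here points at a misspelled or abbreviated month name
stopifnot(!anyNA(month_num))

# Build "YYYY-MM" labels and sum Tornado within each label
df$Date <- sprintf("%d-%02d", df$Year, month_num)
result <- aggregate(Tornado ~ Date, data = df, FUN = sum)
result
```

If `match()` does return `NA` for some rows, inspecting `unique(df$Month)` should show which spellings need cleaning before the aggregation.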

First, I combined Month and Year into a single variable called Date, converted it to a year-month class with the zoo package, and then summed Tornado within each Date.

library(tidyverse)
library(zoo)


df %>% 
  unite(Date, Month, Year) %>%
  mutate(Date = as.yearmon(Date, format = '%B_%Y')) %>%
  group_by(Date) %>%
  summarise(Tornado = sum(Tornado))

# A tibble: 4 x 2
  Date      Tornado
  <yearmon>   <int>
1 Jan 1998        4
2 Feb 1998        2
3 Mar 1998        5
4 Jan 2010        3
AlexB
  • Unfortunately, tidyverse failed to load properly in my R multiple times, so I will be unable to use the unite function. – Fire Dec 01 '19 at 18:44
  • The unite function is coming from tidyr package, which is part of tidyverse framework. Try to load it separately. – AlexB Dec 01 '19 at 18:48
  • Alright, I was able to load tidyr correctly, but now I receive a different error message, " no applicable method for 'unite_' applied to an object of class "function" " – Fire Dec 01 '19 at 18:56
  • It may be a conflict with another package. Try to specify the package when you run the code, like this: `tidyr::unite`. If this does not work, replace the line `unite(Date, Month, Year)` with `mutate(Date = paste0(Month, '_', Year))`. Hope you can make it. – AlexB Dec 01 '19 at 19:13

If the day doesn't matter, you can do:

#library (tidyverse)
library(lubridate)

x$Date <- as_date(paste0(x$Year, x$Month, "-01"))

# A tibble: 5 x 4
  Month     Year Tornados Date      
  <chr>    <dbl>    <dbl> <date>    
1 January   1998        3 1998-01-01
2 February  1998        2 1998-02-01
3 March     1998        5 1998-03-01
4 January   1998        1 1998-01-01
5 January   2010        3 2010-01-01
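
The step above only adds a Date column; the monthly totals asked for in the question still need a grouping step. A minimal base R sketch of that remaining step follows. The data frame `x` here is a hypothetical stand-in that copies the output shown above, with `Date` already stored as a Date.

```r
# Hypothetical stand-in for x, with the Date column already built
x <- data.frame(
  Tornados = c(3, 2, 5, 1, 3),
  Date = as.Date(c("1998-01-01", "1998-02-01", "1998-03-01",
                   "1998-01-01", "2010-01-01"))
)

# Sum the tornado counts for each month-start date
totals <- aggregate(Tornados ~ Date, data = x, FUN = sum)

# Drop the placeholder day to get the requested "YYYY-MM" label
totals$Date <- format(totals$Date, "%Y-%m")
totals
```

Keeping `Date` as a real Date until after the aggregation means the rows come back in chronological order; the text label is applied only at the end.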
D.J