0

I want to create some summary statistics of a data.table, aggregating by month and year of the date column. Here's what I start with:

> head(monthly)
      betnr   persnr idnum frau gebjahr   te_med      month   tentgelt status
1: 50536344 62181514 40442    1    1960 76.52142 1993-12-01  0.5777598   fire
2: 50536344 62744472 40442    0    1963 76.52142 1993-08-01  0.5777598   fire
3: 50536344 63071749 40442    0    1947 76.52142 1993-12-01  0.5777598   fire
4: 50536344 63385685 40442    1    1946 76.52142 1993-07-01  0.5777598   fire
5: 50536344 63918388 40442    0    1952 76.52142 1993-12-01  0.5777598   fire
6: 50536344 61961225 40442    0    1980 71.90094 1994-12-01 23.1001672   fire

To create my statistics, I then run

statistics2 <- monthly[, list(NOBS = .N, MWAGE=mean(tentgelt)), by=list(status, month=format(month, '%m-%Y'))]

This creates the correct statistics, but the month column now contains a string. I try to change the type to date by fixing the days to be 01 always:

x <-apply(statistics2, 1, function(x) paste('01-',x['month'], sep=''))
statistics2[, month:= as.Date(x, '%d-%m-%Y')]

Which gives me the desired output:

> head(statistics2)
   status      month  NOBS      MWAGE
1:   hire 1993-01-01 37914  0.5820961
2: normal 1993-01-01   790  0.5787695
3:   hire 1994-01-01  6471 15.1267445
4: normal 1994-01-01 23931 22.8101928
5:   hire 1993-02-01   435  0.5946736
6: normal 1993-02-01 38661  0.5820226

However, my whole approach feels a bit cloddy. Is there a cleaner way of getting the desired output?

FooBar
  • 15,724
  • 19
  • 82
  • 171
  • Note that Month-Year is _not_ a date and that's probably why R doesn't want to interpret it as such. However, there's some package (which I don't remember at the moment) which has a special Year-Month kind of class. – talat Oct 30 '14 at 22:25

1 Answers1

2

Yes, you can make it simpler and do everything in one go. Just do the whole conversion to Date class in the aggregation process

statistics2 <- monthly[, list(NOBS = .N, 
                         MWAGE = mean(tentgelt)), 
                         by = list(status, month = as.Date(format(month, '%Y-%m-01')))]
statistics2
#    status      month NOBS      MWAGE
# 1:   fire 1993-12-01    3  0.5777598
# 2:   fire 1993-08-01    1  0.5777598
# 3:   fire 1993-07-01    1  0.5777598
# 4:   fire 1994-12-01    1 23.1001672

Some side notes:

  • As @beginner mentioned in his comment, there is no "Year-Month" date type in R, see this r-faq
  • Your apply method is not how you should have done this with data.table. You could accomplish your last step simply by doing:

    statistics2[, month := as.Date(paste0("01-", month), "%d-%m-%Y")]
    
Community
  • 1
  • 1
David Arenburg
  • 91,361
  • 17
  • 137
  • 196