0

I need to arrange a table by performing some formatting, with a table like

DT <- read.table(text = 
"Year   ST_ID    N   Overall  Metric1  Metric2
1999    205    386     96.3        0       0    
1999    205     15        0        0       0
1999    205      0        0        0       0  
1999    205      0        0        0      NA
2000    205    440      100        0       0
2000    205      0        0        0       0
2000    205      0        0       NA       0
2000    205      0        0        0      NA", header = TRUE)

I need to obtain the following "Output" table.

Year   ST_ID    1   2  3  4  Overall  Metric1  Metric2
1999    205   386  15  0  0     96.3        0      NA    
2000    205   440   0  0  0      100       NA      NA
 .
 .

In the columns on the right, I want to aggregate any instances of NA => to NA else sum(values)

How can I achieve this using R?

Jaap
  • 81,064
  • 34
  • 182
  • 193
user3408139
  • 197
  • 1
  • 12

2 Answers2

1

With the data.table package, there's

library(data.table)
setDT(DT)

DT[, c(
  setNames(as.list(N), seq_along(N)),
  lapply(.SD, sum)
), by=.(Year, ST_ID), .SDcols = Overall:Metric2]

#    Year ST_ID   1  2 3 4 Overall Metric1 Metric2
# 1: 1999   205 386 15 0 0    96.3       0      NA
# 2: 2000   205 440  0 0 0   100.0      NA      NA

How it works

See the package vignettes for an introduction. In this case, the syntax is DT[, j, by=, .SDcols=].

  • j is a list, with each element corresponding to a column in the result.
  • .SDcols= optionally allows for defining a set of columns of DT.
  • by= optionally allows for computing j separately per group.
Frank
  • 66,179
  • 8
  • 96
  • 180
  • actually the `sum` function does the `if else` statement that you have. you can simply have it as `lapply(.SD,sum)` – Onyambu Aug 18 '17 at 07:49
1
 library(tidyr)   
 A=aggregate(.~Year,xy[-(2:3)],sum,na.action=function(x)x)
 xy[1:3]%>%group_by(Year)%>%mutate(N_=1:4)%>%spread(N_,N,sep="")%>%merge(A,by="Year")
  Year ST_ID N_1 N_2 N_3 N_4 Overall Metric1 Metric2
1 1999   205 386  15   0   0    96.3       0      NA
2 2000   205 440   0   0   0   100.0      NA      NA

Hope this helps.

I think there can be a better way to aggregate. eg the na.action should take an Identity function(a function that returns the input) such as the I function in base R. although the I function changes the class of the object to AsIs and this is a challenge to me within the aggregate function. class(I(xy)).

Onyambu
  • 67,392
  • 3
  • 24
  • 53