change code because of unwanted factors

Question

My code above groups the numbers in blocks of five and calculates the standard deviation of the values in each block. So if I have sample data like this:

Number  STD
1   11.15
2   11.18
3   11.21
4   11.24
5   11.3
10  11.36
11  11.42
12  11.48
13  11.54
14  11.6
15  11.66
16  11.72
17  11.78
18  11.84
19  11.9
20  11.96

When I run my code, I get this output:

 Number        STD
1      1 0.05770615
2      2         NA
3      3 0.09486833
4      4 0.09486833

What I want to do is simply replace the NA with 0. Also, instead of getting factor labels like 1, 2, 3, 4, I want to get 5, 10, 15, 20, 25, and so on.

I am a bit puzzled as to what you want to achieve... could you be a bit more clear? Why is the second standard deviation NA? There are no missing points in your data... — nico, Mar 30 '13 at 14:10
regarding the point of replacing the `NA`s with `0`s, http://stackoverflow.com/questions/13615385/referencing-a-dataframe-recursively/13620608#13620608 is probably helpful — Ricardo Saporta, Mar 30 '13 at 14:53

agstudy · Answer 1 · 2013-03-30T14:27:01.613

I haven't tried to rewrite what you are doing, but to stay close to your code you can do the following:

Use the labels argument of cut to set the labels of the resulting categories.
Change NA to 0 using spread[is.na(spread)] <- 0
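
As a minimal sketch of those two changes in isolation, using the sample data from the question (the names `number`, `val`, `breaks` and `groups` are illustrative and not taken from the original code):

```r
# Sample data from the question (variable names are illustrative)
number <- c(1:5, 10:20)
val <- c(11.15, 11.18, 11.21, 11.24, 11.30, 11.36, 11.42, 11.48,
         11.54, 11.60, 11.66, 11.72, 11.78, 11.84, 11.90, 11.96)

breaks <- seq(0, 20, 5)

# Label each interval (0,5], (5,10], ... by its upper bound
groups <- cut(number, breaks, labels = breaks[-1])

# Standard deviation per interval; a one-value interval yields NA
spread <- tapply(val, groups, sd)

# Replace the NA entries with 0
spread[is.na(spread)] <- 0
spread
```

The (5,10] interval contains only the value for 10, and sd() of a single value is NA, which is why that entry needs replacing.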

The full code is:

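hunter <- lapply(hunt, function(i) {
  # Bin column i into intervals of width 5, labelling each by its upper bound
  random <- cut(value[, i], seq(0, max(value[, i]), 5),
                labels = seq(5, max(value[, i]), 5))
  # Standard deviation of column i + 1 within each bin
  spread <- tapply(value[, i + 1], random, sd, na.rm = TRUE)
  # A bin with fewer than two values has no standard deviation; report 0
  spread[is.na(spread)] <- 0
  Number <- levels(random)
  data.frame(Number = Number, STD = spread)
})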

  Number        STD
5       5 0.05770615
10     10 0.00000000
15     15 0.09486833
20     20 0.09486833

score 1 · Accepted Answer · answered Mar 30 '13 at 14:25

Another way of doing it:

# Generate data
number <- c(1:5, 10:20)
val <- c(11.15, 11.18, 11.21, 11.24, 11.30, 11.36, 11.42,
  11.48, 11.54, 11.60, 11.66, 11.72, 11.78, 11.84, 11.90, 11.96)

data <- data.frame(number, val)


# Calculate SD
breaks <- seq(0, 20, 5)
splitted.data <- split(data$val, f = cut(data$number, breaks, labels = FALSE))
err <- sapply(splitted.data, sd)
err[is.na(err)] <- 0
res <- cbind(Number = breaks[-1], STD = err)

Resulting in:

> res
  Number        STD
1      5 0.05770615
2     10 0.00000000
3     15 0.09486833
4     20 0.09486833

score 1 · Answer 3 · answered Mar 30 '13 at 15:27

Using the data.table package, you can accomplish this in one call:

 library(data.table)
 DT <- data.table(value)

As a single call:

DT[, list(SD = ifelse(is.na(sd(STD)), 0, sd(STD))) 
   , by=list("Group" = factor(G <- (Number-1) %/% 5, labels=(unique(G) + 1)*5))]

   Group         SD
1:     5 0.05770615
2:    10 0.00000000
3:    15 0.09486833
4:    20 0.09486833

Breaking it down:

# you can create your groupings by 
(Number-1) %/% 5  # (i.e., the integer quotient when divided by 5)

# you can create your factor levels by 
5 * ((Number-1) %/% 5 + 1)

# calculate the Group:
DT[, grp := factor(G <- (Number-1) %/% 5, labels=(unique(G) + 1)*5)]

# calculate the SD by Group, replacing NA's with 0:
DT[, SD := ifelse(is.na(sd(STD)), 0, sd(STD)), by=grp]
unique(DT[, list(grp, SD)])
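
To see how the integer-division grouping behaves on the question's numbers, here is a small base-R sketch that does not need data.table (the names `G` and `upper` are illustrative):

```r
Number <- c(1:5, 10:20)

# Integer quotient: 1..5 -> 0, 10 -> 1, 11..15 -> 2, 16..20 -> 3
G <- (Number - 1) %/% 5

# Upper bound of each block of five, usable as a group label
upper <- 5 * (G + 1)

# Count how many numbers fall in each group
table(upper)
```

The group labelled 10 holds a single number, so its standard deviation is NA until it is replaced with 0.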
