I have a factor variable in a data frame of the form 735-739.

I want to add this as three numeric columns (min, mean, max) to my data frame.

I'm starting by using strsplit:

values = sapply(range, function(r) {
    values = c(strsplit(as.character(r), "-"))
})

I get back a value of class list of length 1:

[1] "735" "739"

I'm at a loss on what my next step should be. I'd appreciate a hint.

duffymo
  • What does your data frame look like? A [reproducible example](http://stackoverflow.com/q/5963269/) would help. `dput(head(...))` – Blue Magister Nov 10 '13 at 01:56

2 Answers


There are several ways you can do this. Here is one starting with concat.split.multiple from my "splitstackshape" package:

## SAMPLE DATA
mydf <- data.frame(ID = LETTERS[1:3], vals = c("700-800", "600-750", "100-220"))
mydf
#   ID    vals
# 1  A 700-800
# 2  B 600-750
# 3  C 100-220

First, split the "vals" column, rename the resulting columns if required (using setnames), and add a new column holding the rowMeans.

library(splitstackshape)

mydf <- concat.split.multiple(mydf, "vals", "-")
setnames(mydf, c("vals_1", "vals_2"), c("min", "max"))
mydf$mean <- rowMeans(mydf[c("min", "max")])
mydf
#   ID min max mean
# 1  A 700 800  750
# 2  B 600 750  675
# 3  C 100 220  160

For reference, here's a more "by-hand" approach:

mydf <- data.frame(ID = LETTERS[1:3], vals = c("700-800", "600-750", "100-220"))
SplitVals <- sapply(sapply(mydf$vals, function(x) 
  strsplit(as.character(x), "-")), function(x) {
    x <- as.numeric(x)
    c(min = x[1], mean = mean(x), max = x[2])
  })
cbind(mydf, t(SplitVals))
#   ID    vals min mean max
# 1  A 700-800 700  750 800
# 2  B 600-750 600  675 750
# 3  C 100-220 100  160 220
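
If you would rather avoid the nested sapply calls, the same idea can be sketched in base R by stacking the split pieces into a matrix. This is only a sketch, and it assumes every value contains exactly one "-" with a number on each side; the name `parts` is just an illustrative helper.

```r
# Sample data from above; `vals` holds "min-max" strings
mydf <- data.frame(ID = LETTERS[1:3], vals = c("700-800", "600-750", "100-220"))

# Split each string on a literal "-" and stack the pieces into a character matrix
# (assumption: every value splits into exactly two pieces)
parts <- do.call(rbind, strsplit(as.character(mydf$vals), "-", fixed = TRUE))

# Convert the pieces to numeric columns and take the midpoint as the mean
mydf$min  <- as.numeric(parts[, 1])
mydf$max  <- as.numeric(parts[, 2])
mydf$mean <- (mydf$min + mydf$max) / 2
mydf
```

The as.character() call keeps this working whether "vals" is a factor or a character column.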
A5C1D2H2I1M1N2O1R2T1

Using @AnandraMahto's dataset, you could also use the data.table package:

library(data.table)
dt <- data.table(ID = LETTERS[1:3], vals = c("700-800", "600-750", "100-220"))

# adding the min and max columns
splitlist <- strsplit(dt[,vals],"-")
dt[, minv := as.numeric(sapply(X = splitlist, function(x) x[1]))]
dt[, maxv := as.numeric(sapply(X = splitlist, function(x) x[2]))]

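# adding the mean of the two endpoints (vectorised, so no grouping is needed
# and non-integer or repeated values are handled correctly)
dt[, meanv := (minv + maxv) / 2]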
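
As a more compact variant of the same idea, data.table also provides tstrsplit, which splits a column and returns the pieces as a list of columns. The following is a sketch rather than a definitive version; it assumes a data.table release that includes tstrsplit and that every value contains exactly one "-".

```r
library(data.table)
dt <- data.table(ID = LETTERS[1:3], vals = c("700-800", "600-750", "100-220"))

# Split on a literal "-" and assign both pieces in one step;
# type.convert = TRUE turns the character pieces into numbers
dt[, c("minv", "maxv") := tstrsplit(vals, "-", fixed = TRUE, type.convert = TRUE)]

# Midpoint of the two endpoints, computed for all rows at once
dt[, meanv := (minv + maxv) / 2]
dt
```

This traverses the column once instead of calling sapply separately for each piece.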
TheComeOnMan
  • I don't like that `minv` and `maxv` are character. Also you could compound this: `dt[, c("Min", "Max", "Mean") := list(sapply(splitlist, `[[`, 1), sapply(splitlist, `[[`, 2), sapply(splitlist, function(x) mean(as.numeric(x))))][]` – A5C1D2H2I1M1N2O1R2T1 Nov 10 '13 at 07:56
  • Aye, to the first part about not having characters. The second part, I kept it simple so that it's easier to understand. I think the only extra cost would be traversing through `splitlist` one extra time this way, but I'm not sure. – TheComeOnMan Nov 10 '13 at 10:57