
I know that one can pass strings as variable names using the eval(parse()) and as.name() functions. But my problem is a bit different.

I have a string that contains both the data and column name, for example the string data1$column2. When I try the mentioned commands I get a variable-not-found error for the variable data1$column2. The variable itself is of course called data1 and thus cannot be found, as R interprets the whole string as a variable name.

How do I get the $-sign working as a column reference? Some kind of paste-as-text-command would be great, too. That is, if I just could pass the string as a literal part of my console input.

EXAMPLE

attach(iris)
col_names <- cbind("iris$Sepal.Length", "iris$Sepal.Width")
col_names

Now I want to do:

"as.data.frame(parse(col_names))"

That is, to be interpreted as:

as.data.frame(cbind(iris$Sepal.Length, iris$Sepal.Width))
Gavin Simpson
Joshua
  • Please provide a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Thomas Jul 22 '13 at 16:44
  • Your example is doing it wrong; You just want to subset the object, you certainly *don't* need `as.data.frame(cbind(....))` to do this. I'll update my Answer. – Gavin Simpson Jul 22 '13 at 16:58
  • The reason why I have the column names as strings are because they have been derived through a bit of a longer process. So the key is to get them interpreted as column names indeed. – Joshua Jul 22 '13 at 17:04
  • I appreciate that, but you would not do `as.data.frame(cbind(iris$Sepal.Length, iris$Sepal.Width))` when you really want `iris[, c("Sepal.Length", "Sepal.Width")]`. In other words, think of the problem as i) getting the object name (`iris`) from the input strings, then ii) extracting the variable names, then iii) subsetting the object from i) using the things derived from ii). – Gavin Simpson Jul 22 '13 at 17:11
  • True, but the data gets aggregated from a long list of variables, thus the string list. – Joshua Jul 22 '13 at 17:23
  • Again *I know this* but you are asking for code to generate `as.data.frame(cbind(iris$Sepal.Length, iris$Sepal.Width))` where you should be thinking of generating `iris[, c("Sepal.Length", "Sepal.Width")]`. That would have made the problem easier as all you need to extract is the object name and the variable names. Of course, as you've changed the goal posts *yet* again you really do want code that evaluates to `cbind(obj1$Var1, obj2$Var2)`, which is what my Answer now also does with `get4()`. – Gavin Simpson Jul 22 '13 at 17:37

2 Answers


Summary

In light of the various changes to the detail of the question, here are two solutions to the problem that can be phrased as:

Given

col_names <- c("Obj1$Var1", "Obj2$Var2")

how to return a data frame that would be the equivalent of

cbind(Obj1$Var1, Obj2$Var2)

?

The simplest solution would be

as.data.frame(sapply(col_names, function(x) eval(parse(text = x))))

but that uses parse() which shouldn't be relied on for things like this. An alternative, but somewhat longer solution is

get4 <- function(x, ...) {
  fun <- function(text, ...) {
    obj <- get(text[1], ...)
    obj[[text[2]]]
  }
  sx <- strsplit(x, "\\$")
  lx <- lapply(sx, fun, ...)
  out <- do.call(cbind.data.frame, lx)
  names(out) <- x
  out
}

get4(col_names)

The second solution has advantages, despite being somewhat longer, in that it

  1. will work for data of different types as it works with a list and converts that to a data frame. The eval(parse(text = ....)) solution simplifies to an array first. Using lapply() instead of sapply() is an option that gets round this, but needs extra work to change the names of the resulting object.
  2. uses the common function get() to grab the object with the stated name, and basic subsetting syntax.
  3. doesn't use parse ;-)
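
As a rough illustration of point 1 (a sketch only; the vector name mixed is made up for this example, and get4() is repeated so the snippet runs on its own), compare the two approaches on one numeric and one factor column of the built-in iris data:

```r
# get4() as defined above, repeated so this snippet is self-contained
get4 <- function(x, ...) {
  fun <- function(text, ...) {
    obj <- get(text[1], ...)
    obj[[text[2]]]
  }
  sx <- strsplit(x, "\\$")
  lx <- lapply(sx, fun, ...)
  out <- do.call(cbind.data.frame, lx)
  names(out) <- x
  out
}

# Hypothetical input mixing a numeric column and a factor column
mixed <- c("iris$Sepal.Length", "iris$Species")

# sapply() simplifies to a matrix first, so the factor is reduced to its integer codes
via_parse <- as.data.frame(sapply(mixed, function(x) eval(parse(text = x))))

# get4() builds the data frame from a list, so each column keeps its own type
via_get <- get4(mixed)

sapply(via_parse, class)
sapply(via_get, class)
```

With this input, the Species column should come back as plain numbers from the sapply() version and as a factor from get4().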

Original Answer

The original Answer with greater detail continues below:

eval(parse(....)) will work

data1 <- data.frame(column1 = 1:10, column2 = letters[1:10])
txt <- "data1$column2"

> eval(parse(text = txt))
 [1] a b c d e f g h i j
Levels: a b c d e f g h i j

As @texb mentions, this can trivially be extended to handle a vector of strings via (modified to return a data frame)

col_names <- c("iris$Sepal.Length", "iris$Sepal.Width")
as.data.frame(sapply(col_names, function(x) eval(parse(text = x))))

It may be more acceptable to use get(), but you'll have to do a bit of processing, something along the lines of

get2 <- function(x, ...) {
  sx <- strsplit(x, "\\$")[[1]]
  obj <- get(sx[1], ...)
  obj[[sx[2]]]
}

> get2(txt)
 [1] a b c d e f g h i j
Levels: a b c d e f g h i j

iris example from OP's question

As @texb mentions, the eval(parse(text = ....)) version can trivially be extended to handle a vector of strings via (modified to return a data frame)

col_names <- c("iris$Sepal.Length", "iris$Sepal.Width")
as.data.frame(sapply(col_names, function(x) eval(parse(text = x))))

  iris$Sepal.Length iris$Sepal.Width
1               5.1              3.5
2               4.9              3.0
3               4.7              3.2
4               4.6              3.1
5               5.0              3.6
6               5.4              3.9
....

get2() can also be modified to work on a vector of strings such as col_names. Here I take the first element of each entry of sx to extract the object name (checking that there is only one unique object name), then get that object and subset it using the variable names (extracted using sapply(sx, `[`, 2))

get3 <- function(x, ...) {
  sx <- strsplit(x, "\\$")
  obj <- unique(sapply(sx, `[`, 1))
  stopifnot(length(obj) == 1L)
  obj <- get(obj, ...)
  obj[sapply(sx, `[`, 2)]
}

col_names <- c("iris$Sepal.Length", "iris$Sepal.Width")
head(get3(col_names))

> head(get3(col_names))
  Sepal.Length Sepal.Width
1          5.1         3.5
2          4.9         3.0
3          4.7         3.2
4          4.6         3.1
5          5.0         3.6
6          5.4         3.9

If you have multiple objects referenced in col_names then you will need a different solution, along the lines of

get4 <- function(x, ...) {
  fun <- function(text, ...) {
    obj <- get(text[1], ...)
    obj[[text[2]]]
  }
  sx <- strsplit(x, "\\$")
  lx <- lapply(sx, fun, ...)
  out <- do.call(cbind.data.frame, lx)
  names(out) <- x
  out
}

iris2 <- iris  # a second object for the example; assumed here to be a plain copy of iris
col_names2 <- c("iris$Sepal.Length", "iris2$Sepal.Length")
get4(col_names2)

> head(get4(col_names2))
  iris$Sepal.Length iris2$Sepal.Length
1               5.1                5.1
2               4.9                4.9
3               4.7                4.7
4               4.6                4.6
5               5.0                5.0
6               5.4                5.4
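
The ... argument of get4() is handed on to get(), so the objects do not have to live in the global workspace. A minimal sketch, assuming the data frames are stored in a separate environment (the names e, Obj1 and Obj2 are invented for this example):

```r
# get4() as defined above, repeated so this snippet is self-contained
get4 <- function(x, ...) {
  fun <- function(text, ...) {
    obj <- get(text[1], ...)
    obj[[text[2]]]
  }
  sx <- strsplit(x, "\\$")
  lx <- lapply(sx, fun, ...)
  out <- do.call(cbind.data.frame, lx)
  names(out) <- x
  out
}

# Hypothetical environment holding two small data frames
e <- new.env()
assign("Obj1", data.frame(Var1 = 1:3), envir = e)
assign("Obj2", data.frame(Var2 = c(10, 20, 30)), envir = e)

col_names <- c("Obj1$Var1", "Obj2$Var2")

# envir = e is forwarded through ... to get()
res <- get4(col_names, envir = e)
res
```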
Gavin Simpson
  • Thank you! This works, but what if i have more than one column (more than one element) in the txt? It does not seem to attach them to the new data as separate columns. – Joshua Jul 22 '13 at 17:02
  • @Joshua I updated my Answer. It would have helped if you'd explained this more fully when you wrote the original Question. In that original there was no example of needing to work with a vector of strings. Try to be specific when asking Questions, and a reproducible example always helps. My edit should solve the actual problem you have. – Gavin Simpson Jul 22 '13 at 17:12
  • This weird programming looks like fun - how about `sapply(col_names,function(cname) eval(parse(text=cname)))` (although that seems to assign the verbose version of the column name)? :) – texb Jul 22 '13 at 17:14
  • @texb Well yes, of course - I figured that was trivial given the opening section of my original Answer. However, `eval(parse(...))` is often considered poor form. See for example `library("fortunes")`, then `fortune(106)` ;-) – Gavin Simpson Jul 22 '13 at 17:18
  • You are quite right Gavin, sorry. I do however get a "Error: length(obj) == 1L is not TRUE" error when running it on my actual data. The texb solution seems to work. – Joshua Jul 22 '13 at 17:18
  • @Joshua The error is coming from the fact that you have different objects (bits before the `$`). That is not possible in the subsetting version without a different change, which I'll do. Note that @texb's answer isn't quite what you want - `sapply()` simplifies to an *array* not a data frame... You really do need to frame the question with the exact data in mind, otherwise we end up chasing our tails trying to follow all the updates you keep making to the Question/problem. – Gavin Simpson Jul 22 '13 at 17:21
  • As just noted above, yes. I have data from several variables. I can't think of anything that I should have mentioned now. – Joshua Jul 22 '13 at 17:24
  • Any risks with just using as.data.frame() around that call? My data is just integers. – Joshua Jul 22 '13 at 17:26
  • @Joshua around `sapply(col_names, function(x) eval(parse(text=x)))`? No, I've added that as an option to my edited Answer. It is OK if your data are atomic, but you will have to be careful with using that idiom elsewhere. My `get4()` above should work fine for different data types as it uses a list and converts that directly to a data frame. – Gavin Simpson Jul 22 '13 at 17:33

If you have a variable containing only the column name as a string then you don’t need to eval anything – you simply access the column via foo[[var]] (where var <- 'colname') instead of foo$colname.
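
A short sketch of the difference, using the built-in iris data in place of foo (the name var is just the placeholder from the sentence above):

```r
var <- "Sepal.Length"

# [[ ]] looks up the column whose name is stored in var
by_name <- iris[[var]]

# $ takes its argument literally, so this looks for a column called "var"
literal <- iris$var

identical(by_name, iris$Sepal.Length)
is.null(literal)
```

Both calls at the end should return TRUE, since iris has no column named var.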

If, on the other hand, the whole name is given as a string (this is weird, and should give you pause: change your design, it’s probably broken!) you can still parse out the different parts fairly straightforwardly:

manipulate <- function (vars) {
    parts <- strsplit(vars, '\\$')
    # This gets a list of variables (c('iris', 'iris') in our case)
    data <- lapply(parts, function (part) get(part[1], envir = parent.frame()))
    # This selects the matching column for every variable.
    cols <- mapply(function (d, part) d[part[2]], data, parts)
    # This just `cbind`s the columns.
    do.call(cbind.data.frame, cols)
}

cols <- c('iris$Sepal.Length', 'iris$Sepal.Width')
foo <- manipulate(cols)

That said, if you simply want to select some given columns from a data frame, there’s a much easier way:

cols <- c('Sepal.Length', 'Sepal.Width')
result <- iris[, cols]
Konrad Rudolph
  • "the correct way of accessing the column isn't via `foo$colname`" that is patently **not** correct. `$` is a perfectly valid way of accessing a list, which is what a data frame is. Are you over simplifying? Fully agree that storing the object **and** column name separated by `$` is suboptimal. Note your last example is the same as `iris[, cols]`, no need for anything else. Final comment - note the OP has now informed us that he has different objects referenced, e.g. `cols <- c("Obj1$Var1", "Obj2$Var2")` etc which complicates matters slightly. – Gavin Simpson Jul 22 '13 at 17:43
  • @Gavin “If you have … the column name **as a string** …” – furthermore, my code works with different objects. I don’t know what’s up with my last solution, I must have been sleeping or something. – Konrad Rudolph Jul 22 '13 at 17:56
  • +1 for the `mapply` as that is quite neat. Note that you need `do.call(cbind.data.frame, cols)` as by the time you run that line, `cols` is a *list* with two *vectors* and `cbind` will process them as vectors and hence they have to be atomic. Notice what happens if you do `manipulate(c('iris$Sepal.Length', 'iris$Species'))` - the `Species` variable is converted to the underlying numeric representation of the original factor. You circumvent this by calling the data frame method directly. – Gavin Simpson Jul 22 '13 at 17:58
  • Then it would have been better to say "If you have ... the column name as a string ... then a better way of accessing..." or words to that effect. (Note that `iris$"Species"` is perfectly valid.) – Gavin Simpson Jul 22 '13 at 18:00
  • @Gavin Thanks for the feedback. Yes, I know that `iris$"Species"` is valid but it no longer works with a string *variable*, obviously. I’ll change the wording. – Konrad Rudolph Jul 22 '13 at 18:01