Mutate/replace in one go

Question

MAJOR EDIT

Consider a simple data frame:

    df = data.frame(obs.no = 1:10, conc = rnorm(10))
    discard.obs.no = 1:5

I want this:

    df[df$obs.no %in% discard.obs.no,"conc"] = 2^df[df$obs.no %in% discard.obs.no,"conc"]

To be done using a helper function like this:

    change(df[df$obs.no %in% discard.obs.no,"conc"], function(x) 2^x)

Essentially, I want to avoid retyping the LHS on the RHS of the assignment operator. Why? Because the whole thing becomes unwieldy with complicated filtering.

As the example suggests, the function should change only the filtered data, not return the subset. It should also happen in place, i.e. without reassignment to the original data.frame.

Mutate/transform/within etc. do not do the job, since they return a modified copy (which just gets printed to the console), necessitating reassignment. Assign does not take parts of data.frames as an argument. The whole thing is a bit of a vanity project, but I'm sure there's a wiz out there who can do it (:

BONUS: try writing a parser that would shorten it even further to:

    change(2^df[df$obs.no %in% 1:5,"conc"])

I.e. figure out which part is the object to be reassigned - left/right of $ or left of [ and between [].

It's really unclear what you're trying to do. Are you trying to dynamically write R expressions without evaluating them? — Thomas, Aug 21 '14 at 08:46
A reproducible dataset with desired outcomes would be good to have here. — Henk, Aug 21 '14 at 10:33
Very similar to this +32 and *15 question: http://stackoverflow.com/questions/7768686/r-self-reference — Matt Dowle, Aug 21 '14 at 17:59
Good point Matt, just didn't think of the phrase "self-reference" when researching the question. — loard, Aug 21 '14 at 18:46

score 2 · Accepted Answer · edited Aug 21 '14 at 17:16

What you're asking for is not supported out of the box in base R. Or rather, it could be done, but you're asking for pass-by-reference semantics, which go against R's core "functional" (copy-on-modify) programming style. Achieving it in base R requires some hackery.
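To give an idea of the kind of hackery involved, here is a minimal sketch using non-standard evaluation. The helper name `change` is taken from the question and is hypothetical, not an existing base R function. It captures the unevaluated target expression, computes the new value, and then evaluates an assignment in the caller's frame. Treat it as an illustration under those assumptions rather than a robust implementation.

```r
# Hypothetical helper (not part of base R): apply f to `target` and write the
# result back into the caller's object, without retyping the target expression.
change <- function(target, f) {
  caller <- parent.frame()            # the frame that holds the data frame
  target_expr <- substitute(target)   # unevaluated, e.g. df[rows, "conc"]
  new_value <- f(target)              # current values, transformed
  # Build `target_expr <- new_value` and evaluate it where the object lives
  eval(call("<-", target_expr, new_value), caller)
  invisible(NULL)
}

set.seed(1)
df <- data.frame(obs.no = 1:10, conc = rnorm(10))
discard.obs.no <- 1:5
change(df[df$obs.no %in% discard.obs.no, "conc"], function(x) 2^x)
```

Note that the filter expression is evaluated twice (once to read, once to assign), and that the function works purely by side effect, which is why it sits uneasily with base R's usual style.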

You can achieve this with data.table, whose := operator modifies the table by reference:

    set.seed(1)
    library("data.table")
    discard.obs.no <- 1:5  # as defined in the question
    dt <- data.table(obs.no = 1:10, conc = rnorm(10))
    # := adds conc2 by reference, filling only the filtered rows
    dt[obs.no %in% discard.obs.no, conc2 := 2^conc]
    dt
        obs.no       conc     conc2
     1:      1 -0.6264538 0.6477667
     2:      2  0.1836433 1.1357484
     3:      3 -0.8356286 0.5603388
     4:      4  1.5952808 3.0215332
     5:      5  0.3295078 1.2565846
     6:      6 -0.8204684        NA
     7:      7  0.4874291        NA
     8:      8  0.7383247        NA
     9:      9  0.5757814        NA
    10:     10 -0.3053884        NA

I show conc2 := 2^conc here as an example; you could also store the result back into the conc variable itself using analogous notation.
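For completeness, a sketch of the "store back into conc" variant, assuming the data.table package is installed and using the same seed and `discard.obs.no` as in the question:

```r
library(data.table)

set.seed(1)
discard.obs.no <- 1:5
dt <- data.table(obs.no = 1:10, conc = rnorm(10))

# := overwrites conc by reference, only for the rows matching the filter;
# the other rows keep their values and no reassignment of dt is needed.
dt[obs.no %in% discard.obs.no, conc := 2^conc]
```

If you start from an existing data.frame, `setDT(df)` converts it to a data.table in place, so the same syntax can be applied to it afterwards.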

edited Aug 21 '14 at 17:16 by Arun
answered Aug 21 '14 at 17:11 by Thomas

R's core "functional" programming style is very much debatable. – Arun Aug 21 '14 at 17:18
@Arun Probably true. I'm glad data.table is there, but for someone just starting it's pretty atypical of R's base tools. – Thomas Aug 21 '14 at 17:35
Brilliant, wasn't aware of the data.table package, sounds like something that should be base (but then so do plyr, reshape and ggplot). – loard Aug 21 '14 at 18:47

score 0 · Answer 2 · answered Aug 21 '14 at 08:52

Not entirely sure what you are after, but the dplyr package will do what you want (I think). In the example below, the select command is not strictly needed, but you mention the column corr in your question, so I thought it might help give you an idea of what you could do.

    # Load the dplyr package
    library(dplyr)
    # create an index of values to discard
    discard.obs.no <- 1:5
    df <- data.frame(conc = rnorm(10), obs.no = 1:10)
    modified <- df %>%
        # Select the columns you want to use by names
        select(obs.no, conc) %>%
        # use a logical statement to subset the rows you want to use
        filter(!(obs.no %in% discard.obs.no)) %>%
        # Provide a function to manipulate the data
        mutate(changed_conc = 2^conc)
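If the goal is to keep every row and transform only the filtered ones, a conditional inside mutate is one option. This is a sketch, assuming a dplyr version that provides `if_else`; it still returns a new data frame, so the reassignment to `df` remains.

```r
library(dplyr)

set.seed(1)
discard.obs.no <- 1:5
df <- data.frame(obs.no = 1:10, conc = rnorm(10))

# Transform conc only where the filter matches; other rows pass through unchanged
df <- df %>%
  mutate(conc = if_else(obs.no %in% discard.obs.no, 2^conc, conc))
```

The magrittr package also offers a compound assignment pipe, `%<>%`, which shortens `df <- df %>% ...` to `df %<>% ...` if the repeated name is the main annoyance.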

answered Aug 21 '14 at 08:52 by Jase_

It makes it clearer, but returns only a subset and requires reassignment. – loard Aug 21 '14 at 12:37
I don't get your beef with assignment. You want `change(df,something)` to magically change `df`, which is a side-effect and not a good thing, instead of typing `df=change(df, something)` which is only three more keypresses and is explicit in what it does. – Spacedman Aug 21 '14 at 17:54
Or 41 keystrokes, if you look at my example (or rather ctrl-c ctrl-v and a very long line). I guess it's more when I'm munging or exploring data than seriously writing code - then I'd perhaps be better off avoiding mutation completely. – loard Aug 21 '14 at 18:44
