
In R I find myself doing something like this a lot:

adataframe[adataframe$col==something]<-adataframe[adataframe$col==something]+1

This way is kind of long and tedious. Is there some way for me
to reference the object I am trying to change such as

adataframe[adataframe$col==something]<-$self+1 

?

IRTFM
LostLin

4 Answers


Try package data.table and its := operator. It's very fast and very short.

DT[col1==something, col2:=col3+1]

The first part, col1==something, is the subset. You can put anything here and use the column names as if they are variables; i.e., no need to use $. The second part, col2:=col3+1, assigns the RHS to the LHS within that subset, and again the column names can be assigned to as if they are variables. := is assignment by reference. No copies of any object are taken, so it is faster than <-, =, within and transform.

Also, one end goal of allowing := in j like this is combining it with by, which is soon to be implemented in v1.8.1; see the question "When should I use the := operator in data.table?".

UPDATE: That was indeed released (:= by group) in July 2012.
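As a minimal sketch of the two forms described above (a subset update, and := combined with by), assuming the data.table package is installed. The table, column names and values here are made up for illustration:

```r
library(data.table)

# Hypothetical data, not from the question
DT <- data.table(col1 = c("a", "b", "a", "c"),
                 col2 = c(1, 2, 3, 4))

# Increment col2 only in the rows where col1 == "a".
# DT is modified by reference, so no `DT <-` is needed.
DT[col1 == "a", col2 := col2 + 1]

# := combined with by: add a per-group total as a new column
DT[, total := sum(col2), by = col1]
```

Because the update happens in place, any other variable that refers to the same data.table sees the change as well; use copy(DT) first if you need to keep the original.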

Matt Dowle
  • And probably much more rigorously tested than my offerings. – IRTFM Oct 14 '11 at 15:31
  • package can change the basic language operators??? I thought package is just a bunch of functions/data! – Tomas Oct 14 '11 at 19:49
  • question: you can use just `DT[col1...]` instead of `DT[DT$col1...]`? – Tomas Oct 14 '11 at 19:49
  • @TomasT.: you can. Notice that if col1 is indexed, `DT[col1=="a",]` performs a vector scan, while `DT["a",]` performs a binary search. – Ryogi Oct 14 '11 at 21:39
  • @tomas-t `:=` isn't a basic language operator. It's unused and undefined by R, but available to be defined. – Matt Dowle Oct 15 '11 at 07:37
  • @Matthew, I know it isn't basic language operator. I wanted to say: "how is it possible that package can change the set of language operators?" – Tomas Oct 15 '11 at 08:17
  • @Tomas : In a very real sense the data.table package does redefine the "[" operator, and this is allowed by the language, which expects generic methods to be dispatched on the basis of the object class. Furthermore there is a mechanism for creating infix operators: `?"%in%"` – IRTFM Jun 12 '12 at 17:14
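
The infix mechanism mentioned in the last comment can be used in base R to get something close to the `$self` idea from the question. This is only a sketch: the operator name `%+=%` is made up here, and it relies on the same eval.parent/substitute trick discussed elsewhere on this page:

```r
# Hypothetical user-defined infix operator: any function named %something%
# can be called as `lhs %something% rhs`.
`%+=%` <- function(x, inc) {
  # Rewrites the call into `x <- x + inc` and runs it in the caller's frame
  invisible(eval.parent(substitute(x <- x + inc)))
}

df <- data.frame(a1 = 1:10, a2 = 21:30)

# Add 1 to a1 wherever a1 > 5, without repeating the left-hand side
df$a1[df$a1 > 5] %+=% 1
```

One caveat: %op% operators bind more tightly than + and -, so `x %+=% a + b` is parsed as `(x %+=% a) + b`; wrap a compound right-hand side in parentheses.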

You should be paying more attention to Gabor Grothendieck (and not just in this instance). The cited inc function on Matt Asher's blog does all of what you are asking:

(And the obvious extension works as well.)

add <- function(x, inc = 1) {
  eval.parent(substitute(x <- x + inc))
}
# Testing the `inc` function behavior

EDIT: After my temporary annoyance at the lack of approval in the first comment, I took the challenge of adding yet a further function argument. Supplied with one argument of a portion of a dataframe, it would still increment the range of values by one. Up to this point it has only been very lightly tested on infix dyadic operators, but I see no reason it wouldn't work with any function that accepts only two arguments:

transfn <- function(x, func = "+", inc = 1) {
  eval.parent(substitute(x <- do.call(func, list(x, inc))))
}
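
A short usage sketch of transfn with something other than the default "+". The data frame and the choice of "*" and "pmin" are illustrative assumptions, and this has been exercised no more heavily than the function itself:

```r
transfn <- function(x, func = "+", inc = 1) {
  # Rewrites the call into `x <- do.call(func, list(x, inc))`
  # and evaluates it in the caller's frame
  eval.parent(substitute(x <- do.call(func, list(x, inc))))
}

df <- data.frame(a1 = 1:10, a2 = 21:30)

# Double a1 wherever a1 > 5
transfn(df$a1[df$a1 > 5], "*", 2)

# An ordinary two-argument function works too: cap a2 at 25
transfn(df$a2, "pmin", 25)
```

Note that the expression passed as x is evaluated more than once (on both sides of the generated assignment), so it should not have side effects.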

(Guilty admission: This somehow "feels wrong" from the traditional R perspective of returning values for assignment.) The earlier testing on the inc function is below:

df <- data.frame(a1 =1:10, a2=21:30, b=1:2)
 inc <- function(x) {
   eval.parent(substitute(x <- x + 1))
 }

#---- examples===============>

> inc(df$a1)  # works on whole columns
> df
   a1 a2 b
1   2 21 1
2   3 22 2
3   4 23 1
4   5 24 2
5   6 25 1
6   7 26 2
7   8 27 1
8   9 28 2
9  10 29 1
10 11 30 2
> inc(df$a1[df$a1>5]) # testing on a restricted range of one column
> df
   a1 a2 b
1   2 21 1
2   3 22 2
3   4 23 1
4   5 24 2
5   7 25 1
6   8 26 2
7   9 27 1
8  10 28 2
9  11 29 1
10 12 30 2

> inc(df[ df$a1>5, ])  #testing on a range of rows for all columns being transformed
> df
   a1 a2 b
1   2 21 1
2   3 22 2
3   4 23 1
4   5 24 2
5   8 26 2
6   9 27 3
7  10 28 2
8  11 29 3
9  12 30 2
10 13 31 3
# and even in selected rows and grepped names of columns meeting a criterion
> inc(df[ df$a1 <= 3, grep("a", names(df)) ])
> df
   a1 a2 b
1   3 22 1
2   4 23 2
3   4 23 1
4   5 24 2
5   8 26 2
6   9 27 3
7  10 28 2
8  11 29 3
9  12 30 2
10 13 31 3
IRTFM
  • This is awesome if I'm performing the same operation multiple times, but if I just need to perform the operation once, defining the inc function isn't worth the time – LostLin Oct 14 '11 at 14:49
  • Some people just cannot be satisfied. – IRTFM Oct 14 '11 at 14:56
  • lol well if this is the best I can get I'll take it. I'm just curious to see if there's an easier solution out there since I'm not too familiar with all the existing R functions – LostLin Oct 14 '11 at 15:07
  • It's possible that even that `transfn` could be generalized with the use of "..." argument list. An appropriate area of investigation given your user name. – IRTFM Oct 14 '11 at 15:16

Here is what you can do. Let us say you have a dataframe

df = data.frame(x = 1:10, y = rnorm(10))

And you want to increment all the y by 1. You can do this easily by using transform

df = transform(df, y = y + 1)
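
If only part of the column should change, one way to stay within transform is ifelse. This is a sketch rather than the only option; the fixed y values below replace rnorm(10) so the result is reproducible, and the threshold of 1 is an arbitrary choice:

```r
# Illustrative data with known values instead of random ones
df <- data.frame(x = 1:4, y = c(0.2, 1.5, -0.3, 2.0))

# Add 1 to y only where y < 1; other rows keep their value
df <- transform(df, y = ifelse(y < 1, y + 1, y))
```

Here y is written once on the left and the condition once on the right, but the whole column is still recomputed and the data frame copied.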
Ramnath
  • also `df <- within(df, y<- y+1)`, which is more general than `transform`. – hatmatrix Oct 14 '11 at 14:13
  • This would still be kind of annoying to do over a subset of something though. i.e. if I wanted to only increase y values less than 1 I would still have to write `df=transform(df, y[y<1]<-y[y<1]+1)` – LostLin Oct 14 '11 at 14:24

I'd be partial to (presumably the subset is on rows)

ridx <- adataframe$col==something
adataframe[ridx,] <- adataframe[ridx,] + 1

which doesn't rely on any fancy / fragile parsing, is reasonably expressive about the operation being performed, and is not too verbose. Also tends to break lines into nicely human-parse-able units, and there is something appealing about using standard idioms -- R's vocabulary and idiosyncrasies are already large enough for my taste.
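
A concrete version of this idiom, as a sketch with made-up data. Two assumptions are added here that are not in the snippet above: the increment is restricted to a numeric column (adding 1 to every column of the selected rows only makes sense when all of them are numeric), and which() is used so that rows where the comparison is NA are skipped:

```r
# Hypothetical data: one character column, one numeric column
adataframe <- data.frame(col = c("a", "b", "a", NA),
                         val = c(10, 20, 30, 40),
                         stringsAsFactors = FALSE)
something <- "a"

# Row positions where col equals `something`; NA comparisons are dropped
ridx <- which(adataframe$col == something)

# Increment only the numeric column in those rows
adataframe[ridx, "val"] <- adataframe[ridx, "val"] + 1
```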

Martin Morgan