r - How to add row index to a data frame, based on combination of factors

Question

I have a data frame like this:

df <- data.frame(
    Dim1 = c("A","A","A","A","A","A","B","B"),
    Dim2 = c(100,100,100,100,200,200,100,200),
    Value = sample(1:10, 8)
)

  Dim1 Dim2 Value
1    A  100     3
2    A  100     6
3    A  100     7
4    A  100     4
5    A  200     8
6    A  200     9
7    B  100     2
8    B  200    10

(The Value column is just to illustrate that each row is a data point; the actual value doesn't matter.) Ultimately, what I would like to do is plot the values against their index within the subset defined by Dim1 and Dim2. For this reason, I think I need to append a new column containing the indices, which would look like this (blank lines added between subsets to make the grouping obvious):

  Dim1 Dim2 Value Index
1    A  100     1     1
2    A  100     9     2
3    A  100     4     3
4    A  100    10     4

5    A  200     7     1
6    A  200     3     2

7    B  100     5     1

8    B  200     8     1

How do I do this elegantly in R? I'm coming from Python and my default approach is to for-loop over the combinations of Dim1 & Dim2, keeping track of the number of rows in each and assigning the maximum encountered so far to each row. I've been trying to figure it out but my vector-fu is weak.

Is this what you are trying to do? `df$index <- c(1,2,3,4,1,2,1,1)` – Jd Baba, Apr 18 '13 at 20:22
@Jdbaba In this particular example, yes. Generally, no, since I need an abstract function that will work with a larger data.frame with more factor variables, etc. – jsavn, Apr 19 '13 at 02:08
Since this was successfully answered, is there any way the title could be more informative? To me, knowing how to do this is of very basic importance and I'd like people to be able to find it. – jsavn, Apr 19 '13 at 03:41

IRTFM · Accepted Answer · 2013-04-18T20:34:03.330

This is probably going to look like cheating since I am passing a vector into a function which I then totally ignore except to get its length:

df$Index <- ave(1:nrow(df), df$Dim1, factor(df$Dim2), FUN = function(x) 1:length(x))

The ave function returns a vector of the same length as its first argument, computed within the categories defined by all of the factors between the first argument and the argument named FUN. (I often forget to put the "FUN=" in for my function and get a cryptic error message along the lines of "unique() applies only to vectors", because ave then treats the anonymous function as one more grouping variable and fails while trying to determine how many unique values it has.)
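To make the description above concrete, here is a minimal sketch of roughly what ave() does with its grouping arguments, spelled out by hand with interaction() and split(). The fixed Value column and the names by_hand, via_ave and group_mean are illustrative assumptions, not part of the original answer.

```r
# Toy data mirroring the question; Value is fixed here instead of sampled.
df <- data.frame(
  Dim1  = c("A", "A", "A", "A", "A", "A", "B", "B"),
  Dim2  = c(100, 100, 100, 100, 200, 200, 100, 200),
  Value = c(3, 6, 7, 4, 8, 9, 2, 10)
)

x <- seq_len(nrow(df))

# Roughly what ave() does: combine the grouping variables into one factor,
# apply FUN to each group, and write the results back in the original positions.
g <- interaction(df$Dim1, factor(df$Dim2))
by_hand <- x
split(by_hand, g) <- lapply(split(x, g), seq_along)

# The same thing through ave() itself.
via_ave <- ave(x, df$Dim1, factor(df$Dim2), FUN = seq_along)

# With a different FUN the result still has one entry per row:
# every row receives the mean of its own Dim1/Dim2 group.
group_mean <- ave(df$Value, df$Dim1, df$Dim2, FUN = mean)
```

Because the result always lines up with the rows of the input, it can be assigned straight into a new column of the data frame.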

There's actually an even more compact way of expressing function(x) 1:length(x): the seq_along function, which is probably safer since it handles a vector of length zero properly by returning integer(0), whereas the anonymous function form would fail improperly by returning 1:0 (that is, c(1, 0)):

ave(1:nrow(df), df$Dim1, factor(df$Dim2), FUN = seq_along)
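The zero-length case is easy to check directly. The sketch below compares the two forms on an empty vector and then shows one situation where ave() can hand FUN an empty group: a combination of factor levels that never occurs in the data. The vectors d1 and d2 are made-up illustrations, not data from the question.

```r
empty <- integer(0)

bad  <- 1:length(empty)   # 1:0 counts down, giving two elements
good <- seq_along(empty)  # zero elements, as intended

# Only A/100 and B/200 occur, but the grouping factor has a level for
# every Dim1/Dim2 combination, so two of the four groups are empty.
d1 <- c("A", "A", "B")
d2 <- c(100, 100, 200)
g  <- interaction(d1, d2)
group_sizes <- lengths(split(seq_along(d1), g))

# seq_along copes with the empty groups and the occupied ones alike.
idx <- ave(seq_along(d1), d1, d2, FUN = seq_along)
```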

eddi · Answer 2 · 2013-04-19T00:17:30.387

4

Here you go, using data.table:

library(data.table)
df <- data.table(
    Dim1 = c("A","A","A","A","A","A","B","B"),
    Dim2 = c(100,100,100,100,200,200,100,200),
    Value = sample(1:10, 8)
)

df[, index := seq_len(.N), by = list(Dim1, Dim2)]

edited Apr 19 '13 at 00:17

answered Apr 18 '13 at 20:26

use `seq_len(.N)` instead of `1:.N` (while in this case .N will always be 1 or greater, seq_len is faster, and safer in general) – mnel Apr 19 '13 at 00:11
@eddi Thanks, this also does what I need! I think for now I prefer the solution posted above because it works with data.frames and I'm totally unfamiliar with data.tables. – jsavn Apr 19 '13 at 02:12

score 0 · Answer 3 · answered Apr 18 '13 at 20:27

Is this what you are trying to achieve?

library(ggplot2)
df <- data.frame(
  Dim1 = c("A","A","A","A","A","A","B","B"),
  Dim2 = c(100,100,100,100,200,200,100,200),
  Value = sample(1:10, 8)
)
df$index <- c(1,2,3,4,1,2,1,1)

ggplot(df, aes(x = index, y = Value)) + geom_point() + facet_wrap(Dim1 ~ Dim2)

The output is as follows: [image not preserved]

Jd Baba

Ultimately, yes! Except I'm comfortable with ggplot2, but I don't know how to make a function that sorts out the Index column automatically. – jsavn Apr 19 '13 at 01:21
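
For the follow-up in the comment above about producing the Index column automatically, one possible approach is to wrap the ave() call from the accepted answer in a small helper and feed the result to the same ggplot2 call. This is only a sketch: add_group_index and its arguments by and name are hypothetical names, not part of the thread or of any package; Value is fixed rather than sampled; rows with NA in a grouping column are not handled; and the plotting step is skipped when ggplot2 is not installed.

```r
# Hypothetical helper: append a within-group row index for any set of
# grouping columns named in `by`.
add_group_index <- function(data, by, name = "Index") {
  stopifnot(is.data.frame(data), all(by %in% names(data)))
  # Collapse all grouping columns into a single factor of observed combinations.
  groups <- interaction(data[by], drop = TRUE)
  # Number the rows of each group in their order of appearance.
  data[[name]] <- ave(seq_len(nrow(data)), groups, FUN = seq_along)
  data
}

df <- data.frame(
  Dim1  = c("A", "A", "A", "A", "A", "A", "B", "B"),
  Dim2  = c(100, 100, 100, 100, 200, 200, 100, 200),
  Value = c(3, 6, 7, 4, 8, 9, 2, 10)
)
indexed <- add_group_index(df, by = c("Dim1", "Dim2"))

# The rows do not need to be sorted by group for the numbering to work.
shuffled <- data.frame(
  Dim1 = c("A", "B", "A", "B", "A"),
  Dim2 = c(100, 100, 100, 200, 100)
)
shuffled_indexed <- add_group_index(shuffled, by = c("Dim1", "Dim2"))

# Build the faceted plot only if ggplot2 is available; print(p) draws it.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(indexed, ggplot2::aes(x = Index, y = Value)) +
    ggplot2::geom_point() +
    ggplot2::facet_wrap(Dim1 ~ Dim2)
}
```

Passing the grouping columns as a character vector means the helper works unchanged when more factor variables are added.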
