
Let's say I have a data frame with two variables and 213,005 observations. A small sample of it looks like this:

df <- data.frame(nr=c(233, 233, 232, 231, 234, 234, 205), 
        date=c("2012/01/02", "2012/01/01", "2012/01/01", "2012/01/02", "2012/01/01", "2012/01/01", "2012/01/05"))

I need to create a new column called "new" that numbers each distinct combination of "nr" and "date" values. The result should look like this:

df <- data.frame(nr=c(233, 233, 232, 231, 234, 234, 205), 
        date=c("2012/01/02", "2012/01/01", "2012/01/01", "2012/01/02", 
                  "2012/01/01", "2012/01/01", "2012/01/05"), 
        new=c(1, 2, 3, 4, 5, 5, 6))

(nr=233, date=2012/01/02) => (new=1)

(nr=233, date=2012/01/01) => (new=2) ...

For (nr=234, date=2012/01/01) there are two identical rows, and both should get new=5. Repeated rows should stay in the data frame.

Does anyone know how to do that? Any help would be much appreciated. Thank you!

  • possible duplicate of [How to Index subjects using R](http://stackoverflow.com/questions/28841552/how-to-index-subjects-using-r) –  Mar 04 '15 at 11:15
  • @Nemo the linked "dupe" has nothing to do with this question. – David Arenburg Mar 04 '15 at 11:21
  • @Nemo A possible duplicate would be http://stackoverflow.com/questions/13018696/data-table-key-indices-or-group-counter but again the question in the link was a bit specific to data.table, so I don't know if I can close this as duplicate – akrun Mar 04 '15 at 11:29
  • @akrun it's your choice whether to close it or not; I just search and learn :-) Thanks for the amazing help you provide here, akrun :-) –  Mar 04 '15 at 11:32
  • @Nemo No problem. I will leave it open in case we get a better link – akrun Mar 04 '15 at 11:34

2 Answers


I'm not entirely sure I understand the logic, but it seems like you want to group by both columns. Here's a simple data.table solution using `.GRP`:

library(data.table)
setDT(df)[, new := .GRP, .(nr, date)][]
#     nr       date new
# 1: 233 2012/01/02   1
# 2: 233 2012/01/01   2
# 3: 232 2012/01/01   3
# 4: 231 2012/01/02   4
# 5: 234 2012/01/01   5
# 6: 234 2012/01/01   5
# 7: 205 2012/01/05   6
David Arenburg
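A small sketch of why `.GRP` fits here, using hypothetical toy data (`toy` is an illustrative name, not from the question): `.GRP` is data.table's group counter, and with `by` the groups are numbered in order of first appearance, so a repeated (nr, date) pair gets the same ID even when its rows are not next to each other.

```r
library(data.table)

# Hypothetical toy data: the pair (233, "2012/01/02") appears again
# in the last row, not adjacent to its first occurrence
toy <- data.table(
  nr   = c(233, 232, 233),
  date = c("2012/01/02", "2012/01/01", "2012/01/02")
)

# .GRP numbers groups 1, 2, ... in order of first appearance,
# so the last row reuses ID 1
toy[, new := .GRP, by = .(nr, date)]
print(toy)
```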
  • Thank you for the help. Yes, I want to group by both columns. I'm trying to do as you said, but I get `error: Type of RHS ('integer') must match LHS ('double')`. In my data frame `nr` is `integer` and `date` is `factor`; maybe that is the problem? – Miglė Papuškaitė Mar 04 '15 at 09:39
  • Remove the `new` column you've created by hand and run this code when you *don't* have that column in your data. Alternatively, you can just create another column and call it `new2` for example. – David Arenburg Mar 04 '15 at 09:40
  • @David Arenburg funny, you always say "duplicate" while you post duplicates yourself. http://stackoverflow.com/questions/28841552/how-to-index-subjects-using-r/28841703#28841703 –  Mar 04 '15 at 11:14
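A minimal sketch of the first fix suggested in the comments, assuming the setup described there: integer `nr`, factor `date`, and a hand-made `new` column of type double (`dt` is a hypothetical name). Dropping the existing `new` column before assigning `.GRP` avoids the integer-vs-double clash, because the column is then created fresh as integer.

```r
library(data.table)

# Hypothetical reproduction of the asker's column types,
# including a hand-made double column `new`
dt <- data.table(
  nr   = c(233L, 233L, 232L, 231L, 234L, 234L, 205L),
  date = factor(c("2012/01/02", "2012/01/01", "2012/01/01", "2012/01/02",
                  "2012/01/01", "2012/01/01", "2012/01/05")),
  new  = c(1, 2, 3, 4, 5, 5, 6)
)

# Remove the hand-made column, then let .GRP create it again (integer)
dt[, new := NULL]
dt[, new := .GRP, by = .(nr, date)]
print(dt)
```

Creating a differently named column such as `new2`, as also suggested above, sidesteps the clash the same way.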

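Using base R, paste the columns into one key per row and turn it into a factor whose levels are in order of first appearance: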

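v1 <- do.call(paste, df)                                # one key string per row, e.g. "233 2012/01/02"
df$new <- as.numeric(factor(v1, levels=unique(v1)))    # levels in first-appearance order give IDs 1, 2, 3, ...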
akrun
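A closely related base R variant, offered as a sketch rather than as akrun's exact code: instead of going through `factor()`, `match()` against `unique()` gives the same first-appearance numbering directly as an integer vector. The name `key` and the `"|"` separator are illustrative assumptions (the separator is assumed not to occur in either column).

```r
# Same sample data as in the question
df <- data.frame(
  nr   = c(233, 233, 232, 231, 234, 234, 205),
  date = c("2012/01/02", "2012/01/01", "2012/01/01", "2012/01/02",
           "2012/01/01", "2012/01/01", "2012/01/05")
)

# One key per (nr, date) row
key <- paste(df$nr, df$date, sep = "|")

# unique() keeps first-appearance order; match() returns each key's
# position in that vector, so repeated rows share the same ID
df$new <- match(key, unique(key))
print(df)
```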