Create a new data frame column based on the values of two other columns

Question

Let's say I have a data frame with two variables and 213005 observations. It looks like this:

df <- data.frame(nr=c(233, 233, 232, 231, 234, 234, 205), 
        date=c("2012/01/02", "2012/01/01", "2012/01/01", "2012/01/02", "2012/01/01", "2012/01/01", "2012/01/05"))

I need to create a new column called "new" that numbers each distinct combination of "nr" and "date" values. It should look like this:

df <- data.frame(nr=c(233, 233, 232, 231, 234, 234, 205), 
        date=c("2012/01/02", "2012/01/01", "2012/01/01", "2012/01/02", 
                  "2012/01/01", "2012/01/01", "2012/01/05"), 
        new=c(1, 2, 3, 4, 5, 5, 6))

(nr=233, date=2012/01/02) => (new=1)

(nr=233, date=2012/01/01) => (new=2) ...

For (nr=234, date=2012/01/01) there should be two identical rows with new=5; repeated rows should stay in the data frame.

Does anyone know how to do that? Any help would be much appreciated! Thank you!

possible duplicate of [How to Index subjects using R](http://stackoverflow.com/questions/28841552/how-to-index-subjects-using-r) — , Mar 04 '15 at 11:15
@Nemo the linked "dupe" has nothing to do with this question. — David Arenburg, Mar 04 '15 at 11:21
@Nemo A possible duplicate would be http://stackoverflow.com/questions/13018696/data-table-key-indices-or-group-counter but again the question in the link was a bit specific to data.table, so I don't know if I can close this as duplicate — akrun, Mar 04 '15 at 11:29
@akrun it's your choice whether to close it or not, I just search and learn :-) thanks for the amazing help you provide here akrun :-) — , Mar 04 '15 at 11:32
@Nemo No problem. I will leave it open in case we get a better link — akrun, Mar 04 '15 at 11:34

score 4 · Answer 1 · answered Mar 04 '15 at 09:30

I'm not entirely sure I understand the logic, but it seems like you want to group by both columns. Here's a simple data.table solution using .GRP:

library(data.table)
setDT(df)[, new := .GRP, .(nr, date)][]
#     nr       date new
# 1: 233 2012/01/02   1
# 2: 233 2012/01/01   2
# 3: 232 2012/01/01   3
# 4: 231 2012/01/02   4
# 5: 234 2012/01/01   5
# 6: 234 2012/01/01   5
# 7: 205 2012/01/05   6

David Arenburg


Thank you for the help. Yes, I want to group by both columns. I'm trying to do as you said, but I get an error: `Type of RHS ('integer') must match LHS ('double')`... In my data frame `nr` is `integer` and `date` is `factor`; maybe that is the problem? – Miglė Papuškaitė Mar 04 '15 at 09:39
Remove the `new` column you've created by hand and run this code when you *don't* have that column in your data. Alternatively, you can just create another column and call it `new2` for example. – David Arenburg Mar 04 '15 at 09:40
@David Arenburg funny, you always say "duplicate" while posting duplicates yourself. http://stackoverflow.com/questions/28841552/how-to-index-subjects-using-r/28841703#28841703 – Mar 04 '15 at 11:14
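
The fix suggested in the comments above can be sketched as follows. This is only an illustration under assumptions: the sample data is rebuilt with `nr` as integer and `date` as factor (as the asker describes), plus a hand-made double `new` column, and the `new2` name is taken from the comment. Both options rely on `.GRP` producing an integer group counter.

```r
library(data.table)

# Assumed reconstruction of the asker's data, including the hand-made column
df <- data.frame(
  nr   = c(233L, 233L, 232L, 231L, 234L, 234L, 205L),
  date = factor(c("2012/01/02", "2012/01/01", "2012/01/01", "2012/01/02",
                  "2012/01/01", "2012/01/01", "2012/01/05")),
  new  = c(1, 2, 3, 4, 5, 5, 6)  # stored as double, unlike the integer .GRP
)
setDT(df)

# Option 1: drop the existing column, then let .GRP create it afresh
df[, new := NULL]
df[, new := .GRP, by = .(nr, date)]

# Option 2: leave any existing column alone and write to a different name
df[, new2 := .GRP, by = .(nr, date)]

print(df)
```

Because the grouping uses `by` rather than `keyby`, groups are numbered in order of first appearance, which matches the ordering the question asks for.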

score 1 · Answer 2 · answered Mar 04 '15 at 09:49

Using base R:

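# Paste the columns of each row into a single key string
v1 <- do.call(paste, df)
# Factor levels in order of first appearance give the group number
df$new <- as.numeric(factor(v1, levels=unique(v1)))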
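
To make the idea behind this answer concrete, here is a minimal self-contained sketch on the question's sample data. It swaps in `match()` against the unique keys, which is not what the answer uses but should give the same numbering as an integer vector; the variable name `key` is just illustrative.

```r
df <- data.frame(
  nr   = c(233, 233, 232, 231, 234, 234, 205),
  date = c("2012/01/02", "2012/01/01", "2012/01/01", "2012/01/02",
           "2012/01/01", "2012/01/01", "2012/01/05")
)

# One key string per row, e.g. "233 2012/01/02"
key <- do.call(paste, df)

# Position of each key among the unique keys, i.e. order of first appearance
df$new <- match(key, unique(key))

print(df$new)
# 1 2 3 4 5 5 6
```

One caveat with pasted keys in general: if column values can themselves contain the separator (a space by default), two different rows could collapse to the same key, so passing an explicit `sep` that cannot occur in the data may be safer.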

akrun

