Add column to Data Frame based on values of other columns

Question

My dataframe DF looks like:

ID    Name1    Name2    Group    
1234  A1       x
1234  A4       w
1234  A3       q
1234  A        A
1234  A2       z
5678  B3       s
5678  B        B
...

I need to add a column for Group that is Name1 for each ID that matches the ID in the row where Name1 == Name2.

So the logic would be to check if Name1 == Name2, remember the ID of that Row and the Name1 value, then for every row having that ID, put the Name1 Value in each row of the Group column.

The result should look like:

ID    Name1    Name2    Group    
1234  A1       x        A
1234  A4       w        A
1234  A3       q        A
1234  A        A        A
1234  A2       z        A
5678  B3       s        B
5678  B        B        B
...

I am not sure how to do this in the dataframe though and from Many rows with different IDs. I dont want to use loops.

mutate() or lapply() maybe?

I can see how to add the Name1 value in the Group column for the rows where Name1==Name2, but how do I roll that back up for all matching IDs?

score 3 · Accepted Answer · answered Sep 26 '14 at 21:36

you can do it in a single line, using data.table

DT[, Group := Name1[Name1 == Name2], by=ID]

Full details:

library(data.table)

DT <- as.data.table(DF)

DT[, Group := Name1[Name1 == Name2], by=ID]

     ID Name1 Name2 Group
1: 1234    A1     x     A
2: 1234    A4     w     A
3: 1234    A3     q     A
4: 1234     A     A     A
5: 1234    A2     z     A
6: 5678    B3     s     B
7: 5678     B     B     B
8: 1589     C     x    NA
9: 1589     C     y    NA



## if `Name1`, `Name2`  are NOT characters, use 
DT[, Name1 := as.character(Name1)]
DT[, Name2 := as.character(Name2)]

score 1 · Answer 2 · answered Sep 26 '14 at 19:54

Give this a try

x <- merge(x, x[x$Name1 == x$Name2, 1:2], by.x = "ID", by.y = "ID")
names(x)[4] <- "Group"
#     ID Name1.x Name2   Group
# 1 1234      A1     x       A
# 2 1234      A4     w       A
# 3 1234      A3     q       A
# 4 1234       A     A       A
# 5 1234      A2     z       A
# 6 5678      B3     s       B
# 7 5678       B     B       B

Rich Scriven · Answer 3 · 2014-09-26T20:25:17.490

One other possibility:

unsplit(lapply(split(df, df$ID), function(x) {
    x$Group <- if(any(y <- x$Name1 %in% x$Name2)) x$Name2[y] else NA
    x
}), df$ID)

     ID Name1 Name2 Group
1  1234    A1     x     A
2  1234    A4     w     A
3  1234    A3     q     A
4  1234     A     A     A
5  1234    A2     z     A
6  5678    B3     s     B
7  5678     B     B     B
9  1589     C     x  <NA>
10 1589     C     y  <NA>

Data

df <- 
structure(list(ID = c("1234", "1234", "1234", "1234", "1234", 
"5678", "5678", "1589", "1589"), Name1 = structure(c(2L, 5L, 
4L, 1L, 3L, 7L, 6L, 8L, 8L), .Label = c("A", "A1", "A2", "A3", 
"A4", "B", "B3", "C"), class = "factor"), Name2 = structure(c(6L, 
5L, 3L, 1L, 7L, 4L, 2L, 6L, 8L), .Label = c("A", "B", "q", "s", 
"w", "x", "z", "y"), class = "factor")), .Names = c("ID", "Name1", 
"Name2"), row.names = c("1", "2", "3", "4", "5", "6", "7", "9", 
"10"), class = "data.frame")

I get an error saying "group length is zero but data length is > 0" — brno792, Sep 26 '14 at 20:48

score 0 · Answer 4 · answered Sep 26 '14 at 21:10

0

Will there always be one (and only one) match between Name1 and Name2 per ID?

If so, you can use mutate from dplyr for this (using df as in @Richard Scriven's answer):

require(dplyr)

df[1:7,] %>% 
    group_by(ID) %>%
    mutate(Group = Name1[Name1 %in% Name2])

If there could be either one or no matches per ID, you could add in an ifelse statement to deal with the no matches situation.

df %>% 
    group_by(ID) %>%
    mutate(Group = ifelse(any(Name1 %in% Name2), 
                          as.character(Name1)[Name1 %in% Name2], "NA"))

answered Sep 26 '14 at 21:10

aosmith

34,856
9
84
118

I keep getting an error when I run mutate(): "incompatible size (3), expecting 4 (the group size)" for example. ID is a subset of some other columns to the left. But I am including those in my group_by() – brno792 Sep 26 '14 at 21:28
@brno792 Could you `dput` your dataset (or a portion of it) into your question? It sounds like there is some other structure that I didn't account for. – aosmith Sep 26 '14 at 21:36

Add column to Data Frame based on values of other columns

4 Answers4

Linked