
Let's assume two dataframes, A and B, containing data like the following:

Dataframe A:

 ColA1 | ColA2
 Dog   | Lion
 Lion  | Dog
 Zebra | Tiger
 Bat   | Parrot

Dataframe B:

 ColB1 | ColB2
 Lion  | Lion
 Cat   | NA
 Tiger | Tiger
 Dog   | Dog

If an animal in ColB1 exists in either ColA1 or ColA2, then insert the name of that animal into ColB2; otherwise insert NA.

Instead of testing each column of A separately inside ifelse:

B$ColB2 <- ifelse((B$ColB1 %in% A$ColA1 | B$ColB1 %in% A$ColA2), "animal from ColA1", NA)

How can this be made shorter? Would using an apply function make it faster?

Dino C
  • [How to make a great R reproducible example?](http://stackoverflow.com/questions/5963269) – Sotos Sep 04 '17 at 12:26
  • Maybe `B$ColB2 <- ifelse(B$ColB1 %in% unique(c(A$ColA1, A$ColA2)), B$ColB1 , NA)` – zx8754 Sep 04 '17 at 12:30
  • Using indexes is also an option. See my answer for an example – h3rm4n Sep 04 '17 at 15:40
  • Apply wouldn't really be faster in this case because you're not working with many columns. It's basically just two vectors you're comparing. For this you can just melt data frame A into a vector, then compare it to ColB1 as @zx8754 shows. See my answer below for more. – www Sep 04 '17 at 18:03

3 Answers


Using an index is also an option:

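i <- dfB$ColB1 %in% unlist(dfA)             # TRUE where the animal occurs anywhere in dfA
dfB$ColB2 <- NA_character_                  # start with NA in every row
dfB$ColB2[i] <- as.character(dfB$ColB1[i])  # copy the matching names from ColB1

The result:

> dfB
  ColB1 ColB2
1  Lion  Lion
2   Cat  <NA>
3 Tiger Tiger
4   Dog   Dog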
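
For anyone who wants to run the index approach end to end, here is a minimal self-contained sketch. It assumes the example data is rebuilt as character columns; the `stringsAsFactors = FALSE` calls and the `expected` cross-check are additions for illustration and are not part of the original answer.

```r
# Rebuild the example data as character columns (assumption: no factors)
dfA <- data.frame(ColA1 = c("Dog", "Lion", "Zebra", "Bat"),
                  ColA2 = c("Lion", "Dog", "Tiger", "Parrot"),
                  stringsAsFactors = FALSE)
dfB <- data.frame(ColB1 = c("Lion", "Cat", "Tiger", "Dog"),
                  stringsAsFactors = FALSE)

# Logical index: TRUE where the ColB1 animal occurs in any column of dfA
i <- dfB$ColB1 %in% unlist(dfA)

# Fill ColB2 with NA, then overwrite only the matching rows
dfB$ColB2 <- NA_character_
dfB$ColB2[i] <- dfB$ColB1[i]

# Cross-check against the ifelse() formulation from the question
expected <- ifelse(dfB$ColB1 %in% c(dfA$ColA1, dfA$ColA2), dfB$ColB1, NA)
stopifnot(identical(dfB$ColB2, expected))
```

The index version touches only the matching rows, which is why every row has to be set to NA first.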
h3rm4n

You can try with dplyr:

library(dplyr)

# Use an explicit NA of the same type as the "true" branch instead of NULL
dfB %>%
  mutate(colB3 = if_else(ColB1 %in% unlist(dfA), as.character(ColB1), NA_character_))

which gives:

  ColB1 ColB2 colB3
1  Lion  Lion  Lion
2   Cat    NA  <NA>
3 Tiger Tiger Tiger
4   Dog   Dog   Dog

Inputs:

 dput(dfA)
structure(list(ColA1 = structure(c(2L, 3L, 4L, 1L), .Label = c("Bat", 
"Dog", "Lion", "Zebra"), class = "factor"), ColA2 = structure(c(2L, 
1L, 4L, 3L), .Label = c("Dog", "Lion", "Parrot", "Tiger"), class = "factor")), class = "data.frame", row.names = c(NA, 
-4L), .Names = c("ColA1", "ColA2"))

dput(dfB)
structure(list(ColB1 = structure(c(3L, 1L, 4L, 2L), .Label = c("Cat", 
"Dog", "Lion", "Tiger"), class = "factor"), ColB2 = structure(c(2L, 
3L, 4L, 1L), .Label = c("Dog", "Lion", "NA", "Tiger"), class = "factor")), class = "data.frame", row.names = c(NA, 
-4L), .Names = c("ColB1", "ColB2"))
Sotos
Aramis7d
  • Also you need to mention any libraries you use (I have no idea why you would want to load a package here to do `ifelse`) but still you need to mention them – Sotos Sep 04 '17 at 12:38

This might be the simplest:

# as.character() keeps the labels even if ColB1 is stored as a factor
df_B$ColB2 <- ifelse(df_B$ColB1 %in% unlist(df_A[, 1:2]), as.character(df_B$ColB1), NA)

Output:

  ColB1 ColB2
1  Lion  Lion
2   Cat  <NA>
3 Tiger Tiger
4   Dog   Dog
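
One thing to watch for with `ifelse` here is factor columns: `ifelse` drops the factor attributes, so passing a factor as the "yes" value returns the underlying integer codes rather than the animal names. The sketch below illustrates this with a hypothetical factor copy of ColB1 (`b1` and `animals_in_A` are names made up for this example):

```r
# Hypothetical factor version of ColB1, as produced by stringsAsFactors = TRUE
b1 <- factor(c("Lion", "Cat", "Tiger", "Dog"))

# All animals from both columns of A, flattened into one character vector
animals_in_A <- c("Dog", "Lion", "Zebra", "Bat", "Lion", "Dog", "Tiger", "Parrot")

# ifelse() drops the factor attributes, so the integer codes come back
codes <- ifelse(b1 %in% animals_in_A, b1, NA)

# Converting to character first keeps the labels
labels <- ifelse(b1 %in% animals_in_A, as.character(b1), NA)

print(codes)
print(labels)
```

If the columns are already character vectors, both calls give the same result.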

To find the individual index in each column of df_A that matches the value in df_B$ColB1, you can use something like:

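# which(x == i) matches whole values only; grep(i, x) would also hit partial matches
x <- apply(df_A[, 1:2], 2, function(x) sapply(as.character(df_B$ColB1), function(i) which(x == i)))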

Output of str(x):

List of 2
$ ColA1:List of 4
 ..$ Lion : int 2
 ..$ Cat  : int(0) 
 ..$ Tiger: int(0) 
 ..$ Dog  : int 1
$ ColA2:List of 4
 ..$ Lion : int 1
 ..$ Cat  : int(0) 
 ..$ Tiger: int 3
 ..$ Dog  : int 2
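
If only the first matching position per column is needed, a more compact alternative is `match()`, which returns NA where there is no match. This is a sketch under the assumption that the data frames hold character columns; `idx` is a name introduced here for illustration.

```r
# Example data rebuilt as character columns (assumption: no factors)
df_A <- data.frame(ColA1 = c("Dog", "Lion", "Zebra", "Bat"),
                   ColA2 = c("Lion", "Dog", "Tiger", "Parrot"),
                   stringsAsFactors = FALSE)
df_B <- data.frame(ColB1 = c("Lion", "Cat", "Tiger", "Dog"),
                   stringsAsFactors = FALSE)

# match() gives the position of the first exact match in each column, or NA
idx <- sapply(df_A, function(col) match(df_B$ColB1, col))

# Label the rows with the animals that were looked up
rownames(idx) <- df_B$ColB1
print(idx)
```

Unlike the nested list above, this yields a single integer matrix with one row per ColB1 value and one column per column of df_A, but it reports only the first occurrence when a value appears more than once in a column.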
www
  • Thanks, this looks more compact to me. If I wanted to return the value from the A columns in case of TRUE, instead of the value of df_B$ColB1, how could I do that? More precisely, keeping your code, could we get the index of the A column where the match occurs instead of "df_B$ColB1"? – Dino C Sep 06 '17 at 08:05
  • @DinoC - No problem, I can help you with that. What would you like the result to look like? Do you want the individual index in each column of df_A that matches the value in df_B$ColB1? I've just included an example of that output above, which I can modify if that's not quite what you're looking for. – www Sep 06 '17 at 17:33
  • Thanks once again. I edited my initial post. Any idea on this? – Dino C Sep 07 '17 at 12:35
  • @DinoC - Instead of editing the question, please start a new question. This way the answers still make sense in the context of the original question. Once a new question is started, yes, I'd be happy to help. After you start a new question, just use "@RyanRunge" in a comment to call me to a question. – www Sep 07 '17 at 23:22