0

So I have a large data set that I want to process in R. The equivalent excel function is below. I'm trying get the same result using the expss package match_row / index_row functions. Any help with this would be greatly appreciated. Basically I want the output value in df1 column D to be the value in column C in df2 where value A and value B in df1 match the value A and value B in the df2.

in excel I have this function:

=INDEX(df1!C:C, MATCH(1,INDEX((A2=df1!A:A)*(B2=df1!B:B),0,1),0))

see screenshot of my excel example below

which achieves what I want. however my dataset is to big for excel to handle and I want an executable script since the dataset will be updated periodically.

df1   (A)     (B)     (C)
#    Organism Drug    MIC
#          A  P       15
#          A  AM      11
#          B  P       9
#          B  AM      13
#          C  P       12
#          C  AM      10

df2   (A)     (B)     (C)     (d)- calculated values 
#    Organism Drug    Zone    MIC
#          C  P       13      [MIC for row C, P in df1 = 12]
#          A  P       11      [MIC for row A, P in df1 = 15]
#          C  AM      10
#          A  AM      15
#          B  P       16
#          C  AM      12

Excel df1 Excel df2

GonzoGoat
  • 1
  • 1
  • 2
    Welcome to SO, spock42! This is likely a `merge` or join operation, see https://stackoverflow.com/q/1299871/3358272, https://stackoverflow.com/a/6188334/3358272. If you want more help, please make this question more *reproducible* by including sample data. This can be done programmatically (literal `data.frame(...)`), or pasting the output from `dput(x)`, where `x` is a small sample of sample data you have. For both, please be explicit about what your desired output is. (Also please see https://stackoverflow.com/q/5963269, [mcve], and https://stackoverflow.com/tags/r/info.) Thanks! – r2evans Apr 12 '21 at 02:25
  • spock42, I wasn't kidding when I suggested either `data.frame(..)` (programmatic) or `dput(.)`. This sample data is not something I can just easily copy and use, whereas either of my suggestions would be as simple as highlight, copy, paste into my R console, and go at it. – r2evans Apr 12 '21 at 03:26

1 Answers1

0

This is a duplicate, but for your data:

merge(df1, df2, by = c("Organism", "Drug"), all = TRUE)
#   Organism Drug MIC Zone
# 1        A   AM  11   15
# 2        A    P  15   11
# 3        B   AM  13   NA
# 4        B    P   9   16
# 5        C   AM  10   10
# 6        C   AM  10   12
# 7        C    P  12   13

Data, in a reproducible format:

df1 <- structure(list(Organism = c("A", "A", "B", "B", "C", "C"), Drug = c("P", "AM", "P", "AM", "P", "AM"), MIC = c(15L, 11L, 9L, 13L, 12L, 10L)), class = "data.frame", row.names = c(NA, -6L))
df2 <- structure(list(Organism = c("C", "A", "C", "A", "B", "C"), Drug = c("P", "P", "AM", "AM", "P", "AM"), Zone = c(13L, 11L, 10L, 15L, 16L, 12L)), class = "data.frame", row.names = c(NA, -6L))
r2evans
  • 141,215
  • 6
  • 77
  • 149
  • 1
    r2evans - A simple merge totally works! That is so much easier than using some complicated calculated field type function. Thanks so much for your help and I see what you mean about reproducibility with regard to sample data. I will keep that in mind in the future as I continue my R learning journey. – GonzoGoat Apr 12 '21 at 14:08
  • I'm glad it helps. In data-sciency-things, to me the concept of "merge/join" is a really powerful thing that simplifies a lot of data management and data-processing. – r2evans Apr 12 '21 at 15:13