In R, match data from a string variable across two data frames, and when match is found, merge corresponding rows

Question

I have two data frames df1 (4x4) and df2 (4x1). In each, first variable (i.e. Original_items and Reordered) is string. In df1, V2:V4 are numeric. You can see that in df1 and df2, data in the first variable is arranged in a different order. I need to do the following.

Take 1st element of the df2 'Reordered' variable (i.e. Enjoy holidays.), then search through elements of df1 'Original_items' variable to find the exact match.

When match is found, I need to take the entire row of data associated with the matched element in df1 'Original_items' (i.e."Enjoy holidays.", 4,1,3), and append it beside the same element of df2 'Reordered' variable (i.e. "Enjoy holidays"). I need this output in the new data frame, called df_desired, which should be: "Enjoy holidays.", "Enjoy holidays.", 4, ,1 ,3. Please see below illustration of this example.

When this is done, I would like to repeat this process for each element of the df2 'Reordered'variable, so the final result looks like df_desired table below.

Context of the problem. I have around 2,000 items and 1,000 data points associated with each item. As I need to match items and append data in a predefined way, I am trying to think of an efficient solution.

EDIT It was suggested that I could simply rename items in the "Original Variable". While this is true, it is inconvenient to do for a data frame of more than 2,000 items.

Also, it was mentioned that this question maybe only related to merging. I believe merging is needed here only for elements that have been identified as identical across df1 and df2. Therefore, there are two key questions: 1) how to match string variables in this particular case? 2) how to merge/append rows conditionally, i.e. if they have been matched. Thank you for your input and I would be grateful for your help please

I will mention what I tried and figured out so far. I realised

 df1[,1] == df2 [,1] # gives me true or false if rows in column 1 are the 
                       same in both data frames. I tried to set up a double loop, but unsuccessfully   
 for (i in 1:nrow(df1)) {                                                 
      for (j in 1:nrow(df2)){
    if (i==j){
         c <- merge(a,b)        
        } else 
              print("no result")
           }
                 }

I feel that in the loop I'm not able to specify that I am only working with row values from a single variable "Original_item" in df1

# df1 (4x4 matrix)
  Original_items        V2    V3   V4
  Love birds.            1     5    3
  Eat a lot of food.     2     5    5
  Love birthdays.        2     2    4
  Enjoy holidays.        4     1    3

# df2  (4x1 matrix)
  Reordered                   
  Enjoy holidays. 
  Eat a lot of food.
  Love birds. 
  Love birthdays.      

# df_desired (4x5 matrix)
  Reordered             Original_items            V2   V3   V4    
  Enjoy holidays.       Enjoy holidays.           4     1    3
  Eat a lot of food.    Eat a lot of food.        2     5    5 
  Love birds.           Love birds.               1     5    3
  Love birthdays.       Love birthdays.           2     2    4

Possible duplicate of [How to join (merge) data frames (inner, outer, left, right)](https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right) — NelsonGon, Apr 13 '19 at 13:48
OP, please tell us how your question is more than a simple `merge`. See link in @NelsonGon's comment. Loops are unnecessary. — Parfait, Apr 13 '19 at 15:45
Dear Parfait, if you can suggest a solution that avoids loops, and can reproduce my example, please let me know, I would be grateful to hear it. I have checked the post suggested by NelsonGon, and tried various merger commands but to no avail. I see that I can merge df1 and df2 to produce df_desired, but how do you rearrange the data to replicate df_desired? Please show me possible syntax, I would be very grateful for your help. — PsychometStats, Apr 13 '19 at 16:14

Dij · Accepted Answer · 2019-04-13T18:54:46.353

1

If i understand correctly, you first want to sort df1$original_items to be in the same order as df2 reorder, then apply that same sorting pattern to the rest of df1 variables. First get your vector of indices of df1 in the sequential order that you desire those rows of df1 to end up in.

#initialize an object to capture the above output
indices <- NULL
for (i in 1:nrow(df1)) {
    indices[i] <- which(df1$Original_items == df2$Reordered[i]))
}

Then, just use this list of indices to reorder the all the rows of df1 and create the new df.

df_desired <- cbind(df2$Reordered, df1[indices, ])

edited Apr 13 '19 at 18:54

answered Apr 13 '19 at 14:45

Dij

1,318
1
7
13

If you have multiple cases of each level in `df2` this gets a little tricker. In that case, you may be safe just wrapping the `which` statement in a call to `unique`, depending on your desired outcome. – Dij Apr 13 '19 at 14:47
Hello! Thank you for your reply! I found it helpful. There is some issue with the code as it does not reproduce the df_desired, as elements in the 'Original_items' and 'Reordered' variables" do not match. Could you please help me figure out the reason for that? Thank you – PsychometStats Apr 13 '19 at 15:23
1

Oh sorry, good catch! I edited the answer above so that it df_desired correctly reorders df1 in the order of df2, not the other way around. The new loop is as follows: `indices[i] <- which(df1$Original_items == df2$Reordered[i])` – Dij Apr 13 '19 at 18:55
1

thank you so much, everything works perfectly now!!! I am extremely grateful for your help! – PsychometStats Apr 13 '19 at 21:15
Cheers!! Glad to hear it. – Dij Apr 13 '19 at 22:36

In R, match data from a string variable across two data frames, and when match is found, merge corresponding rows

1 Answers1