I have two data frames df1 (4x4) and df2 (4x1). In each, first variable (i.e. Original_items and Reordered) is string. In df1, V2:V4 are numeric. You can see that in df1 and df2, data in the first variable is arranged in a different order. I need to do the following.
Take 1st element of the df2 'Reordered' variable (i.e. Enjoy holidays.), then search through elements of df1 'Original_items' variable to find the exact match.
When match is found, I need to take the entire row of data associated with the matched element in df1 'Original_items' (i.e."Enjoy holidays.", 4,1,3), and append it beside the same element of df2 'Reordered' variable (i.e. "Enjoy holidays"). I need this output in the new data frame, called df_desired, which should be: "Enjoy holidays.", "Enjoy holidays.", 4, ,1 ,3. Please see below illustration of this example.
When this is done, I would like to repeat this process for each element of the df2 'Reordered'variable, so the final result looks like df_desired table below.
Context of the problem. I have around 2,000 items and 1,000 data points associated with each item. As I need to match items and append data in a predefined way, I am trying to think of an efficient solution.
EDIT It was suggested that I could simply rename items in the "Original Variable". While this is true, it is inconvenient to do for a data frame of more than 2,000 items.
Also, it was mentioned that this question maybe only related to merging. I believe merging is needed here only for elements that have been identified as identical across df1 and df2. Therefore, there are two key questions: 1) how to match string variables in this particular case? 2) how to merge/append rows conditionally, i.e. if they have been matched. Thank you for your input and I would be grateful for your help please
I will mention what I tried and figured out so far. I realised
df1[,1] == df2 [,1] # gives me true or false if rows in column 1 are the
same in both data frames. I tried to set up a double loop, but unsuccessfully
for (i in 1:nrow(df1)) {
for (j in 1:nrow(df2)){
if (i==j){
c <- merge(a,b)
} else
print("no result")
}
}
I feel that in the loop I'm not able to specify that I am only working with row values from a single variable "Original_item" in df1
# df1 (4x4 matrix)
Original_items V2 V3 V4
Love birds. 1 5 3
Eat a lot of food. 2 5 5
Love birthdays. 2 2 4
Enjoy holidays. 4 1 3
# df2 (4x1 matrix)
Reordered
Enjoy holidays.
Eat a lot of food.
Love birds.
Love birthdays.
# df_desired (4x5 matrix)
Reordered Original_items V2 V3 V4
Enjoy holidays. Enjoy holidays. 4 1 3
Eat a lot of food. Eat a lot of food. 2 5 5
Love birds. Love birds. 1 5 3
Love birthdays. Love birthdays. 2 2 4