I would like to understand what the following code actually does. My intention is to populate the Pop column in df2 with the data in the "Pop" column of df1, matching the rows by the "ID" column.
df2$Pop <- df1[df1$ID == df2$ID,]$Pop
It seems straightforward if the rows are not ordered (it can just look for the row whose ID matches), but what if one data frame is bigger than the other (has more rows)? Does the order of comparison matter? I am not sure what to expect from the previous line of code. Does it work just like merge (if df1 had only the ID and Pop columns)? If so, why are there two versions (what are the advantages/disadvantages)?
df2 <- merge(df2, df1, by = "ID", all = FALSE, sort=FALSE)
By testing the two versions on data frames with different numbers of rows (df1 with 100,000 rows and df2 with 98,530), where df1 has only the ID and Pop columns but df2 has over 4000 columns, the first version gives me a result instantaneously, while the merge version takes about 8 seconds to run. I am new to R, so I don't even know how to test the outputs and check whether they are the same, but should they be?
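To make the question concrete, here is a minimal sketch of the comparison on made-up toy data. The values in df1 and df2 below, the names mask, pop_v1, merged and pop_ref, and the use of match() as a reference lookup by ID are all assumptions for illustration, not my real data or code.

```r
# Toy data, invented for illustration only (not the real data frames)
df1 <- data.frame(ID = c(3, 1, 2, 4), Pop = c(30, 10, 20, 40))
df2 <- data.frame(ID = c(1, 2, 3), X = c("a", "b", "c"))

# Version 1: `==` compares the two ID vectors position by position,
# recycling the shorter one (with a warning when the lengths do not divide evenly)
mask <- suppressWarnings(df1$ID == df2$ID)
pop_v1 <- df1[mask, ]$Pop

# Version 2: merge() joins on the key column, independent of row order
merged <- merge(df2, df1, by = "ID", all = FALSE, sort = FALSE)

# Reference: look up each df2 ID in df1 by value (assumed here as the intended result)
pop_ref <- df1$Pop[match(df2$ID, df1$ID)]

# Compare each version against the reference lookup
same_v1 <- identical(pop_v1, pop_ref)
same_merge <- identical(merged$Pop[match(df2$ID, merged$ID)], pop_ref)
print(c(version1 = same_v1, merge = same_merge))
```

The merged result is realigned to the row order of df2 before comparing, since merge() does not promise to preserve that order.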