I scraped some data from a website but it was really janky and for some reason had little mistakes in it. So, I scraped the same data 3 times, and produced 3 tables that look like:
library(data.table)
df1 <- data.table(name = c('adam', 'bob', 'carl', 'dan'),
id = c(1, 2, 3, 4),
thing=c(2, 1, 3, 4),
otherthing = c(2,1, 3, 4)
)
df2 <- data.table(name = c('adam', 'bob', 'carl', 'dan'),
id = c(1, 2, 3, 4),
thing=c(1, 1, 1, 4),
otherthing = c(2,2, 3, 4)
)
df3 <- data.table(name = c('adam', 'bob', 'carl', 'dan'),
id = c(1, 2, 3, 4),
thing=c(1, 1, 3, 4),
otherthing = c(2,1, 3, 3)
)
Except I have many more columns. I want to combine the 3 tables together, and when the values for "thing" and "other thing" etc. conflict, I want it to pick the value that has at least 2/3 and perhaps return an N/A if there is no 2/3 value. I'm confident the "name" and "id" field are good and they're what I want to sort of merge on.
I was considering setting the names for the tables to be, "thing1" "thing2" and "thing3" in the 3 tables respectively, merging together, and then writing some loops through the names. Is there a more elegant solution? It needs to work for 300+ value columns although I'm not super worried about speed.
In this example, the solution I think should be:
final_result <- data.table(name = c('adam', 'bob', 'carl', 'dan'),
id = c(1, 2, 3, 4),
thing=c(1, 1, 3, 4),
otherthing = c(2,1, 3, 4)
)