I am aware this is a common question. However the common issue when combining dataframes in R is that there are overlapping column, and that the columns need functions.
This is not the case for my issue. I currently have 2 dataframes in R. They have a completely distinct set of columns except for an ID column common to both.
There is a primary and secondary dataset (this is important because not all of the values in the primary are present in the secondary)
I would like to add 2 columns from the secondary to the primary dataframe.
So the current method I use is theo iterate over all of the rows of the primary. Check if it is present in the secondary and if it is, add the rows by
primary$newCol1 <- Secondary$newCol1[Matching stuff here]
The main issue is that the datasets are massive, around 9 million rows each and a good deal of columns. Thus the process is extremely slow. I believe I have a correct solution. But the runtime is impossible in terms of days.
A quick example would be
rowC = nrow(dataframe1)
for(i in 1:rowC){
matchedRow = Dataframe1$id[i] == dataframe2$id
if dataframe1$id[i] %in% dataframe2$id{
dataframe1$newcol[i] <- dataframe2$newcol[matchedRow]
}
else{dataframe1$newcol[i]<- 'Unknown'}
Any help would be greatly appreciated