I am applying a sampling method to a dataset of about 700,000 rows and 5 columns. The columns are x1, x2, x3, x4 and y.
After running the sampling method in Matlab, I got a sampled dataset of around 400,000 rows. The trouble is that the method did not sample "y": the "y" column still has the full length and is separated from x1, x2, x3 and x4.
For weeks I tried to fix the code, but "y" is still not sampled, so I have to find another way to match the data. The sampling method did not shuffle the full dataset, so the rows are still in their original order; it only removes some of them. This is the screenshot of the data
In the screenshot above, "Sampled" contains only some of the rows from "FULL DATASET". The rows highlighted in blue in "FULL DATASET" are the ones that were taken into "Sampled", while the rows in black text were removed, which is why they do not appear in "Sampled". The Y column is missing from "Sampled". I could fill it in manually, but that would take a very long time since the sampled data has around 400,000 rows. So how can I fill in "Y" in "Sampled" from the "FULL DATASET" using an R data frame?
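To make the goal concrete, here is a minimal sketch with made-up values. The names `full`, `sampled` and `filled` are only for illustration, and the lookup with base R `merge()` is one possible way to express what I am after, not something I have working on the real data. It assumes that each combination of x1, x2, x3, x4 occurs only once in the full dataset and that the values are exactly equal in both tables.

```r
# Toy stand-ins for the real data (values are made up for illustration)
full <- data.frame(
  x1 = c(1, 2, 3, 4, 5),
  x2 = c(10, 20, 30, 40, 50),
  x3 = c(0.1, 0.2, 0.3, 0.4, 0.5),
  x4 = c(7, 8, 9, 10, 11),
  y  = c(100, 200, 300, 400, 500)
)

# The sample keeps rows 1, 3 and 4 in their original order, without y
sampled <- full[c(1, 3, 4), c("x1", "x2", "x3", "x4")]

# Left join: keep every sampled row and look up its y in the full data.
# Assumption: the (x1, x2, x3, x4) combination is unique in `full`.
filled <- merge(sampled, full, by = c("x1", "x2", "x3", "x4"), all.x = TRUE)

print(filled)
```

Note that `merge()` sorts the result by the key columns by default, so the original row order is not necessarily preserved, and duplicated key combinations in the full data would produce extra rows.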
Update
inputdata <- function(pop, sam)
{
  dfpop <- data.frame(pop)
  dfsam <- data.frame(sam)
  ndfpop <- nrow(dfpop)
  ndfsam <- nrow(dfsam)
  for (i in 1:ndfsam) {
    if (dfsam[i,1] == dfpop[i,1] && dfsam[i,2] == dfpop[i,2] &&
        dfsam[i,3] == dfpop[i,3] && dfsam[i,4] == dfpop[i,4]) {
      completesam <- print(dfpop[i,5] == dfsam[i,5])
    }
  }
  write.csv(completesam, file = "D://completesampling.csv")
}
Previously I used Excel for this, but since my workplace prefers R, I switched to R. I put several conditions inside the if statement, and the function prints FALSE for every row:
[1] FALSE
[1] FALSE
[1] FALSE
[1] FALSE
[1] FALSE
[1] FALSE
[1] FALSE
[1] FALSE
[1] FALSE
[1] FALSE
Do you have any idea which part of the code is missing?
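One thing I suspect is that my loop compares row i of the sample with row i of the full data, which only lines up until the first removed row, and that `==` compares the y values instead of copying them. Since the sample keeps the original order, a single pass with a second index into the full data might work. The sketch below is only an idea on toy data: `fill_y` is a hypothetical helper, and it assumes there are no NA values and that x1 to x4 are the first four columns in both tables.

```r
# Hypothetical helper: walk through both tables once, assuming the sampled
# rows appear in `pop` in the same relative order (rows were only removed).
fill_y <- function(pop, sam) {
  keys <- 1:4                         # columns x1..x4
  pk <- as.matrix(pop[, keys])        # matrices are much faster to index by row
  sk <- as.matrix(sam[, keys])
  y <- rep(NA_real_, nrow(sk))        # y values to be filled in
  j <- 1                              # current position in pop
  for (i in seq_len(nrow(sk))) {
    # advance j until the pop row matches the i-th sampled row
    while (j <= nrow(pk) && !isTRUE(all(pk[j, ] == sk[i, ]))) {
      j <- j + 1
    }
    if (j > nrow(pk)) break           # nothing left to match; remaining y stay NA
    y[i] <- pop[j, 5]                 # assign (not compare) the matching y
    j <- j + 1
  }
  cbind(sam[, keys], y = y)
}

# Toy data with made-up values: the sample keeps rows 1, 3 and 4
full <- data.frame(x1 = 1:5, x2 = c(10, 20, 30, 40, 50),
                   x3 = c(0.1, 0.2, 0.3, 0.4, 0.5), x4 = 7:11,
                   y = c(100, 200, 300, 400, 500))
sampled <- full[c(1, 3, 4), 1:4]

result <- fill_y(full, sampled)
print(result)
```

With the real data the result could then be written out once with `write.csv(result, ...)` after the loop has finished, rather than overwriting a single value on each iteration.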