I used R a lot in college, but I don't use it anymore and I admit I'm a little rusty. I know that for loops aren't great for larger datasets, and I know there's a much more efficient way to do what I'm about to do.
Please excuse me if I use terms incorrectly, and correct me if I'm wrong.
My data is a CSV file with 3 columns. The first column is an ID field, and the second and third columns are different plans.
The ID field has a lot of duplicates, and I only want to keep the rows whose ID appears exactly once — not one copy of each duplicate, but no copy at all. So if my IDs are 1, 3, 3, 5, 6, I only want to use 1, 5, and 6.
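Just to illustrate the rule I mean on a plain vector (this is only a toy example, not my real data):

ids <- c(1, 3, 3, 5, 6)
# keep only the values that occur exactly once, dropping every copy of a duplicate
ids[!(duplicated(ids) | duplicated(ids, fromLast = TRUE))]
# [1] 1 5 6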
This is what I've done.
plan.get <- function(plans) {
  # Count how many times each ID appears, then keep only the IDs that appear once
  counts <- as.data.frame.table(table(plans[, 1]))
  unique.counts <- counts[which(counts[, 2] == 1), ]

  # Pre-allocate the result: one row per non-duplicated ID, three columns
  unique.plans <- matrix(0, nrow(unique.counts), 3)

  # Look up the matching row of 'plans' for each non-duplicated ID, one at a time
  for (i in 1:nrow(unique.counts)) {
    unique.plans[i, 1] <- plans[which(plans[, 1] == unique.counts[i, 1]), 1]
    unique.plans[i, 2] <- as.character(plans[which(plans[, 1] == unique.counts[i, 1]), 2])
    unique.plans[i, 3] <- as.character(plans[which(plans[, 1] == unique.counts[i, 1]), 3])
  }
  return(unique.plans)
}
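In case it matters, I'm calling it roughly like this (the file name is made up, and my file has no header row, hence header = FALSE):

plans <- read.csv("plans.csv", header = FALSE)
result <- plan.get(plans)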
I'm thinking of things like vectorized operations or the apply functions, but the documentation is still a bit beyond me at this stage. Any help would be greatly appreciated!
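Is it something along these lines? This is just a sketch of what I imagine a loop-free version might look like, and I haven't been able to confirm it gives the same result as my function:

counts <- table(plans[, 1])
singles <- names(counts)[counts == 1]                      # IDs that appear exactly once
unique.plans <- plans[as.character(plans[, 1]) %in% singles, ]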
Example of Data - 'plans'
1 Plan 1 Plan 2
3 Plan 2 Plan 2
3 Plan 2 Plan 2
5 Plan 3 Plan 1
6 Plan 2 Plan 3
7 Plan 1 Plan 2
7 Plan 3 Plan 1
8 Plan 2 Plan 3
And what I want my final output to be (because IDs 3 and 7 are duplicated):
1 Plan 1 Plan 2
5 Plan 3 Plan 1
6 Plan 2 Plan 3
8 Plan 2 Plan 3
To be clear, the function I've written does exactly what I want it to do, but because of the for loop it is incredibly slow.