I have looked at other questions that seem to describe a similar problem, with random rows being replaced by all-NA values, but I have not found a solution because those users already had NAs in their data frames (for example, Subsetting R data frame results in mysterious NA rows).
I used na.omit to remove any possible NAs first, but NA rows are still being produced at the last step.
I am subsetting data using three columns: an ID column that mixes numbers and letters (e.g. 11xx1234), a binary categorical column (0 or 1), and a value column holding distance in meters. Any ID that appears more than once is assigned a 1 in the binary column. I want to pull out the rows with a 1 in the binary column, but with each ID represented only once, so that the distance associated with each ID is not counted more than once and does not skew any statistical test.
So something like:
x <- data.frame(ObjectID  = c("11AD1234", "11AD1234", "11AB123", "11BA34", "11DA354", "11DA354"),
                component = c(1, 1, 0, 0, 1, 1),
                distance  = c(2, 2, 5, 8, 4, 4))
Which gives:
  ObjectID component distance
1 11AD1234         1        2
2 11AD1234         1        2
3  11AB123         0        5
4   11BA34         0        8
5  11DA354         1        4
6  11DA354         1        4
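To make the goal concrete, the result I'm after is one row per component-1 ID. Selected by hand (the row numbers 1 and 5 are of course specific to this toy example):

```r
# Toy data from above
x <- data.frame(ObjectID  = c("11AD1234", "11AD1234", "11AB123", "11BA34", "11DA354", "11DA354"),
                component = c(1, 1, 0, 0, 1, 1),
                distance  = c(2, 2, 5, 8, 4, 4))

# Hand-picked: the first occurrence of each ID that has component == 1
want <- x[c(1, 5), ]
want$distance   # 2 4 -- one distance per ID
```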
Here is the code I am trying to use, which works great until the distance column is added:
x[unique(x[x$component==1,]$ObjectID),]$distance
[1] 2 8
The correct answer should be 2 and 4, so what am I doing that is messing this up? And how is it also replacing a bunch of rows with NAs (not reproduced in this small example)? The real data are confidential and cannot be shared, sorry!
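In case it helps with diagnosis, here is my guess at what might be going on. This is only a sketch, assuming ObjectID is stored as a factor (the stringsAsFactors = TRUE default before R 4.0), so that its integer level codes end up being used as row numbers:

```r
x <- data.frame(ObjectID  = c("11AD1234", "11AD1234", "11AB123", "11BA34", "11DA354", "11DA354"),
                component = c(1, 1, 0, 0, 1, 1),
                distance  = c(2, 2, 5, 8, 4, 4),
                stringsAsFactors = TRUE)  # explicit; this was the default before R 4.0

ids <- unique(x[x$component == 1, ]$ObjectID)
as.integer(ids)     # 2 4 -- the factor level codes, not the row numbers I intended
x[ids, ]$distance   # 2 8 -- rows 2 and 4 get selected, matching the wrong output above

# If ObjectID were character instead (stringsAsFactors = FALSE), the same indexing
# would look up row *names* like "11AD1234", find no match, and return all-NA rows,
# which might be where my mystery NAs come from.
```

Is that the right explanation, and what is the correct way to index here?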