I need to exclude subjects from a data frame to control for external influence in the data.

I have a dataset in which each subject is identified by a DOC_ID, with columns like these:

df
...  DOC_ID      MESURE_1    FACTOR ...
     3232         -55223     alpha
     3232         -2321      beta
     6153         -201       alpha
     6153         -233       alpha
     2020          1717      beta
     2020          1771      gamma
     9999          39        alpha
     9999          93        alpha
     5353          1009      beta  
     5353          1091      alpha
      .             .
      .             .
      .             .

Now, let's say I've managed to pick out the factors I need like this:

df_temp <- subset(df, FACTOR != "alpha")

df_temp_2 <- droplevels.data.frame(df_temp)

EXCL_df <- data.frame(summary(df_temp_2$DOC_ID))

subjects <- row.names(EXCL_df)

subjects
[1] 3232  2020 5353
Levels: 3232  2020 5353

How do I exclude those DOC_IDs from the old data frame and create something like this:

df2
...  DOC_ID      MESURE_1    FACTOR ...
     6153         -201       alpha
     6153         -233       alpha
     9999          39        alpha
     9999          93        alpha
      .             .
      .             .
      .             .

I've tried using the subset function again, but to no avail.

  • No, the problem is that I have to exclude all measures for subjects that have ANY measure with a FACTOR other than alpha; basically, if FACTOR changes, then I need to lose all measures for that subject – Ingi Freyr Atlason Sep 17 '19 at 16:43

1 Answer

On the basis of the output, I think this is what you want:

EXCL_df <- c(3232,  2020, 5353)
new_df <- subset(df, !(DOC_ID %in% EXCL_df))
new_df
#  DOC_ID MEASURE_1 FACTOR
#3   6153      -201  alpha
#4   6153      -233  alpha
#7   9999        39  alpha
#8   9999        93  alpha
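
The excluded IDs above are typed in by hand. As a rough sketch, they could also be derived from the data itself, which matches the rule described in the comment (drop every subject that has any non-alpha row). The names excl_ids and new_df are just illustrative, and the data frame is rebuilt here from the table in the question so the snippet runs on its own:

```r
# Rebuild the example data (values copied from the table in the question).
df <- data.frame(
  DOC_ID    = c(3232, 3232, 6153, 6153, 2020, 2020, 9999, 9999, 5353, 5353),
  MEASURE_1 = c(-55223, -2321, -201, -233, 1717, 1771, 39, 93, 1009, 1091),
  FACTOR    = factor(c("alpha", "beta", "alpha", "alpha", "beta", "gamma",
                       "alpha", "alpha", "beta", "alpha"))
)

# IDs of every subject that has at least one row where FACTOR is not "alpha".
excl_ids <- unique(df$DOC_ID[df$FACTOR != "alpha"])

# Keep only the rows whose subject never appears in that set.
new_df <- df[!(df$DOC_ID %in% excl_ids), ]
new_df
```

This assumes FACTOR has no missing values; if it might, wrapping the comparison in which() would avoid picking up NA as an ID.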

Data

#dput(df)
structure(list(DOC_ID = c(3232, 3232, 6153, 6153, 2020, 2020, 
9999, 9999, 5353, 5353), MEASURE_1 = c(-55223, -2321, -201, -233, 
1717, 1771, 39, 93, 1009, 1091), FACTOR = structure(c(1L, 2L, 
1L, 1L, 2L, 3L, 1L, 1L, 2L, 1L), .Label = c("alpha", "beta", 
"gamma"), class = "factor")), class = "data.frame", row.names = c(NA, 
-10L))
deepseefan