
I'm having trouble subsetting my data to delete excluded participants. The variable `excluded` is coded so that yes = 1 (i.e., exclude) and no = 0 (i.e., include). The data were imported from REDCap using the REDCap script. Here's what I've tried, with the result or error message noted in a comment above each attempt:

#this resulted in 0 rows
df2<-subset(df, df1$excluded == 0)

#this produced the error message Error in `[.data.frame`(df1, df1$excluded ==  : undefined columns selected
df2 <-df1[df1$excluded == 0]

#This deleted all data in the dataframe
df2<-df1[!(df1$excluded == 1),]

#this resulted in 0 rows
df2<-subset(df1, df1$excluded !=1)

#this resulted in 0 rows
df2<-subset(df1, df1$excluded !="1")

#this resulted in the subset selecting excluded participants instead of included participants
df2<-subset(df1, df1$excluded.factor !="yes")
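Each outcome above can be reproduced with a tiny made-up data frame, assuming (as the asker reports in the comments below) that `excluded` holds 1 for excluded participants and `NA`, not 0, for included ones. This is a base-R sketch only; `toy` is a hypothetical name, and it leaves out the `labelled` class from the REDCap import for simplicity.

```r
# Hypothetical toy data: excluded is 1 for excluded participants, NA otherwise
toy <- data.frame(id = 1:4, excluded = c(1L, NA, NA, 1L))

# No value equals 0, so the condition is FALSE or NA everywhere;
# subset() treats NA as FALSE, leaving 0 rows.
r1 <- subset(toy, excluded == 0)
nrow(r1)

# Without a comma, a logical vector indexes columns, not rows;
# the NA entries point at columns that do not exist.
r2 <- tryCatch(toy[toy$excluded == 0], error = function(e) conditionMessage(e))
r2

# !(NA == 1) is NA, and indexing rows with NA returns all-NA rows,
# which can look as if the data were deleted.
r3 <- toy[!(toy$excluded == 1), ]
r3

# NA != 1 is NA, so subset() again drops every included row.
r4 <- subset(toy, excluded != 1)
nrow(r4)
```

The common thread is that any comparison with `NA` returns `NA`: `subset()` quietly drops those rows, while `[` with a comma returns an all-`NA` row for each `NA` index.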

Here are the variable types:

class(df1$excluded)
[1] "labelled" "integer"

class(participantdata.df$excluded.factor)
[1] "factor"
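The factor attempt may come down to spelling. Comparisons against factor labels are case-sensitive and `NA` still propagates, so if the real label is `"Yes"` rather than `"yes"`, the filter keeps exactly the excluded rows. Below is a sketch with a hypothetical factor `f`; the capitalized `"Yes"` label is an assumption, so check `levels()` on the real column first.

```r
# Hypothetical factor mimicking excluded.factor: "Yes" for excluded, NA otherwise
# (the "Yes"/"No" spelling is an assumption, not taken from the real data)
f <- factor(c("Yes", NA, NA, "Yes"), levels = c("Yes", "No"))

levels(f)  # inspect the exact labels before comparing

# "Yes" != "yes" is TRUE (comparison is case-sensitive),
# and NA != "yes" is NA, so only the excluded rows pass the filter.
keep_wrong <- f != "yes"
keep_wrong

# Compare against the actual label and treat NA as "not excluded"
keep_right <- is.na(f) | f != "Yes"
keep_right
```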

Any idea why this is happening and how to fix it?

Thanks!

mcrane
  • Welcome to SO, mcrane! Questions on SO (especially in R) do much better if they are reproducible and self-contained. By that I mean including attempted code (please be explicit about non-base packages), representative sample data (perhaps via `dput(head(x))` or building data programmatically (e.g., `data.frame(...)`), possibly stochastically), and perhaps actual output (with verbatim errors/warnings) versus intended output. Refs: https://stackoverflow.com/q/5963269, https://stackoverflow.com/help/minimal-reproducible-example, and https://stackoverflow.com/tags/r/info. – r2evans Apr 25 '22 at 01:31
  • FYI, `subset` is a little different from many other functions: inside `subset(df1, ...)`, don't use `df1$`. Some of your calls are also mixing frame references (`df` with `df1`, etc.); I'm not sure whether these are typos or you really have two or more frames. – r2evans Apr 25 '22 at 01:36
  • Thank you, this is really helpful! I think the problem was that `excluded` was coded as 1 only for excluded participants and left missing (NA) for included ones, rather than 0. Here's what I ended up using: `df2<-subset(df1, is.na(df1$excluded))`. Also good to know what to include in future SO posts. The `df` vs. `df1` mix-up was a typo here, from trying to make my data frame names more generic. Thanks for your help! – mcrane Apr 28 '22 at 20:07
  • Use `df2 <- subset(df1, is.na(excluded))`; again, drop the `df1$` when the column you're referencing belongs to the data frame passed as the first argument to `subset`. – r2evans Apr 28 '22 at 20:15
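A filter that keeps every row whose `excluded` value is not 1, whether it is 0 or `NA`, sidesteps the coding question entirely. This is a sketch with made-up data (the mixed 0/`NA` coding is hypothetical); it relies on `%in%`, which returns only `TRUE` or `FALSE`, never `NA`, so no included rows are dropped and no all-`NA` rows appear. The `subset()` version names the column directly, as suggested in the comments.

```r
# Hypothetical data mixing both codings: 1 = exclude, 0 or NA = include
df1 <- data.frame(id = 1:5, excluded = c(1L, 0L, NA, 1L, 0L))

# %in% never returns NA, so negating it keeps both the 0 and the NA rows
df2 <- df1[!(df1$excluded %in% 1), ]

# Equivalent subset() call, referring to the column by its bare name
df3 <- subset(df1, !(excluded %in% 1))

df2$id
```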
