1

With this code I intend to delete the rows where column a contains the word "TRUE".

DATA2 <- DATA[!DATA$a == "TRUE"]

However, the column contains "TRUE", "FALSE" and "NA". When I run this code, R deletes the NAs as well. How can I avoid this and delete only the rows with "TRUE"?

I've already tried this one, but without success...

DATA2 <- DATA[!DATA$a=='TRUE',na.rm= FALSE]

Error:

Error in `[.data.frame`(DATA, !DATA$a == "TRUE",  : 
  unused argument (na.rm = FALSE)
Ana Raquel

2 Answers

3

I created some reproducible data:

df <- data.frame(
  col1 = c(1:15), 
  col2=rep(c("TRUE","FALSE", "NA"),5), 
  stringsAsFactors = FALSE)

Using base R, you can do this:

df2 <- df[df$col2 == "NA" | !df$col2 == "TRUE", ]

In dplyr:

library(dplyr)
df2 <- df %>% filter(col2 == "NA" | !col2 == "TRUE" )

Output:

> df2
   col1  col2
2     2 FALSE
3     3    NA
5     5 FALSE
6     6    NA
8     8 FALSE
9     9    NA
11   11 FALSE
12   12    NA
14   14 FALSE
15   15    NA

// Edit: changed NA values to Strings ("NA") as supplied in the question.
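
If you are unsure which case applies to your own data, is.na() tells the two apart. A minimal sketch, with x_string, x_missing and the helper keep_non_true being hypothetical names used only for illustration:

```r
# The string "NA" is ordinary text; a real NA is a missing value
x_string  <- c("TRUE", "FALSE", "NA")
x_missing <- c("TRUE", "FALSE", NA)

is.na(x_string)   # FALSE FALSE FALSE
is.na(x_missing)  # FALSE FALSE  TRUE

# Hypothetical helper: TRUE for every element that is not the string "TRUE",
# whether the missing values are stored as "NA" or as real NA
keep_non_true <- function(x) is.na(x) | x != "TRUE"

keep_string  <- keep_non_true(x_string)
keep_missing <- keep_non_true(x_missing)
```

Both calls mark only the first element for removal, so the same condition could be used as a row index in either situation.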

// Note:

If you want to convert "TRUE" to TRUE, "FALSE" to FALSE and "NA" to NA, you can do this:

df_bool <- data.frame(
  col1 = df$col1, 
  col2 = as.logical(df$col2)
)

Since df_bool$col2 now holds real logical values rather than strings that look like logical values, it can be used directly as a row index, without comparing it to TRUE or FALSE with ==. The is.na() term is still needed so that the rows with NA are kept:

df2 <- df_bool[!df_bool$col2 | is.na(df_bool$col2), ]
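
To see why the is.na() term matters, here is a minimal sketch on a hypothetical three-row data frame (toy and its columns are made up for illustration). Negating NA gives NA, and an NA in a logical row index produces a row filled with NA instead of the original row:

```r
# Hypothetical toy data with real logical values, including a real NA
toy <- data.frame(id = 1:3, flag = c(TRUE, FALSE, NA))

# Without is.na(): the NA in the index yields a row of NAs (id is lost)
without_guard <- toy[!toy$flag, ]

# With is.na(): the row whose flag is NA is kept intact
with_guard <- toy[!toy$flag | is.na(toy$flag), ]
```

In without_guard the second row has NA in every column, while with_guard keeps ids 2 and 3.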
s-heins
  • I actually prefer your answer, it is more explicit about what it is doing. My suggestion takes a little more time to understand. – Paul Hiemstra Jan 19 '17 at 12:40
  • Thank you very much! I'm a beginner, code is completely new for me! It solved my problem :) – Ana Raquel Jan 19 '17 at 13:03
  • No problem @AnaRaquel, glad I could help! :) – s-heins Jan 19 '17 at 13:45
  • Apparently the error persists! When I try the solution that @kaetschap wrote, it works, but not with my data... I tried this, but the TRUEs, FALSEs and NAs are still there: df2 <- df1[is.na(df1$outlier) | !df1$outlier == "TRUE", ]. All columns are "character". Why is it not working? Can you help me, please? I'm sorry to bother you again with the same question. – Ana Raquel Jan 20 '17 at 15:01
  • Could it be that your column contains `"NA"`, not `NA`? Then is.na(col2) would always be `FALSE`, since your `NA` values are not actually `NA`, but strings. If that is the case, you can do this: `df2 <- df %>% filter(col2 == "NA" | !col2 == "TRUE")`. I'll adjust the answer accordingly. – s-heins Jan 20 '17 at 15:29
0

First create some example data:

set.seed(1)
df = data.frame(x = runif(10), 
                y = runif(10), 
                z = sample(c('TRUE', 'FALSE', NA), 10, replace = TRUE),
       stringsAsFactors = FALSE) # Force to character, and not factor

The trick I use here is to replace the NA with "FALSE" inside the filter:

df[!ifelse(is.na(df$z), 'FALSE', df$z) == 'TRUE',]
            x         y     z
1  0.26550866 0.2059746  <NA>
3  0.57285336 0.6870228 FALSE
6  0.89838968 0.4976992 FALSE
8  0.66079779 0.9919061 FALSE
9  0.62911404 0.3800352  <NA>
10 0.06178627 0.7774452 FALSE

I really like the dplyr style of programming:

library(dplyr)
df %>% filter(ifelse(is.na(z), 'FALSE', z) != 'TRUE')
           x         y     z
1 0.26550866 0.2059746  <NA>
2 0.57285336 0.6870228 FALSE
3 0.89838968 0.4976992 FALSE
4 0.66079779 0.9919061 FALSE
5 0.62911404 0.3800352  <NA>
6 0.06178627 0.7774452 FALSE
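
Another option, not part of the approach above, is %in%, which returns FALSE rather than NA for missing values, so no replacement step is needed. A minimal sketch on made-up data (toy is a hypothetical name):

```r
# Hypothetical toy data: character column with a real NA
toy <- data.frame(id = 1:4,
                  z = c("TRUE", "FALSE", NA, "TRUE"),
                  stringsAsFactors = FALSE)

# %in% never returns NA, so the negation is a clean TRUE/FALSE index
kept <- toy[!(toy$z %in% "TRUE"), ]
```

Here kept contains the rows with ids 2 and 3, that is, the "FALSE" row and the NA row.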
Paul Hiemstra