
My data set has missing values marked as 'XXX'.

I have tried na.omit(mydata), but it does not remove those rows.

df <- data.frame(X=factor(c(0.2, "XXX", 0.4, 0.1)), Y=factor(c(0.8, 1, 0.9, "XXX")))

Here X and Y are factors. I found that the missing data is encoded as "XXX" by checking the levels of the factors.

I want to remove rows 2 and 4. Can someone help? I have been trying for a while now.
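Why na.omit() alone does nothing here: it only drops values that are actually NA, and "XXX" is an ordinary factor level as far as R is concerned. A minimal base R sketch, assuming the example df from the question, is to mark every "XXX" cell as NA first and then call na.omit():

```r
# Example data from the question
df <- data.frame(X = factor(c(0.2, "XXX", 0.4, 0.1)),
                 Y = factor(c(0.8, 1, 0.9, "XXX")))

# df == "XXX" compares every column with "XXX" and returns a logical matrix;
# assigning NA through it turns each "XXX" cell into a real missing value
df[df == "XXX"] <- NA

# Now na.omit() can drop the incomplete rows (original rows 2 and 4)
clean <- na.omit(df)
print(clean)
```

If the data comes from a file, read.csv(..., na.strings = "XXX") should do the same conversion at read time, so na.omit() works directly on the result.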

Martin Gal
    Please update your question with the desired behavior, specific problems, and code to reproduce it. See: [How to create a Minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve) – abestrad Jun 11 '20 at 18:10
  • Does this answer your question? [how to filter data frame with conditions of two columns?](https://stackoverflow.com/questions/20084462/how-to-filter-data-frame-with-conditions-of-two-columns) – Matt Jun 11 '20 at 18:17
  • If any answers have solved your question, please mark the preferable one as "accepted" by clicking the check mark next to it. Thank you! – Darren Tsai Jun 12 '20 at 09:02

4 Answers


You can also filter for complete cases like this:

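library(dplyr)  # dplyr re-exports the %>% pipe, so library(magrittr) is not required
# turn every "XXX" into NA, then keep only the rows with no NA
df %>% replace(. == "XXX", NA_character_) %>% filter(complete.cases(.))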

The output is:

> df %>% replace(.=="XXX", NA_character_) %>% filter(complete.cases(.))
    X   Y
1 0.2 0.8  
2 0.4 0.9
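One thing to watch after this pipeline: X and Y are still factors, and "XXX" is still one of their levels even though no row uses it anymore. A sketch of two follow-ups, assuming the example df from the question: droplevels() if you want to keep factors, or converting to numbers via character (as.numeric() on a factor returns the internal level codes, not the values).

```r
library(dplyr)

df <- data.frame(X = factor(c(0.2, "XXX", 0.4, 0.1)),
                 Y = factor(c(0.8, 1, 0.9, "XXX")))

filtered <- df %>%
  replace(. == "XXX", NA_character_) %>%
  filter(complete.cases(.))

# The "XXX" level survives the filter
levels(filtered$X)
# [1] "0.1" "0.2" "0.4" "XXX"

# Option 1: keep factors, but drop levels that no row uses
as_factors <- droplevels(filtered)

# Option 2: convert to numbers; go through character first so that
# the level labels ("0.2", "0.4", ...) are parsed, not the level codes
as_numbers <- filtered %>%
  mutate(across(everything(), ~ as.numeric(as.character(.x))))
```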
mysteRious

Two base R solutions:

df <- subset(df, X != "XXX" & Y != "XXX")

or

df <- df[df$X != "XXX" & df$Y != "XXX",]

dplyr solution:

library(dplyr)

df <- df %>% filter(X != "XXX" & Y != "XXX")

The base R versions give the output below (the dplyr version returns the same two rows, renumbered 1 and 2):

    X   Y
1 0.2 0.8
3 0.4 0.9
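With many columns, writing one != "XXX" condition per column gets tedious. A sketch of the same base R filter built from a vector of column names; cols is a hypothetical name standing for whichever columns you want to check:

```r
df <- data.frame(X = factor(c(0.2, "XXX", 0.4, 0.1)),
                 Y = factor(c(0.8, 1, 0.9, "XXX")))

# Hypothetical: the columns to check for the "XXX" marker
cols <- c("X", "Y")

# One logical vector per column (TRUE where the value is not "XXX"),
# combined element-wise with & so a row is kept only if all are TRUE
keep <- Reduce(`&`, lapply(df[cols], function(col) col != "XXX"))

clean <- df[keep, ]
print(clean)
```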
Matt

Another option using the tidyverse:

library(tidyverse)  # attaches dplyr, stringr and tidyr

df %>%
  mutate(across(everything(), ~ str_replace(.x, "XXX", NA_character_))) %>%
  drop_na()

#     X   Y
# 1 0.2 0.8
# 2 0.4 0.9
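A variant of the same idea with dplyr's na_if(), which replaces a value with NA only on an exact match, whereas str_replace() treats "XXX" as a regular-expression pattern that can match anywhere inside a string. This sketch converts the factors to character first so the comparison is plain string equality:

```r
library(dplyr)
library(tidyr)

df <- data.frame(X = factor(c(0.2, "XXX", 0.4, 0.1)),
                 Y = factor(c(0.8, 1, 0.9, "XXX")))

clean <- df %>%
  # exact "XXX" values become NA; the columns become character vectors
  mutate(across(everything(), ~ na_if(as.character(.x), "XXX"))) %>%
  # drop every row that has an NA in any column
  drop_na()

print(clean)
```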
Martin Gal

You don't need to convert "XXX" to NA; just filter out the "XXX" values directly.

library(dplyr)

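# if_all() keeps a row only when the condition holds in every column
# (it replaces filter(across(...)), which is deprecated since dplyr 1.0.8)
df %>% filter(if_all(everything(), ~ . != "XXX"))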

#     X   Y
# 1 0.2 0.8
# 2 0.4 0.9

The corresponding version using filter_all(), which still works but is superseded in current dplyr:

df %>% filter_all(all_vars(. != "XXX"))

A base R solution, which counts the "XXX" cells in each row and keeps the rows where that count is zero:

df[rowSums(df == "XXX") == 0, ]
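To see how the rowSums() one-liner works, here are its intermediate steps on the example df from the question (is_xxx and n_xxx are just illustrative names):

```r
df <- data.frame(X = factor(c(0.2, "XXX", 0.4, 0.1)),
                 Y = factor(c(0.8, 1, 0.9, "XXX")))

# Logical matrix, one column per variable: TRUE where the cell is "XXX"
is_xxx <- df == "XXX"

# Number of "XXX" cells in each row (TRUE counts as 1)
n_xxx <- rowSums(is_xxx)

# Keep only the rows that contain no "XXX" at all
clean <- df[n_xxx == 0, ]
print(clean)
```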
Darren Tsai