Remove row if any column contains a specific string

Question

I am trying to figure out the best approach in R to remove rows that contain a specific string, in my case 'no_data'.

I have data from an outside source that imputes na's with 'no_data'

an example is this:

 time  |speed  |wheels
1:00   |30     |no_data
2:00   |no_data|18
no_data|no_data|no_data
3:00   |50     |18

I want to go through the data and remove each row containing this 'no_data' string in any column. I have had a lot of trouble figuring this out. I have tried an sapply, filter, grep and combinations of the three. I am by no means an r expert so it could just be me incorrectly using these. Any help would be appreciated.

related https://stackoverflow.com/questions/22850026/filter-rows-which-contain-a-certain-string/49858451#49858451 — tjebo, Apr 01 '20 at 14:06
Does this answer your question? [Filter rows which contain a certain string](https://stackoverflow.com/questions/22850026/filter-rows-which-contain-a-certain-string) — tjebo, Apr 01 '20 at 14:07

akrun · Accepted Answer · 2017-06-14T12:34:54.110

11

We can use rowSums to create a logical vector and subset based on it

df1[rowSums(df1 == "no_data")==0, , drop = FALSE]
#   time speed wheels
#4 3:00    50     18

data

df1 <- structure(list(time = c("1:00", "2:00", "no_data", "3:00"), speed = c("30", 
"no_data", "no_data", "50"), wheels = c("no_data", "18", "no_data", 
"18")), .Names = c("time", "speed", "wheels"), class = "data.frame", 
row.names = c(NA, -4L))

edited Jun 14 '17 at 12:34

answered Jun 14 '17 at 12:18

akrun

874,273
37
540
662

I m getting some weird results when I do `df1 == 'no_data'`. Did you read the data frame from OP or did you create it yourself? – Sotos Jun 14 '17 at 12:34
1

@Sotos Updated the data I used – akrun Jun 14 '17 at 12:35
When I try this method, i get a "string is not in a standard unambiguous format" I assume this is because I have a mix of char,int,and num variables in the real data set. I failed to express that in my example... – lentz Jun 14 '17 at 12:39
1

@lentz Could you please check the `str(yourdata)` Here I have `character` class. If it is factor, convert to `character` – akrun Jun 14 '17 at 12:40
1

I believe the problem was due to a few `POSIXct` formats I had in the data. converted to `character` – lentz Jun 14 '17 at 12:45

tjebo · Answer 2 · 2022-11-23T16:25:24.837

edit update to the filter(if_all/if_any) syntax (dplyr vs. 1.0.10), formerly using across (now deprecated) and even before that filter_all or filter_any (superseded).

Here a dplyr option: (using Akrun's data)

library(dplyr)

df1 <- structure(list(time = c("1:00", "2:00", "no_data", "3:00"), speed = c("30", "no_data", "no_data", "50"), wheels = c("no_data", "18", "no_data", "18")), .Names = c("time", "speed", "wheels"), class = "data.frame", row.names = c(NA, -4L))

## with if_any
df1 %>% filter(if_any(everything(), ~ grepl("no_data", .)))
#>      time   speed  wheels
#> 1    1:00      30 no_data
#> 2    2:00 no_data      18
#> 3 no_data no_data no_data

## or with if_all
df1 %>% filter(if_all(everything(), ~ !grepl("no_data", .)))
#>   time speed wheels
#> 1 3:00    50     18

## to GET all rows that fulfil condition, use
df1 %>% filter(if_any(everything(), ~ grepl("no_data", .)))
#>      time   speed  wheels
#> 1    1:00      30 no_data
#> 2    2:00 no_data      18
#> 3 no_data no_data no_data

Sotos · Answer 3 · 2017-06-14T12:49:17.960

8

You can read the data using na.strings = 'no_data' to set them as NA and then simply omit NAs (or take complete.cases), i.e. (Using @akrun's data set)

d1 <- read.table(text = 'time   speed  wheels
 1    1:00      30 no_data
            2    2:00 no_data      18
            3 no_data no_data no_data
            4    3:00      50      18', na.strings = 'no_data', h=TRUE)

d1[complete.cases(d1),]
#  time speed wheels
#4 3:00    50     18

#OR

na.omit(d1)
#  time speed wheels
#4 3:00    50     18

edited Jun 14 '17 at 12:49

answered Jun 14 '17 at 12:42

Sotos

51,121
6
32
66

Thank you @akrun. I couldn't get it directly from OP data as some elements were 'no_data', others ' no_data', and others remain a mystery :) – Sotos Jun 14 '17 at 12:45

score 3 · Answer 4 · answered Jun 14 '17 at 12:26

akrun answer is quick, correct and simply as much is it can :) however if you like to make your life more complex you can also do:

dat
     time   speed  wheels
1    1:00      30 no_data
2    2:00 no_data      18
3 no_data no_data no_data
4    3:00      50      18

dat$new <- apply(dat[,1:3], 1, function(x) any(x %in% c("no_data")))
dat <- dat[!(dat$new==TRUE),]
dat$new <- NULL

dat
  time speed wheels
4 3:00    50     18

Remove row if any column contains a specific string

4 Answers4

data

Linked