Is there a function in R that can delete thousands of rows given something specific?

Question

I have a dataframe (called TLSWL) of roughly 20,000 rows. I have a column for Time in the df and I need to delete all the rows that end in certain minutes. I need to have only the 30 minute increments of time to match other data for comparison.

This is what I'm currently using:

TLSWL<- TLSWL[TLSWL$Time != "0:06"&TLSWL$Time !="0:12"&TLSWL$Time 
    !="0:18"&TLSWL$Time !="0:24"&TLSWL$Time != "0:36"&TLSWL$Time 
    !="0:42"&TLSWL$Time !="0:48"&TLSWL$Time != "0:54"&TLSWL$Time 
    != "1:06"&TLSWL$Time != "1:12"&TLSWL$Time != "1:18"&TLSWL$Time 
    != "1:24"&TLSWL$Time != "1:36"&TLSWL$Time != "1:42"&TLSWL$Time 
    != "1:48"&TLSWL$Time != "1:54"&TLSWL$Time != "2:06"&TLSWL$Time 
    != "2:12"&TLSWL$Time != "2:18"&TLSWL$Time != "2:24"&TLSWL$Time 
    != "2:36"&TLSWL$Time != "2:42"&TLSWL$Time != "2:48"&TLSWL$Time 
    != "2:54"&TLSWL$Time != "3:06"&TLSWL$Time != "3:12"&TLSWL$Time  
           and so on all the way to the times in 24:xx

It works for dataframes that are smaller and not in military time, but as you can see the line of code required by doing it this way ends up being WAY too long.

Can this be done more efficiently?

It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. It would probably be much easier to keep times that end in "30" or "00" which would be a much shorter expression. — MrFlick, Aug 27 '19 at 15:42

score 5 · Answer 1 · answered Aug 27 '19 at 14:55

5

An option would be to create a vector ('v1') of values that needs to be matched, use %in% to get a logical vector, negate (!) and subset the rows

TLSWL[!TLSWL$Time %in% v1, , drop = FALSE]

answered Aug 27 '19 at 14:55

akrun

874,273
37
540
662

This was super helpful! Worked really well and definitely reduced the lines of code. I just need to figure out the best way to write the vector for time since I'm trying to avoid using any as.posix functions as it was giving me a lot of problems. – Katy Aug 27 '19 at 18:31
@Katy You can use `sprintf` to create the vector as in the other answer – akrun Aug 27 '19 at 20:01
1

I think the sprintf was to create random data. The best way to write the vector is to specify that you want :00 or :30, not negate 06, 12, etc. See the other answers. – Arthur Yip Aug 29 '19 at 00:29

score 1 · Answer 2 · edited Aug 27 '19 at 19:38

You could use the grepl function to select rows where the time includes the strings ":00" or ":30".

set.seed(4)
randTime <- function(n){
    h <- as.character(sprintf("%02d", sample(c(0:24), n, replace=T)))
    m <- as.character(sprintf("%02d", sample(c(0:59), n, replace=T)))
    paste0(h, ":", m)}

df <- data.frame(myTime = randTime(100))

df[grepl(":00|:30", df$myTime), ]

[1] 15:30 16:30 19:00

Arthur Yip · Answer 3 · 2019-08-27T18:26:37.947

1

I suggest using dplyr and tidyverse to make this task easier and clearer.

library(tidyverse)
TLSWL <- filter(TLSWL, str_detect(Time, ":00|:30"))

This does the same thing as the other solution that uses grepl. It's more wordy but it can be helpful for readability and understanding.

With pipes (so it makes sense left to right),

TLSWL <- TLSWL %>% filter(Time %>% str_detect(":00|:30"))

edited Aug 27 '19 at 18:26

answered Aug 27 '19 at 18:11

Arthur Yip

5,810
2
31
50

Is there a function in R that can delete thousands of rows given something specific?

3 Answers3