
I have a data frame from which I want to omit cases where the age is 30 or less. I know you can use na.omit to omit NA cases, but how would I omit specific cases like this?

David Lee
  • 2
    Please refer to this guide on writing a good, easy-to-understand question, so others can understand your question thoroughly: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Sinh Nguyen Feb 08 '22 at 02:24
  • 1
    Your question is not very clear. Can you show some example input, and desired output? – neilfws Feb 08 '22 at 02:40

2 Answers


This seems to be more of a filtering problem than one of omitting missing values:

> library(dplyr)
> df <- tibble(age = c(20,25,30,35,40))
> 
> df %>% filter(age < 30)
# A tibble: 2 × 1
    age
  <dbl>
1    20
2    25
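
The filter above keeps the ages below 30. If the intent is instead to drop every case where the age is 30 or less, as the question's wording suggests, the comparison can be flipped. This is only a sketch on the same made-up ages; the name `older` is hypothetical.

```r
library(dplyr)

# Same illustrative ages as in the example above
df <- tibble(age = c(20, 25, 30, 35, 40))

# Keep only rows with age strictly greater than 30,
# which drops every case where age is 30 or less
older <- df %>% filter(age > 30)
older
```

Note that `filter()` also drops rows where the condition evaluates to NA, so rows with a missing age would be removed as well.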
AugtPelle

With base R, you can filter out all rows where the age is 30 or greater.

df[df$age < 30,]

    age values
  <int>  <dbl>
1    21  1.89 
2    22  1.01 
3    23  0.107
4    24  1.46 
5    25  1.17 
6    26  1.86 
7    27  1.77 
8    28  1.91 
9    29  0.594
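
One caveat with logical indexing in base R: if the age column itself contains NA, the comparison yields NA for that row, and the result includes a row made up entirely of NAs. Wrapping the condition in `which()` avoids this. The data frame below is made up for illustration and is not the answer's data; `with_na_row` and `clean` are hypothetical names.

```r
# Made-up data with one missing age, purely for illustration
df <- data.frame(age = c(21, NA, 35, 28),
                 values = c(1.2, 0.4, 0.9, 1.7))

# Plain logical indexing: the NA condition produces an all-NA row
with_na_row <- df[df$age < 30, ]

# which() returns only the positions that are TRUE, so NA conditions are dropped
clean <- df[which(df$age < 30), ]

nrow(with_na_row)
clean
```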

Or with data.table:

library(data.table)

dt <- data.table(df)
dt[age < 30]

However, if you only want to remove rows with NAs where the age is greater than 30, you can identify the rows where the age is greater than 30 and another column (here, values) is NA, and then exclude those rows.

df[!(df$age > 30 & is.na(df$values)),]

Or with subset:

subset(df, !(age > 30 & is.na(values)))

With tidyverse:

library(tidyverse)

df %>% 
  filter(!(age > 30 & is.na(values)))

Or with data.table:

dt <- data.table(df)
dt[!(age > 30 & is.na(values))]
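
The sample data in this answer contains no missing values, so the expressions above return it unchanged. To see the conditional removal at work, here is a small sketch on a made-up data frame that does contain NAs; the numbers and the names `keep` and `result` are assumptions for illustration only.

```r
# Made-up data: NAs in `values` both below and above age 30
df <- data.frame(age = c(25, 28, 33, 36, 40),
                 values = c(NA, 1.5, NA, 0.7, NA))

# TRUE for rows to keep: everything except age > 30 with a missing value
keep <- !(df$age > 30 & is.na(df$values))

result <- df[keep, ]
result
```

The NA at age 25 survives because the age condition is not met, while the rows at ages 33 and 40 are dropped.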

Data

df <- structure(list(age = 21:40, 
                     values = c(1.88648780807853, 1.01084147393703, 
                                0.107075828593224, 1.46145519195125, 1.16910230834037, 1.85718628577888, 
                                1.7749991081655, 1.91132036875933, 0.594451983459294, 0.976039483677596, 
                                1.31880497187376, 1.82749796425924, 1.98314357083291, 0.57053042575717, 
                                0.722490054555237, 1.66634088428691, 0.702816031407565, 0.622223159298301, 
                                0.298387756571174, 1.6071562608704)), 
                class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -20L))
AndrewGB