
I have a large dataset of about 1.6 million rows, and the variable (column) I need to focus on is 'temperature'. The temperature column has many NA values, and the other columns have NA values throughout as well. I want to remove only the rows with NA values in the temperature column; I don't particularly care about the NA values in the other columns. How can I do this? And if I end up needing to remove rows with NA values in more than just the temperature column (e.g., the depth column as well), how can I select two columns? This is my code:

    otn <- tidync(filename, row.names=TRUE) %>% activate('D0')
    glider_table <- hyper_tibble(otn)
    attach(glider_table)
    summary(temperature)
    na.omit(glider_table)

`na.omit()` removes every row that has an NA value in any column, so I need something more selective.

Emily
  • `glider_table[!is.na(glider_table$your_col), ]` should do the trick. Also, [read here](https://stackoverflow.com/questions/10067680/why-is-it-not-advisable-to-use-attach-in-r-and-what-should-i-use-instead) for why using `attach()` is generally not advised (just to make you aware). – Andrew Feb 12 '20 at 20:04
  • Does this answer your question? [Omit rows containing specific column of NA](https://stackoverflow.com/questions/11254524/omit-rows-containing-specific-column-of-na) – camille Feb 12 '20 at 20:26
  • hyper_tibble() does omit NA values by default, fwiw – mdsumner Jul 31 '20 at 21:04

1 Answer


You can use the `drop_na()` function from the tidyr package. The first argument is the data frame; the remaining arguments are optional and name the specific columns to check, so a row is dropped only if it has an NA in one of those columns. For example: `drop_na(dataset, column)`.
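To make that concrete, here is a minimal sketch of filtering on one column and then on two. The small `glider_table` below is a made-up stand-in for the real data (the `salinity` column and all values are assumptions for illustration only); the last line shows a base R alternative using `complete.cases()` in case you would rather not load tidyr.

```r
library(tidyr)

# Hypothetical stand-in for glider_table; the values are invented for illustration.
glider_table <- data.frame(
  temperature = c(10.2, NA, 11.5, 9.8),
  depth       = c(5, 12, NA, 20),
  salinity    = c(NA, 35.1, 34.9, NA)
)

# Drop rows where temperature is NA; NAs in the other columns are left alone.
temp_only <- drop_na(glider_table, temperature)

# Drop rows where temperature or depth is NA.
temp_depth <- drop_na(glider_table, temperature, depth)

# Base R alternative for the two-column case.
keep <- complete.cases(glider_table[, c("temperature", "depth")])
temp_depth_base <- glider_table[keep, ]
```

Note that `drop_na()` returns a new data frame rather than modifying the original, so assign the result to a name (as above) if you want to keep it.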