I have a huge dataset of about 1.6 million rows, and the variable (column) I need to focus on is 'temperature'. The temperature column has many NA values, and the other variable columns have NA values throughout as well. I want to remove only the rows with NA values in the temperature column, I don't particularly care about the NA values in the other columns. How can I do this? If I end up needing to remove rows with NA values for more than just my temperature column, (eg the depth column) how can I select two columns? This is my code:
otn <- tidync(filename, row.names=TRUE) %>% activate('D0')
glider_table <- hyper_tibble(otn)
attach(glider_table)
summary(temperature)
na.omit(glider_table)
na.omit () removes all rows with NA values regardless of which column they're in, so I need something more selective.