
I am new to R. I tried to pull data from a data frame A using subset. Data frame A looks like this:

col a   col b
1       1
1       NA
NA      NA
1       1

I want to find the rows where col a = 1 and col b <> 1. My code:

test <- subset(A, A$a == 1 & A$b == NULL)

OR

test <- subset(A, A$a == 1 & A$b <> 1)

test returns 0 rows.

sum(is.na(A$a))  

results: 5126

sum(is.na(A$b))

results: 6753

What are better ways to pull data using R?

Ninjia123
  • the not-equals operator is `!=`, and to test for `NA` you use `is.na(...)` (and therefore, for not-NA use `!is.na(...)`) – SymbolixAU Oct 02 '16 at 21:45
  • @SymbolixAU I also tried that, but it still gave me 0 obs – Ninjia123 Oct 02 '16 at 21:47
  • so for your first example, `subset(A, A$a == 1 & is.na(A$b))` – SymbolixAU Oct 02 '16 at 21:47
  • @SymbolixAU Thanks! it works. any idea why my previous trials failed? – Ninjia123 Oct 02 '16 at 21:50
  • Firstly, `NULL` and `NA` are two different beasts (e.g. [here](http://stackoverflow.com/q/15496361/5977215)). Secondly, to test for `NULL` or `NA`, use `is.null()` and `is.na()` respectively. For example, see the difference between these tests: `nullTest <- c(NULL);nullTest == NULL;is.null(nullTest)` – SymbolixAU Oct 02 '16 at 21:53
  • And when you have `NA`s in your data, you can't check for them by testing if they are not-equal to a number (so your `A$b != 1` won't find the NAs, you need to explicitly test for NA) – SymbolixAU Oct 02 '16 at 21:55
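
The points made in the comments above can be sketched on a toy data frame. This is only an illustration: the four rows are copied from the question, and the column names `a` and `b` and the result names (`not_one`, `b_missing`, `b_not_one`) are assumptions, not the asker's real data.

```r
# Toy data frame mirroring the four rows shown in the question
A <- data.frame(a = c(1, 1, NA, 1), b = c(1, NA, NA, 1))

# Comparing NA with a number gives NA, not TRUE or FALSE
na_cmp <- NA != 1

# Comparing with NULL gives a zero-length logical vector
null_cmp <- A$b == NULL

# subset() treats an NA condition as FALSE, so rows where b is NA are dropped
not_one <- subset(A, a == 1 & b != 1)

# Testing for NA explicitly keeps those rows
b_missing <- subset(A, a == 1 & is.na(b))

# "b is not 1", counting NA as "not 1"
b_not_one <- subset(A, a == 1 & (is.na(b) | b != 1))
```

Inside `subset()` the columns can be referred to by name, so the `A$` prefix is not needed.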

1 Answer


To answer the part of your question about better ways to pull data in R: you should avoid using `subset`, as its non-standard evaluation can cause problems when it is used inside functions, and it cannot be used to assign values. This has been discussed here:

Why is `[` better than `subset`?

and here:

http://www.cookbook-r.com/Basics/Getting_a_subset_of_a_data_structure/

In my opinion, it is better to use `[` or to learn the data.table package.
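
As a rough sketch of the `[` approach in base R, the same filter from the question could look like the following. The toy data frame and the names `keep` and `test` are illustrative assumptions, and replacing the missing values with 0 is only there to show assignment, not a recommendation for the real data.

```r
# Toy data frame mirroring the four rows shown in the question
A <- data.frame(a = c(1, 1, NA, 1), b = c(1, NA, NA, 1))

# A logical index containing NA returns an all-NA row for each NA,
# so which() is used to keep only the positions that are TRUE
keep <- which(A$a == 1 & is.na(A$b))
test <- A[keep, ]

# Unlike subset(), `[` can also appear on the left-hand side of an assignment
A$b[keep] <- 0
```

Wrapping the condition in `which()` gives the same NA handling as `subset()`, which silently treats an NA condition as FALSE.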

GuillaumeL