I have a dataframe that contains some NA values. I don't want to delete the rows with NA; I want to make a new dataframe containing only the rows that have NA in a specific column.

I have searched for how to do this, but everything I find only explains how to remove the NA rows. One comment I came across suggested that, rather than removing the rows you don't want, you should search for how to make a new dataframe from the rows you do want, but I have not been able to find out how to do that.

My dataframe, Biov_per_genus2, looks like this:

      ID      Code Mag_x Sample Source Count Avg_Biov_cell total_biov_code
1    Env_102         A   200    102    Env    44  7.962052e+03    
2    Env_102         A   400    102    Env     1            NA              
3    Env_102        AA   200    102    Env     2  2.567925e+01    
4    Env_102        AA   400    102    Env     8  9.664901e+00    
5    Env_102         B   200    102    Env    46  1.883699e+04    
6    Env_102        CG   400    102    Env     1            NA              
7    Env_102        CY   400    102    Env    12  2.188643e+01    
8    Env_102         D   400    102    Env    21  1.413717e+01    
9    Env_102         F   400    102    Env     6  8.136725e+02    
10   Env_102    Group1   200    102    Env     2  2.073616e+02    
11   Env_102    Group1   400    102    Env    87  9.557676e+00    
12   Env_102        JJ   200    102    Env    24  5.169177e+03    
13   Env_102        JJ   400    102    Env    18  5.230752e+02    
14   Env_102        KK   400    102    Env     1            NA              
15   Env_102        MC   400    102    Env    32  1.342800e+03    
16   Env_102         N   400    102    Env     7  1.453212e+02    
17   Env_102         O   200    102    Env    43  2.035783e+04    
18   Env_102         O   400    102    Env    10  1.255538e+03    
19   Env_102     PrevH   200    102    Env     3  3.474356e+05    
20   Env_102      S-SS   200    102    Env     3  2.458556e+03    
21   Env_102      S-SS   400    102    Env     3  1.846000e+02    
22   Env_102        TF   200    102    Env     8            NA              
23   Env_102         U   200    102    Env     2  6.819019e+02    
24   Env_102        WG   200    102    Env     1  9.894446e+03    
25   Env_102         Z   200    102    Env    28  3.133701e+02    
26   Env_114         A   200    114    Env    34  8.463451e+03    
27   Env_114        AA   400    114    Env    23  1.027414e+01    
28   Env_114         B   200    114    Env     6  2.099966e+04    
29   Env_114        CC   200    114    Env     4            NA              
30   Env_114        CG   400    114    Env     1  1.000500e+03    
31   Env_114        CY   400    114    Env    24  3.989823e+01    
32   Env_114         D   400    114    Env    15  3.602360e+01    
33   Env_114         E   200    114    Env     4  7.127227e+03    
34   Env_114         F   400    114    Env    19  3.215944e+02    
35   Env_114         G   200    114    Env     4  3.106407e+03    
36   Env_114    Group1   200    114    Env    17  1.664819e+02    
37   Env_114    Group1   400    114    Env    91  1.020834e+01    
38   Env_114         J   400    114    Env     1  1.123198e+03    
39   Env_114        JJ   200    114    Env     6  1.630015e+03    
40   Env_114        JJ   400    114    Env     3  4.003960e+02    
41   Env_114        KK   200    114    Env     6            NA              
42   Env_114        KK   400    114    Env     3            NA              
43   Env_114    LL/N/O   400    114    Env     8  4.682544e+02    
44   Env_114        MC   400    114    Env    18  5.718000e+03    
45   Env_114         N   200    114    Env     1  8.586049e+03    
46   Env_114         O   200    114    Env    34  1.092983e+04    
47   Env_114      S-SS   200    114    Env     3  7.149000e+03    
48   Env_114        TF   200    114    Env    22  1.880243e+02    
49   Env_114        TF   400    114    Env     1            NA              
50   Env_114         U   200    114    Env     2  9.306367e+02    
51   Env_114        WG   200    114    Env     4            NA              
52   Env_114         Z   200    114    Env    58  2.270314e+02    
53   Env_125         A   200    125    Env   153  9.614530e+03    
54   Env_125         A   400    125    Env     6  2.200686e+02     

and it goes on for >700 rows. I want to pull out the rows which have NA in the Avg_Biov_cell column and put all that data into a new dataframe.

Any advice would be appreciated.

Robin
  • Please add more details to the question. For example, at least a dummy data frame to help us out in helping you! – prateek1592 Sep 26 '16 at 13:03
  • Have added what I think you are looking for as the answer. – prateek1592 Sep 26 '16 at 13:03
  • The dataframe is in the attached picture. I wasn't sure how to insert it as text and keep proper columns, so I took a screen shot. – Robin Sep 26 '16 at 13:13
  • Welcome to Stack Overflow! [How to make a great R reproducible example?](http://stackoverflow.com/questions/5963269) – zx8754 Sep 26 '16 at 13:23
  • Thanks for the welcome Zx8754. I was hesitant to join because I was told that users will rake you over the coals for asking questions that seem basic, but I guess everyone has to start somewhere. R seems so confusing and often the solutions I find online are equally perplexing or don't pertain exactly to what I'm trying to do. – Robin Sep 26 '16 at 13:37
  • A good start would be to paste your data as text or, better yet, use `dput(myData)`, and add the expected output. Then it is much easier for us to test and provide solutions. SO has rules for good reasons, so do not take it personally :) Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) – zx8754 Sep 26 '16 at 13:41
  • Thank you. I think I properly inserted it this time (the column spacing is off, but other than that I think it's readable) – Robin Sep 26 '16 at 13:59
  • I also tried subsetting using the code `NAsamples <- subset(Biov_per_genus2,Avg_Biov_cell = NA)`, but this did not work. It returned the same dataframe as the original, with all the rows instead of just the NA rows. – Robin Sep 26 '16 at 14:19
  • Finally got it!!! `NAsamples <-subset(Biov_per_genus2,is.na(Avg_Biov_cell))` – Robin Sep 26 '16 at 15:37

2 Answers

Finally got it!!! `NAsamples <- subset(Biov_per_genus2, is.na(Avg_Biov_cell))`
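For anyone wondering why the earlier attempt with `Avg_Biov_cell = NA` returned every row, here is a minimal sketch. The data frame `df` is a hypothetical miniature of `Biov_per_genus2` made up for illustration, not the real data. As far as I understand `subset()`, a single `=` is treated as a named argument that gets ignored rather than as a condition, and comparing with `== NA` always yields `NA`, so neither form filters on missing values.

```r
# Hypothetical miniature of Biov_per_genus2; values are for illustration only
df <- data.frame(
  Code          = c("A", "A", "AA", "CG"),
  Mag_x         = c(200, 400, 200, 400),
  Avg_Biov_cell = c(7962.052, NA, 25.67925, NA)
)

# `=` passes an extra named argument instead of a condition,
# so no filtering happens and all rows come back
all_rows <- subset(df, Avg_Biov_cell = NA)

# `== NA` evaluates to NA for every row, and subset() drops NA conditions,
# so no rows come back
no_rows <- subset(df, Avg_Biov_cell == NA)

# is.na() returns TRUE/FALSE per row, which is what subset() needs
NAsamples <- subset(df, is.na(Avg_Biov_cell))

nrow(all_rows)   # 4
nrow(no_rows)    # 0
nrow(NAsamples)  # 2
```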

Thanks to everyone who tried to help!

Robin

I presume you are looking for something like this:

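```r
set.seed(100)                 # make the example reproducible
a <- matrix(rnorm(100), 10)   # 10 x 10 matrix of random normal values
a[sample(1:100, 5)] <- NA     # set 5 randomly chosen cells to NA
a <- data.frame(a)

# keep only the rows that contain at least one NA in any column
a[!complete.cases(a), ]
```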
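Note that `complete.cases()` on the whole data frame flags a row if any column is NA, while the question asks about one specific column. A minimal sketch of the difference, using a hypothetical data frame `d` invented for illustration:

```r
# Hypothetical small data frame with NAs in two different columns
d <- data.frame(
  id = 1:5,
  x  = c(1.5, NA, 3.2, NA, 5.0),
  y  = c(NA, 2, 3, 4, 5)
)

# Rows with an NA in any column (ids 1, 2 and 4)
any_na <- d[!complete.cases(d), ]

# Rows with an NA in column x only (ids 2 and 4)
x_na <- d[is.na(d$x), ]
```

With the data from the question, the second form would presumably be `Biov_per_genus2[is.na(Biov_per_genus2$Avg_Biov_cell), ]`.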
prateek1592