
We have a data frame read from a tab-delimited file. The data frame NCNT has the observed values A, G, T, and C in columns 2 and 3, with missing data represented as '.' instead of NA.

We would like to use the subset command to define a new data frame, newNCNT, that contains only the rows with the missing value '.' in columns 2 and 3.

Ann
  • Welcome to Stack Overflow! Your question does not contain a [reproducible example](http://stackoverflow.com/q/5963269/4303162). It is therefore hard to understand your problem and give you an appropriate answer. Please make your data available (e.g. by using `dput()`) or use one of the example data sets in R. Also, add the minimal code required to reproduce your problem to your post. – Stibu Feb 27 '16 at 14:37
  • Without a reproducible example, all we can do is guess - for example it might be `subset(NCNT, rowSums(NCNT[2:3] == ".") > 0)` – talat Feb 27 '16 at 15:04
  • Thank you so much @docendo discimus. The code worked. – Ann Feb 28 '16 at 13:02
  • @Stibu, sorry, I will post the minimal code and reproducible example the next time. – Ann Feb 28 '16 at 13:03

1 Answer


This should deliver the desired subset using ordinary logical indexing and logical operators:

newNCNT <- NCNT[ NCNT[[2]] == "." & NCNT[[3]] == ".", ]

In order to use the subset function one would ordinarily need to know the column names for those two columns. If one knew the names to be name1 and name2 then it might be:

newNCNT <- subset( NCNT, name1 == "." & name2 == ".")

This will deliver only the rows where the values in both of those columns are ".". It is easy to express the intended logical operation incorrectly, so note that if you wanted rows where either column 2 or column 3 has a missing value, you would need the `|` (OR) operator in place of `&`. @docendodiscimus apparently thought you wanted the latter.
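Since no example data was posted, here is a minimal sketch of the difference between the two readings. The data frame below is a made-up stand-in for NCNT, and the column names `id`, `name1`, and `name2` are assumptions chosen only for illustration:

```r
# Hypothetical stand-in for NCNT; the real data and column names are unknown.
NCNT <- data.frame(
  id    = c("s1", "s2", "s3", "s4"),
  name1 = c("A", ".", ".", "G"),
  name2 = c("T", ".", "C", "."),
  stringsAsFactors = FALSE
)

# AND: keep rows where columns 2 and 3 are both "."
both_missing <- NCNT[NCNT[[2]] == "." & NCNT[[3]] == ".", ]

# OR: keep rows where at least one of the two columns is "."
either_missing <- subset(NCNT, name1 == "." | name2 == ".")

# Same OR logic by column position, as suggested in the comments:
# count the "." entries per row and keep rows with at least one
either_by_count <- subset(NCNT, rowSums(NCNT[2:3] == ".") > 0)
```

With this toy data, the AND version keeps only the row where both columns are ".", while the two OR versions keep every row that has a "." in either column.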

IRTFM