
I'm quite new to programming and have a question. I have a file with many rows, and I want to extract the rows for names that have both an X and a Y. My problem is that some names have multiple X's and Y's, so I need names with at least one X and one or more Y's, or the other way around.

My data looks like this:

-   -   -   -   -   An.Pos  -   -   -   Name    -   - - - - 

1   678731  680107  2   8   X   1376    1   677193  685396  RP11-206L10.3   12  NA  NA  .    
1   1572876 1636342 2   4   X   63466   1   1590786 1594063 RP11-345P4.7    9   NA  NA  .    
1   1572876 1636342 2   4   Y   63466   1   1603429 1604850 RP11-345P4.7    9   NA  NA  .    
1   1572876 1636342 2   4   X   63466   1   1631369 1633249 MMP23A  9   NA  NA  .    

What I want to get is:

1   1572876 1636342 2   4   X   63466   1   1590786 1594063 RP11-345P4.7    9   NA  NA  .    
1   1572876 1636342 2   4   Y   63466   1   1603429 1604850 RP11-345P4.7    9   NA  NA  .

But in my real data, RP11-345P4.7 can have more than two rows. So what I need is every name that has at least one X and at least one Y.

PS: I also don't know whether this is easier to do in R, Bash, or another language.

rolodu


First group the rows by the Name column, then keep only the groups that contain at least one X and at least one Y in the An.Pos column.

With dplyr, and assuming your data frame is stored in df, it looks like this:

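library(dplyr)  # provides %>%, group_by(), filter(), ungroup()
# keep every row whose Name has at least one "X" and at least one "Y" in An.Pos
df %>%
  group_by(Name) %>%
  filter(any(An.Pos == "X") & any(An.Pos == "Y")) %>%
  ungroup()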
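If you would rather not depend on dplyr, a base R sketch of the same idea is to collect the names that occur with an X and the names that occur with a Y, intersect the two sets, and keep every row whose name is in that intersection. The small data frame below is a hypothetical stand-in for the real file, keeping only the Name and An.Pos columns from the example rows.

```r
# Hypothetical example data: only the Name and An.Pos columns from the question
df <- data.frame(
  Name   = c("RP11-206L10.3", "RP11-345P4.7", "RP11-345P4.7", "MMP23A"),
  An.Pos = c("X", "X", "Y", "X"),
  stringsAsFactors = FALSE
)

# Names that appear at least once with X, and at least once with Y
names_x <- unique(df$Name[df$An.Pos == "X"])
names_y <- unique(df$Name[df$An.Pos == "Y"])

# Keep every row whose Name is in both sets (all of its X and Y rows)
result <- df[df$Name %in% intersect(names_x, names_y), ]
print(result)
```

Because all rows of a qualifying name are kept, a name with several X rows and a single Y row comes back with every one of them, which matches the "at least one X and one Y" requirement.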
Frank