1

I have a specific Excel file that comes from a parser

 > head(my_data)
# A tibble: 6 x 97
  `Investigator N~ `Admin User` `HTG Run ID` `7e4db22f-7025-~ ...5  `Parser Version`
  <chr>            <chr>        <chr>        <chr>            <chr> <chr>           
1 Experiment Name  MeganOBP1    NA           NA               NA    Date Parsed     
2 Experiment Date  NA           NA           NA               NA    NA              
3 Description      Sequencer: ~ NA           NA               NA    NA              
4 NA               NA           NA           NA               NA    NA              
5 Probe Set        ESOBP        NA           NA               NA    NA              
6 Sample ID        1            2            3                4     5               
# ... with 91 more variables: `5.2000000000000002` <chr>, ...8 <chr>, ...9 <chr>,
#   ...10 <chr>, ...11 <chr>, ...12 <chr>, ...13 <chr>, ...14 <chr>, ...15 <chr>,
#   ...16 <chr>, ...17 <chr>, ...18 <chr>, ...19 <chr>, ...20 <chr>, ...21 <chr>,
#   ...22 <chr>, ...23 <chr>, ...24 <chr>, ...25 <chr>, ...26 <chr>, ...27 <chr>,
#   ...28 <chr>, ...29 <chr>, ...30 <chr>, ...31 <chr>, ...32 <chr>, ...33 <chr>,
#   ...34 <chr>, ...35 <chr>, ...36 <chr>, ...37 <chr>, ...38 <chr>, ...39 <chr>,
#   ...40 <chr>, ...41 <chr>, ...42 <chr>, ...43 <chr>, ...44 <chr>, ...45 <chr>,
#   ...46 <chr>, ...47 <chr>, ...48 <chr>, ...49 <chr>, ...50 <chr>, ...51 <chr>,
#   ...52 <chr>, ...53 <chr>, ...54 <chr>, ...55 <chr>, ...56 <chr>, ...57 <chr>,
#   ...58 <chr>, ...59 <chr>, ...60 <chr>, ...61 <chr>, ...62 <chr>, ...63 <chr>,
#   ...64 <chr>, ...65 <chr>, ...66 <chr>, ...67 <chr>, ...68 <chr>, ...69 <chr>,
#   ...70 <chr>, ...71 <chr>, ...72 <chr>, ...73 <chr>, ...74 <chr>, ...75 <chr>,
#   ...76 <chr>, ...77 <chr>, ...78 <chr>, ...79 <chr>, ...80 <chr>, ...81 <chr>,
#   ...82 <chr>, ...83 <chr>, ...84 <chr>, ...85 <chr>, ...86 <chr>, ...87 <chr>,
#   ...88 <chr>, ...89 <chr>, ...90 <chr>, ...91 <chr>, ...92 <chr>, ...93 <chr>,
#   ...94 <chr>, ...95 <chr>, ...96 <chr>, ...97 <chr>
> 

Like

[![enter image description here][1]][1]

From this data I only want to extract specific IDs from WELL column defined as below

> my_specified_IDs
 [1] "A1"  "A2"  "A3"  "A4"  "A5"  "A6"  "A7"  "A8"  "A9"  "A10" "A11" "A12" "B1" 
[14] "B2"  "B3"  "B4"  "B5"  "B6"  "B7"  "B8"  "B9"  "B10" "B11" "B12" "C1"  "C2" 
[27] "C3"  "C4"  "C5"  "C6"  "C7"  "C8"  "C9"  "C10" "C11" "C12" "D1"  "E1"  "E2" 
[40] "E3"  "E4"  "E5"  "E6"  "E7"  "E8"  "E9"  "E10" "E11" "E12" "F1"  "F2"  "F5" 
[53] "F6"  "F7"  "F8"  "F11" "F12" "G1"  "G2"  "G3"  "G4"  "G5"  "G6"  "G7"  "G8" 
[66] "G9"  "G10" "G11" "G12" "H1"  "H2"  "H3"  "H4"  "H5"  "H6"  "H7"  "H8" 
>

My desired output is the same format of Excel file but only having these IDs in WELL column

I thought about intersect but this is not working

Mussa
  • 117
  • 7
  • 1
    May be, when you read the data, you can use `skip = 8` – akrun Dec 25 '20 at 21:31
  • 1
    ok, if it is a csv, file, when you read the data, with `read.csv` use `skip = 8` i.e. the row from which you want to read the data – akrun Dec 25 '20 at 21:32
  • No, I need 77 columns from WELL columns and I defined thst in my post – Mussa Dec 25 '20 at 21:32
  • 1
    If you have already read the data, then use `my_data %>% filter(`Investigator Name` == "Well")` – akrun Dec 25 '20 at 21:34
  • 1
    Can you test that `filter` code. From your expected, I think you want that row where the 'Investigator Name' is "Well" – akrun Dec 25 '20 at 21:36
  • In WELL I have 96 IDs from which I only need 77 IDs. These IDs my_IDs [1] "A1" "A2" "A3" "A4" "A5" "A6" "A7" "A8" "A9" "A10" "A11" "A12" "B1" [14] "B2" "B3" "B4" "B5" "B6" "B7" "B8" "B9" "B10" "B11" "B12" "C1" "C2" [27] "C3" "C4" "C5" "C6" "C7" "C8" "C9" "C10" "C11" "C12" "D1" "E1" "E2" [40] "E3" "E4" "E5" "E6" "E7" "E8" "E9" "E10" "E11" "E12" "F1" "F2" "F5" [53] "F6" "F7" "F8" "F11" "F12" "G1" "G2" "G3" "G4" "G5" "G6" "G7" "G8" [66] "G9" "G10" "G11" "G12" "H1" "H2" "H3" "H4" "H5" "H6" "H7" "H8" > – Mussa Dec 25 '20 at 21:36
  • 1
    Can you specify which 77 you wanted to subset – akrun Dec 25 '20 at 21:36
  • 1
    The data you showed is `my_data` or is it some other data. I don't see any `WELL` column in the one you showed – akrun Dec 25 '20 at 21:38
  • Yes but in screen shot I have attached you may see the WELL column – Mussa Dec 25 '20 at 21:39
  • 1
    Can you test the code below. Based on the description, I guess you want to select the columns that have any of those values from `my_specified_IDs` – akrun Dec 25 '20 at 21:48
  • 1
    [See here](https://stackoverflow.com/q/5963269/5325862) on making a reproducible example that is easier for folks to help with and which is helpful for other users. That includes workable data (i.e. not a picture of it) that recreates the problem you're trying to solve – camille Dec 25 '20 at 21:53

2 Answers2

1

Based on the image showed, the there is a row in the first column which have value 'Well'. Thus, we subset by filtering that row, reshape to 'long' format and filter based on the 'my_specified_IDs' and use that to select the columns of the 'my_data'

library(dplyr)
library(tidyr)
my_data %>%
      filter(`Investigator Name` == "Well") %>%
      pivot_longer(everything()) %>%
      filter(name %in% my_specified_IDs) %>%
      pull(name) %>%
      select(my_data, .)

Or another option is to use select_if

my_data %>%
    select_if(~ any(. %in% my_specified_IDs))

A small reproudicible example with iris

head(iris) %>%
    select_if(~ any(. %in% "setosa"))
akrun
  • 874,273
  • 37
  • 540
  • 662
1

In base R, you can use sapply to select columns that have any of the specified ID's.

my_data[colSums(sapply(my_data, `%in%`, my_specified_IDs)) > 0]
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213