I am able to filter my dataset using the strings in a particular column. Here is a sample dataset and how I did it.
ID = c(1, 2, 3, 4)
String = c("Y N No", "Y", "Y No", "Y N")
df = data.frame(ID, String)
The problem is that I want to pick only the IDs that have N in them, or only the IDs that don't have N in them.
df_2 <- dplyr::filter(df, !grepl('N', String))
Output: [2] [Y]
This filters out the IDs with N, but it also removes every row that contains an N anywhere (including those that only have 'No'). I'm new to R, so I apologize if this is just me not understanding the syntax, but I cannot figure this out.
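For illustration, here is a minimal sketch of one way to match N only as a whole token, assuming the tokens in String are always separated by spaces. It uses base R only; the names has_n, with_n and without_n are made up for this example.

```r
# Sample data from the question
df <- data.frame(ID = c(1, 2, 3, 4),
                 String = c("Y N No", "Y", "Y No", "Y N"))

# \\b is a word boundary, so "N" matches only as a standalone token
# and not as the first letter of "No"
has_n <- grepl("\\bN\\b", df$String)

with_n    <- df[has_n, ]    # rows whose String contains a standalone N
without_n <- df[!has_n, ]   # rows whose String does not
```

With the sample data this should keep IDs 1 and 4 in with_n and IDs 2 and 3 in without_n. The same pattern can be passed to grepl inside dplyr::filter.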
I could also try parsing out the string into individual columns, then selecting based on that - I need to do this anyway for later analysis. Below is the code that I use to achieve this.
library(tidyverse)  # provides %>%, mutate, str_extract_all, unnest, spread
df_2 <- df %>%
  mutate(String = gsub("\\b([A-Za-z]+)\\b", "\\11", String),
         name = str_extract_all(String, "[A-Za-z]+"),
         value = str_extract_all(String, "\\d+")) %>%
  unnest() %>%
  spread(name, value, fill = 0)
This gives me
Output:
ID<chr>  String<chr>  N<chr>  No<chr>  Y<chr>
1        Y1 N1 No1    1       1        1
2        Y1           0       0        1
3        Y1 No1       0       1        1
4        Y1 N1        1       0        1
This way I could just select my rows based on whether or not N is zero or one - however, R doesn't like when I do this and I do not understand why.
Thank you for any help you could offer.
EDIT: Here is a sample of my actual data. I might have oversimplified in my question.
m/z Column
241 C15 H22 O Na
265 C15 H15 N5
301 C16 H22 O4 Na
335 C19 H20 O4 Na
441 C26 H42 O4 Na
My goal is to filter out all of the N's in Column (they appear as N, N1, N4, etc.).
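As a rough sketch for this kind of data, the pattern below treats N followed by optional digits as a whole token, so Na is not counted as N. It assumes that "filter out" means dropping the rows that contain N, and that the formula terms are space-separated. The data frame name spectra and the column name mz (standing in for m/z) are made up for the example.

```r
# Sample of the actual data; "mz" stands in for the m/z column
spectra <- data.frame(
  mz = c(241, 265, 301, 335, 441),
  Column = c("C15 H22 O Na", "C15 H15 N5", "C16 H22 O4 Na",
             "C19 H20 O4 Na", "C26 H42 O4 Na")
)

# "N" plus optional digits as a standalone token: matches N, N1, N5, ...
# but not Na, because there is no word boundary between "N" and "a"
has_nitrogen <- grepl("\\bN[0-9]*\\b", spectra$Column)

no_nitrogen <- spectra[!has_nitrogen, ]   # drop the rows that contain N
```

On the sample above only the 265 row (C15 H15 N5) should be flagged, so no_nitrogen keeps the other four rows. To keep only the rows with N instead, index with has_nitrogen rather than !has_nitrogen.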