I am using R and have a column in a data frame where I would like to check, for each row, whether the value contains a number in parentheses and, if so, whether that number is greater than 0. I want to subset these rows and write the appropriate information to a new column.

I am new to slack so please let me know if I need to clarify any details. Thanks in advance.

Edit (apologies, it doesn't seem to let me submit this as a table and insists on formatting it as code): for example, if I had a column like:

|Column 1|
|--------|
|Q9H7C4 1xPhospho [S325(100)]|
|P11169 1xPhospho [S485(88.2)]|
|Q9UK59 1xPhospho [S/T]|
|Q8WW12 1xPhospho [S119(100)]|

I want to flag the rows that contain a number in parentheses where that number is greater than 0, and paste the row's information into a new column. For the example column, the condition would evaluate to TRUE, TRUE, FALSE, TRUE. The new column would then be:

|New Column|
|----------|
|Q9H7C4 1xPhospho [S325(100)]|
|P11169 1xPhospho [S485(88.2)]|
|NA|
|Q8WW12 1xPhospho [S119(100)]|

Downstream of this I would also like to fill in the NAs, but I think I can go from there once I work out this first step.

Rachel
  • Could you provide a sample of your data? See https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example. In R, you can use `dput(name_of_your_data)` and paste the result in your question, along with a sample of your expected result / what you have tried. – Donald Seinen Oct 29 '21 at 16:10
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Oct 29 '21 at 17:27
  • Hi thanks for the pointers, I tried putting an example data and explain desired output. Let me know if it needs more clarity. – Rachel Oct 29 '21 at 19:39

1 Answer

This can be done by first extracting the number in parentheses, if present, with str_extract, and then using ifelse to check whether that number is greater than 0:

library(stringr)
library(dplyr)

df %>%
  mutate(
    # extract the number between "(" and ")"; NA if there is none
    Num = str_extract(Col, "(?<=\\()\\d+(\\.\\d+)?(?=\\))"),
    # keep the original string only where that number is > 0
    New_col = ifelse(as.numeric(Num) > 0, Col, NA)) %>%
  # drop the helper column
  select(-Num)

Output:

                              Col                         New_col
1  |Q9H7C4 1xPhospho [S325(100)]|  |Q9H7C4 1xPhospho [S325(100)]|
2 |P11169 1xPhospho [S485(88.2)]| |P11169 1xPhospho [S485(88.2)]|
3        |Q9UK59 1xPhospho [S/T]|                            <NA>
4   |Q8WW12 1xPhospho [S119(100)]   |Q8WW12 1xPhospho [S119(100)]

Data:

df <- data.frame(
  Col = c("|Q9H7C4 1xPhospho [S325(100)]|",
            "|P11169 1xPhospho [S485(88.2)]|",
            "|Q9UK59 1xPhospho [S/T]|",
            "|Q8WW12 1xPhospho [S119(100)]")
)
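
If you would rather not load stringr and dplyr, the same idea can probably be expressed in base R. The following is only a sketch under a few assumptions: the column is called `Col` as in the data above, the strings are stored without the table pipes, and the helper names `pattern`, `m`, `num` and `keep` are made up for illustration.

```r
# Sample rows as in the Data block, without the table pipes (assumption)
df <- data.frame(
  Col = c("Q9H7C4 1xPhospho [S325(100)]",
          "P11169 1xPhospho [S485(88.2)]",
          "Q9UK59 1xPhospho [S/T]",
          "Q8WW12 1xPhospho [S119(100)]"),
  stringsAsFactors = FALSE
)

# Same regex as above: a number directly enclosed in "(" and ")"
pattern <- "(?<=\\()\\d+(\\.\\d+)?(?=\\))"

# regexpr() returns -1 for rows without a match
m <- regexpr(pattern, df$Col, perl = TRUE)

# Start with NA everywhere, then fill in the rows that matched
num <- rep(NA_real_, nrow(df))
num[m != -1] <- as.numeric(regmatches(df$Col, m))

# TRUE only where a parenthesised number exists and is greater than 0
keep <- !is.na(num) & num > 0

# Copy the original string where the condition holds, NA otherwise
df$New_col <- ifelse(keep, df$Col, NA_character_)
```

Keeping `keep` as a separate logical vector also lets you subset directly with `df[keep, ]` if you only need the matching rows.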
Chris Ruehlemann