I have a data frame called reasons in which some cells contain text with numbers in parentheses. The format is like this:
concern                      notaware       scenery
(2) chat community (4) more
(1) didn't know              (1) beautiful  (3) stunning
(3) often                                   (1) always
Reproducible version:
structure(list(concern = c("(2) chat community (4) more", "(1) didn't know",
"(3) often"), notaware = c("", "(1) beautiful", ""), scenery = c("",
"(3) stunning", "(1) always")), row.names = c(NA, -3L), class = c("tbl_df",
"tbl", "data.frame"))
I want a new data frame with just the parentheses and the numbers inside them:
concern  notaware  scenery
(2) (4)
(1)      (1)       (3)
(3)                (1)
I realise there is a similar question here, but the data there is not in a column:
Extracting data into new columns using R
and this one, but it doesn't seem to apply to a data frame:
Extract info inside all parenthesis in R
From the questions I've looked up, I've tried to cobble together a workaround. First I tried:
reasons %>% mutate(concern1 = str_match(concern, pattern = "\\(.*?\\)"))
which resulted in an unchanged data frame.
Then I tried this:
reasons$concern1 <- sub(regmatches(reasons$concern, gregexpr(pat, reasons$concern, perl=TRUE)))
which gives this error:
Error in sub(regmatches(UltraCodes$concern, gregexpr(pat,
UltraCodes$concern, :
argument "x" is missing, with no default
I then looked at this question, which I know is a duplicate of the second one above, but it made more sense to me:
Using R to parse and return text in parenthesis
And I used
pat <- "(?<=\\()([^()]*)(?=\\))"
concern1 <- regmatches(reasons$concern, gregexpr(pat, reasons$concern,
perl=TRUE))
This gives me a list with a name, a type and a value. The values are what I want, even though each one is '2' rather than (2).
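To make the '2' versus (2) difference concrete, here is a minimal sketch in base R. It assumes that matching the brackets themselves, rather than using the lookarounds in pat, is acceptable; the names pat_keep and concern_keep are made up for illustration and are not from any of the linked answers.

```r
# One column copied from the reproducible data above
concern <- c("(2) chat community (4) more", "(1) didn't know", "(3) often")

# Match a literal "(", one or more digits, then a literal ")".
# Because the brackets are part of the match, they are kept in the result.
pat_keep <- "\\([0-9]+\\)"   # hypothetical name
concern_keep <- regmatches(concern, gregexpr(pat_keep, concern))

# The result is a list with one character vector per row of the column
str(concern_keep)
```

The lookaround pattern asserts the brackets without consuming them, which would explain why they are missing from the values in my list.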
So I figured I could make one list per column and combine them into a data frame, so I made a list notaware1 from the column notaware, and so on. I have a feeling that the blank values are throwing things off when I try:
reasons1 <-data.frame(concern1, notaware1)
reasons1 <-as.data.frame(concern1, notaware1)
Which gives me
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names =
TRUE, :
arguments imply differing number of rows: 0, 1, 2
which I don't quite understand, as all my lists are the same length. I feel I'm misunderstanding some fundamentals here.
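A small sketch of what may be going on, using two columns from the reproducible data. As far as I can tell, data.frame() turns each element of a list into its own column, so what matters is the number of matches inside each element rather than the length of the list. The helper collapse_matches is a hypothetical name, and collapsing with a space is my own assumption about the layout I want.

```r
concern  <- c("(2) chat community (4) more", "(1) didn't know", "(3) often")
notaware <- c("", "(1) beautiful", "")

pat <- "(?<=\\()([^()]*)(?=\\))"
concern1  <- regmatches(concern,  gregexpr(pat, concern,  perl = TRUE))
notaware1 <- regmatches(notaware, gregexpr(pat, notaware, perl = TRUE))

length(concern1)    # number of list elements, one per row
lengths(concern1)   # number of matches inside each element
lengths(notaware1)  # blank cells give zero matches

# Hypothetical helper: collapse each element to a single string so that
# every row has exactly one value (an empty string where nothing matched)
collapse_matches <- function(m) vapply(m, paste, character(1), collapse = " ")

reasons1 <- data.frame(
  concern  = collapse_matches(concern1),
  notaware = collapse_matches(notaware1),
  stringsAsFactors = FALSE
)
```

The element lengths are 0, 1 and 2, which are the same numbers that appear in the error message.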
Next I thought I could use a workaround by exporting the list to CSV, but the answers I've found seem to want me to turn the list into a data frame first, which is my problem.
Then I found this:
reasons$concern3 <-paste(concern1)
Which does add the list to my dataframe, and I can repeat this for all my lists.
However, it is a bit messy: blanks are now shown as character(0), a cell with one bracket shows a single number, and a cell with two brackets shows c("2", "9"), so my columns now look like this:
concern      adventure     scenery
c("2", "9")  character(0)  character(0)
1            1             3
3            1             character(0)
But at least I have something that I can write to a CSV file and tidy up.
Is there a simpler way?
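For reference, this is the shape of result I am after, written as a rough base R sketch rather than a definitive solution. It assumes the numbers are always plain digits inside round brackets, and that joining several matches with a space is fine. The names keep_parens and reasons_numbers are hypothetical, and I have rebuilt reasons as a plain data.frame instead of a tibble to keep the example self-contained.

```r
reasons <- data.frame(
  concern  = c("(2) chat community (4) more", "(1) didn't know", "(3) often"),
  notaware = c("", "(1) beautiful", ""),
  scenery  = c("", "(3) stunning", "(1) always"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep every "(number)" found in each string,
# joined by a space, and return "" where nothing matches
keep_parens <- function(x) {
  m <- regmatches(x, gregexpr("\\([0-9]+\\)", x))
  vapply(m, paste, character(1), collapse = " ")
}

# Apply the helper to every column while keeping the data frame structure
reasons_numbers <- reasons
reasons_numbers[] <- lapply(reasons, keep_parens)

reasons_numbers
```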