R: Extracting numerical values from strings in a column

Question

I am interested in one specific column of a dataframe, where each row contains the name of a neighborhood and a number assigned to that neighborhood.

TOR - HOOD - Banbury-Don Mills (42) ( 23.6%)

Please see this image for a better understanding: [image: neighborhoodnum]

I only want to extract the first bracketed number, for example 20 from: TOR - HOOD - Alderwood (20) ( 25.4%)

I have tried using the stringr package, but all the functions seem to take only one string at a time. There are 140 rows in this column and I want the values from all the rows. I am not sure how to go through every string in the column.

Here is what I have tried. The code below gave this error:

Error in UseMethod("type") : no applicable method for 'type' applied to an object of class "c('tbl_df', 'tbl', 'data.frame')"

hood_data<-tibble(hood=demo_edu_dataset$Geography)
head(hood_data)

hoodnum<-hood_data %>%
  #separate(hood, into= c("name", "number"), sep = "")
  stringr::str_extract_all(hood_data, "\\d")

Thank You

Maybe `stringr::str_extract(hood_data, "(?<=\\()\\d+(?=\\))")`? — Wiktor Stribiżew, Feb 23 '20 at 14:16
Hi Faria Khandaker. Welcome to StackOverflow! Please do not post images of code or data here. Please read the info about [how to ask a good question](https://stackoverflow.com/help/how-to-ask) and how to give a [minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610). That way you can help others to help you! — dario, Feb 23 '20 at 14:17

score 1 · Accepted Answer · answered Feb 23 '20 at 15:34

hoodnum<-hood_data %>%
 separate(Geography, into= c("name", "number"), sep = "\\(")

This worked
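
A follow-up sketch, not part of the original answer: splitting on "\\(" into two columns leaves the closing bracket in the number column (for example "20) ") and drops the second bracketed group with a warning. Assuming the column is named hood as in the question's tibble, and that tidyr, dplyr and readr are installed, something like the following might tidy that up. The two-row tibble is a made-up stand-in for the real data.

```r
library(tibble)
library(tidyr)
library(dplyr)
library(readr)

# Hypothetical two-row stand-in for the 140-row column in the question
hood_data <- tibble(hood = c("TOR - HOOD - Alderwood (20) ( 25.4%)",
                             "TOR - HOOD - Banbury-Don Mills (42) ( 23.6%)"))

hoodnum <- hood_data %>%
  # split at the first "(" only; extra = "merge" keeps the rest in `number`
  separate(hood, into = c("name", "number"), sep = "\\(", extra = "merge") %>%
  mutate(name = trimws(name),
         # parse_number() keeps the first number it finds, here the bracketed id
         number = parse_number(number))

hoodnum
```

Here extra = "merge" avoids the "additional pieces discarded" warning, and parse_number() turns the leftover text into a numeric column.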

Faria Khandaker


score 0 · Answer 2 · answered Feb 23 '20 at 14:31

Maybe you can try gsub as shown below, for example:

df <- data.frame(X = c("TOR - HOOD - Alderwood (20) ( 25.4%)",
                       "TOR - HOOD - Annex (95) ( 27.9%)"))

df$Y <- as.numeric(gsub(".*?\\((\\w+)\\).*","\\1",df$X))

such that

> df
                                     X  Y
1 TOR - HOOD - Alderwood (20) ( 25.4%) 20
2     TOR - HOOD - Annex (95) ( 27.9%) 95
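
One caveat with this approach: when the pattern does not match, gsub returns the input string unchanged, so as.numeric gives NA with a coercion warning. A minimal base-R sketch that makes the no-match case explicit is below. The third string is a hypothetical row with no bracketed number, and the pattern uses \\d+ rather than \\w+ on the assumption that the bracketed id is always digits.

```r
x <- c("TOR - HOOD - Alderwood (20) ( 25.4%)",
       "TOR - HOOD - Annex (95) ( 27.9%)",
       "TOR - HOOD - Unknown")   # hypothetical row without a bracketed number

# capture the digits inside the first pair of brackets
pat <- "^.*?\\((\\d+)\\).*$"

# only substitute where the pattern matches; otherwise return NA
y <- ifelse(grepl(pat, x, perl = TRUE),
            sub(pat, "\\1", x, perl = TRUE),
            NA_character_)
y <- as.numeric(y)
y
```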

Chris Ruehlemann · Answer 3 · 2020-02-23T17:07:40.553

Or use str_extract from the stringr package as well as positive lookbehind and lookahead:

str_extract(YOURDATA, "(?<=\\()\\d{1,}(?=\\))")

This regex says: "when you see ( on the left and ) on the right, match the number with at least 1 digit in the middle". Because str_extract returns only the first match in each string, the second bracketed value is ignored. If you wrap as.numeric around the whole expression, the numbers are converted from character to numeric:

as.numeric(str_extract(df$X, "(?<=\\()\\d{1,}(?=\\))"))
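
Since str_extract is vectorized over its first argument, it can be applied to the whole column at once. The error in the question most likely arises because the pipe already passes the tibble as the first argument, so hood_data ends up in the pattern position as well. A sketch of how this might look inside a pipe, assuming the column is named hood as in the question (the two-row tibble is a made-up stand-in):

```r
library(tibble)
library(dplyr)
library(stringr)

# Hypothetical stand-in for the real data
hood_data <- tibble(hood = c("TOR - HOOD - Alderwood (20) ( 25.4%)",
                             "TOR - HOOD - Banbury-Don Mills (42) ( 23.6%)"))

hoodnum <- hood_data %>%
  # pass the column (a character vector), not the tibble, to str_extract()
  mutate(number = as.numeric(str_extract(hood, "(?<=\\()\\d+(?=\\))")))

hoodnum
```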
