
I am interested in one specific column of a data frame, where each row contains the name of a neighborhood followed by a number assigned to that neighborhood, for example:

TOR - HOOD - Banbury-Don Mills (42) ( 23.6%)

Please see this image (neighborhoodnum) for a better understanding of the data.

I only want to extract the number in the first pair of brackets, e.g. 20 from: TOR - HOOD - Alderwood (20) ( 25.4%)

I have tried using the stringr package, but the functions only seem to take one string at a time. There are 140 rows in this column and I want the values from all the rows. I am not sure how to go through every string in the column.

Here is what I have tried and the results.

This is the code I used, but it gave this error: Error in UseMethod("type") : no applicable method for 'type' applied to an object of class "c('tbl_df', 'tbl', 'data.frame')"

hood_data<-tibble(hood=demo_edu_dataset$Geography)
head(hood_data)

hoodnum<-hood_data %>%
  #separate(hood, into= c("name", "number"), sep = "")
  stringr::str_extract_all(hood_data, "\\d")

Thank you!

  • Maybe `stringr::str_extract(hood_data, "(?<=\\()\\d+(?=\\))")`? – Wiktor Stribiżew Feb 23 '20 at 14:16
  • Hi Faria Khandaker. Welcome to StackOverflow! Please do not post images of code or data here. Please read the info about [how to ask a good question](https://stackoverflow.com/help/how-to-ask) and how to give a [minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610). That way you can help others to help you! – dario Feb 23 '20 at 14:17
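The error in the question most likely comes from the pipe: `hood_data %>% stringr::str_extract_all(hood_data, "\\d")` passes `hood_data` twice, so the call is effectively `str_extract_all(hood_data, hood_data, "\\d")` and the whole tibble lands in the `pattern` argument, which stringr cannot dispatch `type()` on. The string functions are vectorised, so the fix is to hand them the character column rather than the data frame. A minimal reproducible sketch, using base R only and just the two example rows quoted above as stand-in data (the column name `hood` follows the question's `tibble()` call):

```r
# Stand-in data: only the two example rows quoted in the question
# (the real column has 140 rows).
hood_data <- data.frame(
  hood = c("TOR - HOOD - Banbury-Don Mills (42) ( 23.6%)",
           "TOR - HOOD - Alderwood (20) ( 25.4%)"),
  stringsAsFactors = FALSE
)

# Pass the character column, not the data frame: regexpr() works on every
# element of the vector in one call. perl = TRUE enables the lookarounds.
m <- regexpr("(?<=\\()\\d+(?=\\))", hood_data$hood, perl = TRUE)

# regmatches() returns only the matched substrings, so this line assumes
# every row contains a "(number)" group.
hood_data$number <- as.numeric(regmatches(hood_data$hood, m))
hood_data
```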

3 Answers

library(tidyr)  # separate(); tidyr also re-exports %>%
hoodnum <- hood_data %>%
  separate(hood, into = c("name", "number"), sep = "\\(")

This worked
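With `sep = "\\("`, each string splits into three pieces, e.g. "TOR - HOOD - Banbury-Don Mills ", "42) " and " 23.6%)". `separate()` keeps the first two (by default dropping the extra piece with a warning), so `number` still holds "42) " and needs cleaning before it can be numeric. A base-R sketch of the same split-then-clean idea, assuming no neighborhood name itself contains a "(":

```r
hood <- c("TOR - HOOD - Banbury-Don Mills (42) ( 23.6%)",
          "TOR - HOOD - Alderwood (20) ( 25.4%)")

# Split on a literal "(": name, "42) ", " 23.6%)" for the first row.
pieces <- strsplit(hood, "(", fixed = TRUE)

# First piece is the name; drop its trailing space.
name <- trimws(vapply(pieces, `[`, character(1), 1))

# Second piece still carries ")" and a space: remove them, then convert.
number <- as.numeric(trimws(sub(")", "", vapply(pieces, `[`, character(1), 2),
                                fixed = TRUE)))

result <- data.frame(name = name, number = number, stringsAsFactors = FALSE)
result
```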


Maybe you can try gsub, for example:

df <- data.frame(X = c("TOR - HOOD - Alderwood (20) ( 25.4%)",
                       "TOR - HOOD - Annex (95) ( 27.9%)"))

df$Y <- as.numeric(gsub(".*?\\((\\w+)\\).*","\\1",df$X))

which gives

> df
                                     X  Y
1 TOR - HOOD - Alderwood (20) ( 25.4%) 20
2     TOR - HOOD - Annex (95) ( 27.9%) 95
ThomasIsCoding
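The pattern `.*?\\((\\w+)\\).*` matches the whole string and keeps only capture group 1, the word characters inside the first brackets. A slightly stricter sketch of the same capture-group idea: it captures digits only and turns rows without a bracketed number into `NA`. The third input row is hypothetical, added only to show that case:

```r
x <- c("TOR - HOOD - Alderwood (20) ( 25.4%)",
       "TOR - HOOD - Annex (95) ( 27.9%)",
       "Toronto total")  # hypothetical row with no "(number)"

# Everything up to the first "(", then the digits (group 1), then the rest.
# [0-9]+ instead of \\w+ so only digits are captured.
pat <- "^[^(]*\\(([0-9]+)\\).*$"

# sub() is enough because the pattern spans the whole string.
# Rows that do not match become NA instead of being returned unchanged.
y <- ifelse(grepl(pat, x), sub(pat, "\\1", x), NA)
y_num <- as.numeric(y)
y_num
```

With the original version, a row like the third one would come back unchanged from gsub(), and as.numeric() would turn it into NA with a coercion warning.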

Or use str_extract from the stringr package as well as positive lookbehind and lookahead:

library(stringr)
str_extract(YOURDATA, "(?<=\\()\\d{1,}(?=\\))")  # YOURDATA: the character column, e.g. hood_data$hood

This regex says: "when you see ( on the left and ) on the right, match a number with at least one digit in the middle". If you wrap as.numeric around the whole expression, the numbers are converted from character to numeric:

as.numeric(str_extract(df$X, "(?<=\\()\\d{1,}(?=\\))"))
Chris Ruehlemann
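str_extract() returns one result per input string, with NA where nothing matches. If stringr is not available, the same lookbehind/lookahead regex can be used from base R; lookarounds need `perl = TRUE` there, because the default regex engine does not support them. A sketch, where `extract_first_bracketed()` is a hypothetical helper name and the last input row is made up to show the no-match case:

```r
# Hypothetical helper: base-R counterpart of
# as.numeric(str_extract(x, "(?<=\\()\\d{1,}(?=\\))")).
extract_first_bracketed <- function(x) {
  # Lookbehind/lookahead require the PCRE engine (perl = TRUE).
  m   <- regexpr("(?<=\\()\\d+(?=\\))", x, perl = TRUE)
  len <- attr(m, "match.length")
  # -1 means no match; NA inputs give NA positions.
  hit <- !is.na(m) & m != -1
  out <- rep(NA_character_, length(x))
  out[hit] <- substring(x[hit], m[hit], m[hit] + len[hit] - 1)
  as.numeric(out)
}

nums <- extract_first_bracketed(c("TOR - HOOD - Alderwood (20) ( 25.4%)",
                                  "TOR - HOOD - Annex (95) ( 27.9%)",
                                  "Toronto total"))  # made-up row, no match
nums
```

Unlike regmatches(), this keeps the output the same length as the input, so it can be assigned straight back to a data frame column.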