
I have a character vector like this:

test <- c("John got a score of 4.5 in mathematics and scored 4.3 in English and ranked 4th.", "Matthew got a score of 7.6")

The desired output is c(8.8, 7.6).

That is, for each string I want the sum of the numbers that follow the "score" pattern.

I tried:

s <- as.numeric(gsub(pattern = "^\\D*score\\D*(\\d+\\.*\\d*)\\D*score*\\D*(\\d*\\.*\\d*)\\D*$", replacement = ("\\1"), test$Purpose)) + 
        as.numeric(gsub(pattern = "^\\D*score\\D*(\\d+\\.*\\d*)\\D*score*\\D*(\\d*\\.*\\d*)\\D*$", replacement = ("\\2"), test$Purpose))

However, this returns NAs.

Ronak Shah
shivrajgondi
  • Almost a duplicate of https://stackoverflow.com/questions/35947123/r-stringr-extract-number-after-specific-string . I got it with `str_extract_all(test, "(?i)(?<=score of\\D)\\d+.\\d+|(?i)(?<=scored\\D)\\d+.\\d+")` – Ronak Shah Feb 20 '18 at 04:47
  • `sum(as.numeric(strsplit(test, ' ')[[1]]), na.rm = TRUE)` – alistaire Feb 20 '18 at 05:22

1 Answer


We can extract the numbers with a regex and then sum them per string:

library(stringr)
sapply(str_extract_all(test, "(?<=score of )[0-9.]+|(?<=scored )[0-9.]+"),
       function(x) sum(as.numeric(x)))
#[1] 8.8 7.6

Or using the tidyverse, summing every standalone number in each string:

library(dplyr)
library(purrr)
str_extract_all(test, "\\b[0-9.]+\\b") %>%
  map_dbl(~ sum(as.numeric(.x)))
#[1] 8.8 7.6
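
Note that this broader pattern picks up any number delimited by word boundaries, whether or not it follows "score". It gives the desired result on the question's data only because "4th" is not a standalone number. A small sketch with a made-up sentence (the input `x` and the names `broad` and `narrow` are hypothetical, not from the question) illustrates the difference:

```r
library(stringr)

# Hypothetical input: "rank 4" is a standalone number unrelated to a score.
x <- "Anna got a score of 5.5 and holds rank 4"

# The broad pattern matches both numbers.
broad <- str_extract_all(x, "\\b[0-9.]+\\b")[[1]]
broad
# [1] "5.5" "4"

# The lookbehind pattern keeps only the number after "score of".
narrow <- str_extract_all(x, "(?<=score of )[0-9.]+|(?<=scored )[0-9.]+")[[1]]
narrow
# [1] "5.5"
```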

Or, if we need only the numbers that follow "score":

str_extract_all(test, "(?<=score of )[0-9.]+|(?<=scored )[0-9.]+") %>%
  map_dbl(~ sum(as.numeric(.x)))
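
If stringr is not available, the same lookbehind idea can probably be sketched in base R. This is a variant under stated assumptions rather than part of the approach above: it relies on `regmatches()` with `gregexpr(perl = TRUE)`, since lookbehind needs the PCRE engine, and it swaps in a slightly stricter number pattern (digits with an optional decimal part) so that a sentence-ending full stop is not captured. The names `pattern`, `matches` and `totals` are illustrative.

```r
test <- c("John got a score of 4.5 in mathematics and scored 4.3 in English and ranked 4th.",
          "Matthew got a score of 7.6")

# Assumed number format: one or more digits, optionally followed by ".digits".
# Lookbehind is only supported with perl = TRUE.
pattern <- "(?<=score of )[0-9]+(?:\\.[0-9]+)?|(?<=scored )[0-9]+(?:\\.[0-9]+)?"

# One character vector of matched numbers per input string.
matches <- regmatches(test, gregexpr(pattern, test, perl = TRUE))

# Sum the matches for each string; vapply guarantees a numeric vector.
totals <- vapply(matches, function(x) sum(as.numeric(x)), numeric(1))
totals
# [1] 8.8 7.6
```

A string with no match yields an empty vector, so its total is 0 rather than NA; whether that is the wanted behaviour depends on the data.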
akrun