Return number from string

Question

I'm trying to extract the "Number" of "Humans" in the string below, for example:

string <- c("ProjectObjectives|Objectives_NA, PublishDate|PublishDate_NA, DeploymentID|DeploymentID_NA, Species|Human|Gender|Female, Species|Cat|Number|1, Species|Human|Number|1, Species|Human|Position|Left")

The position of the text in the string will constantly change, so I need R to search the string and find "Species|Human|Number|" and return 1.

Apologies if this is a duplicate of another thread, but I've looked here (extract a substring in R according to a pattern) and here (R extract part of string). But I'm not having any luck.

Any ideas?

score 2 · Accepted Answer · answered Dec 13 '16 at 08:22

Use a capturing approach: capture one or more digits (\d+) after the known substring. The | symbols must be escaped, since an unescaped | means alternation in a regex:

> string <- c("ProjectObjectives|Objectives_NA, PublishDate|PublishDate_NA, DeploymentID|DeploymentID_NA, Species|Human|Gender|Female, Species|Cat|Number|1, Species|Human|Number|1, Species|Human|Position|Left")
> pattern = "Species\\|Human\\|Number\\|(\\d+)"
> unlist(regmatches(string,regexec(pattern,string)))[2]
[1] "1"
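
If the same extraction has to run over several strings, some of which may lack the substring, the capturing approach can be wrapped in a small helper. This is a minimal sketch, not part of the original answer: the function name extract_human_number and the sample strings are made up for illustration.

```r
# Hypothetical helper: for each element of x, return the number that follows
# "Species|Human|Number|", or NA when that substring is absent.
extract_human_number <- function(x) {
  pattern <- "Species\\|Human\\|Number\\|(\\d+)"
  matches <- regmatches(x, regexec(pattern, x))
  # Each list element is c(full_match, group_1), or character(0) if no match
  vapply(
    matches,
    function(m) if (length(m) >= 2) as.integer(m[2]) else NA_integer_,
    integer(1),
    USE.NAMES = FALSE
  )
}

# Made-up sample data: the second string has no "Species|Human|Number|" entry
strings <- c(
  "Species|Cat|Number|1, Species|Human|Number|3, Species|Human|Position|Left",
  "Species|Human|Gender|Female, Species|Cat|Number|2"
)
extract_human_number(strings)
# [1]  3 NA
```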

A variation is to use a PCRE regex (perl=TRUE) with regmatches/regexpr:

> pattern="(?<=Species\\|Human\\|Number\\|)\\d+"
> regmatches(string,regexpr(pattern,string, perl=TRUE))
[1] "1"

Here, the left side context is put inside a non-consuming pattern, a positive lookbehind, (?<=...).

The same functionality can be achieved with the \K operator, which drops everything matched so far from the returned match:

> pattern="Species\\|Human\\|Number\\|\\K\\d+"
> regmatches(string,regexpr(pattern,string, perl=TRUE))
[1] "1"
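
One thing to watch with regmatches/regexpr on a vector of strings: elements without a match are dropped from the result, so the output can be shorter than the input. A possible way to keep the result aligned with the input is sketched below; the sample strings and the variable names are assumptions for illustration.

```r
# Made-up sample data: the second string has no "Species|Human|Number|" entry
strings <- c(
  "Species|Cat|Number|1, Species|Human|Number|3, Species|Human|Position|Left",
  "Species|Human|Gender|Female, Species|Cat|Number|2"
)
pattern <- "Species\\|Human\\|Number\\|\\K\\d+"

# regexpr returns -1 for elements where the pattern is not found
m <- regexpr(pattern, strings, perl = TRUE)

# Start with NA everywhere, then fill in only the positions that matched
result <- rep(NA_integer_, length(strings))
result[m != -1] <- as.integer(regmatches(strings, m))
result
# [1]  3 NA
```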

score 1 · Answer 2 · answered Dec 13 '16 at 08:26

Simplest way I can think of:

as.integer(gsub("^.*Species\\|Human\\|Number\\|(\\d+).*$", "\\1", string))

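It will return NA (with a coercion warning) for any string that does not mention Species|Human|Number, because gsub leaves such a string unchanged. For the same reason, there will be artefacts if any of the strings is itself a number, since it is converted as-is (but I assume that this won't be an issue).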
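
Both caveats can be sidestepped by checking for the substring first and converting only the strings that contain it. This is a rough sketch under that assumption, with made-up sample strings; it uses sub rather than gsub since each string is replaced as a whole.

```r
pattern <- "Species\\|Human\\|Number\\|(\\d+)"

# Made-up sample data: a match, no match, and a string that is just a number
x <- c(
  "Species|Cat|Number|1, Species|Human|Number|7, Species|Human|Position|Left",
  "Species|Cat|Number|1",
  "42"
)

# Only strings that actually contain the substring are converted
has_match <- grepl(pattern, x)
result <- rep(NA_integer_, length(x))
result[has_match] <- as.integer(
  sub(paste0("^.*", pattern, ".*$"), "\\1", x[has_match])
)
result
# [1]  7 NA NA
```

No coercion warning is raised here, and the bare "42" yields NA instead of 42.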
