Extracting part of string by position in R

Question

I have a vector of strings, each of which looks like this:

ABC_EFG_HIG_ADF_AKF_MNB

From each element I want to extract the third underscore-separated substring (counting from the left), i.e. HIG in this case. How can I achieve this in R?

score 18 · Answer 1 · answered Mar 02 '16 at 17:27

substr extracts a substring by position:

substr('ABC_EFG_HIG_ADF_AKF_MNB', 9, 11)

returns

[1] "HIG"
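Since the question mentions a vector of strings, it may help to note that substr is vectorized over its first argument. The following is a minimal sketch; the vector `strings` and its second element are made up for illustration, and the approach assumes the wanted piece sits at characters 9 to 11 in every element.

```r
# Hypothetical input: every element is assumed to share the same fixed layout,
# so the third field always occupies characters 9 through 11.
strings <- c("ABC_EFG_HIG_ADF_AKF_MNB", "QRS_TUV_WXY_ZAB_CDE_FGH")

# substr() is applied element-wise to the whole vector.
third <- substr(strings, 9, 11)
third
```

If the fields can vary in length, fixed positions no longer line up and a delimiter-based approach is the safer choice.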

alistaire


RHertel · Answer 2 · 2016-03-02T17:45:28.250

Here's one more possibility:

strsplit(str1,"_")[[1]][3]
#[1] "HIG"

The command strsplit() does what its name suggests: it splits a string. The second parameter is the character on which the string is split, wherever it is found within the string.

Perhaps somewhat surprisingly, strsplit() returns a list. So we can either use unlist() to access the resulting split parts of the original string, or in this case address them with the index of the list [[1]] since the list in this example has only one member, which consists of six character strings (cf. the output of str(strsplit(str1,"_"))). To access the third entry of this list, we can specify [3] at the end of the command.

The string str1 is defined here as in the answer by @akrun.
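Because strsplit() returns one list element per input string, the same idea extends to a whole vector. This is only a sketch: the vector `strings` is a made-up example, and `fixed = TRUE` is an optional choice that treats the separator as a literal string rather than a regular expression.

```r
# Hypothetical input vector; fields may have different lengths.
strings <- c("ABC_EFG_HIG_ADF_AKF_MNB", "Q_RSTU_VW_X")

# One character vector of pieces per input string.
parts <- strsplit(strings, "_", fixed = TRUE)

# Take the third piece of each; vapply() guarantees a character vector back.
# An element with fewer than three pieces yields NA.
third <- vapply(parts, function(p) p[3], character(1))
third
```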

Was about to post the same, but slightly different: `strsplit(str1,"_")[[c(1,3)]]`, just to show what a vector does inside `[[`. — nicola, Mar 02 '16 at 17:30

akrun · Accepted Answer · 2016-03-02T17:51:26.120

We can use sub. We match one or more characters that are not _ ([^_]+) followed by a _, and keep this in a capture group. As we want to extract the third set of non-_ characters, we repeat that group 2 times ({2}), followed by a second capture group of one or more non-_ characters, and then the rest of the characters, indicated by .*. In the replacement, we use the backreference for the second capture group (\\2).

sub("^([^_]+_){2}([^_]+).*", "\\2", str1)
#[1] "HIG"
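The same pattern can be generalized to an arbitrary position by building the repeat count into the regular expression. The helper below is a hypothetical sketch, not part of base R: the name `nth_field` is invented here, it assumes `sep` is a single character with no special meaning in a regular expression, and it uses a non-capturing group with `perl = TRUE` so that the wanted field is the first backreference.

```r
# Hypothetical helper: extract the n-th field of each string in x.
# Assumes sep is a single, regex-safe character such as "_".
nth_field <- function(x, n, sep = "_") {
  # Skip (n - 1) "field + separator" pairs, then capture the next field.
  pattern <- sprintf("^(?:[^%s]+%s){%d}([^%s]+).*$", sep, sep, n - 1L, sep)
  sub(pattern, "\\1", x, perl = TRUE)
}

str1 <- "ABC_EFG_HIG_ADF_AKF_MNB"
nth_field(str1, 3)
nth_field(str1, 6)
```

Note that sub() returns the input unchanged when the pattern does not match, so a string with fewer than n fields comes back as-is rather than as NA.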

Or another option is with scan

scan(text=str1, sep="_", what="", quiet=TRUE)[3]
#[1] "HIG"

A similar option as mentioned by @RHertel would be to use read.table/read.csv on the string

read.table(text=str1, sep="_", stringsAsFactors=FALSE)[,3]
#[1] "HIG"

data

str1 <- "ABC_EFG_HIG_ADF_AKF_MNB"

score 6 · Answer 4 · answered Jun 21 '18 at 08:11

If you know the position of the substring you are looking for, and you know that it is fixed (here, from the 9th to the 11th character), you can simply use str_sub() from the stringr package.

library(stringr)
MyString <- 'ABC_EFG_HIG_ADF_AKF_MNB'
# Extract characters 9 through 11
str_sub(MyString, 9, 11)

Rtist


score 2 · Answer 5 · answered Sep 10 '22 at 15:44

A new option is the function str_split_i from the development version of stringr, which splits a string on a given separator and returns the piece at a given position. Here is a reproducible example:

# devtools::install_github("tidyverse/stringr")
library(stringr)
x <- c("ABC_EFG_HIG_ADF_AKF_MNB")
str_split_i(x, "_", 3)
#> [1] "HIG"

Created on 2022-09-10 with reprex v2.0.2

As you can see, it extracted the third string. If you want the 6th, replace the 3 with 6 like this:

library(stringr)
x <- c("ABC_EFG_HIG_ADF_AKF_MNB")
str_split_i(x, "_", 6)
#> [1] "MNB"

