
I am using base R and trying to split the following character string into two vectors:

some_text | number1 (number2) | number1 (number2) | number1 (number2)

vec1 <- c("number1", "number1", "number1")

vec2 <- c("number2", "number2", "number2")

I have been able to remove some_text and the | symbols, but I need help creating the vectors using pattern matching.
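A minimal sketch of the pattern-matching step, assuming each field has the form `a (b)` where `a` contains no spaces, pipes, or parentheses: base R's regmatches() with gregexpr() pulls out every `a (b)` pair, and two sub() calls split each pair. The names `pairs`, `vec1`, and `vec2` are illustrative, and the sample string uses digits in place of number1/number2.

```r
str1 <- "some_text | 1 (2) | 3 (4) | 5 (6)"

# Extract every "a (b)" pair; "some_text" is skipped because it is not followed by "(...)"
pairs <- regmatches(str1, gregexpr("[^ |()]+ \\([^()]+\\)", str1))[[1]]

# Keep the part before the parentheses, then the captured part inside them
vec1 <- sub(" \\(.*\\)$", "", pairs)
vec2 <- sub("^.* \\((.*)\\)$", "\\1", pairs)

vec1
#[1] "1" "3" "5"
vec2
#[1] "2" "4" "6"
```

If the pieces are real numbers, wrapping the results in as.numeric() converts them from character.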

rar
wisamb
  • Is it a single string? Try `m1 <- lapply(strsplit(str1, "[ |()]"), function(x) matrix(x[x!=""][-1], ncol=2, byrow= TRUE))[[1]]; vec1 <- m1[,1]; vec2 <- m1[,2]` – akrun Jul 17 '18 at 05:16
  • Welcome to StackOverflow! Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. – Ronak Shah Jul 17 '18 at 05:29

2 Answers


Here is a base R option using strsplit, sub, and sapply:

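x <- "some_text | 1 (2) | 3 (4) | 5 (6)"
# split on "|" together with any surrounding whitespace
y <- strsplit(x, "\\s*\\|\\s*", perl=TRUE)
# drop the " (number)" part to keep the first number; [2:4] skips "some_text"
number1 <- sapply(y, function(x) { sub(" \\(\\d+\\)", "", x) })[2:4]
# keep only the number captured inside the parentheses
number2 <- sapply(y, function(x) { sub("\\d+ \\((\\d+)\\)", "\\1", x) })[2:4]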

number1
number2

[1] "1" "3" "5"
[1] "2" "4" "6"
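The [2:4] index above hard-codes three fields. A sketch that works for any number of |-separated fields, assuming the first field is always the label to discard, indexes the single split result with [[1]] and drops its first element with [-1]. The as.numeric() calls assume the values really are numbers, and the longer test string is made up for illustration.

```r
x <- "some_text | 10 (2) | 3 (40) | 5 (6) | 7 (8)"

# Split on "|" plus surrounding whitespace, take the one string's pieces, drop the label
fields <- strsplit(x, "\\s*\\|\\s*")[[1]][-1]

# Text before "(" becomes number1; text inside "( )" becomes number2
number1 <- as.numeric(sub("\\s*\\(.*\\)$", "", fields))
number2 <- as.numeric(sub("^.*\\((.*)\\)$", "\\1", fields))

number1
#[1] 10  3  5  7
number2
#[1]  2 40  6  8
```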


Tim Biegeleisen

We can use base R with strsplit:

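# split on spaces, "|", "(" and ")"; drop empty pieces and the leading "some_text",
# then fill a two-column matrix row by row: column 1 = number1, column 2 = number2
m1 <- lapply(strsplit(str1, "[ |()]"), function(x)
  matrix(x[x != ""][-1], ncol = 2, byrow = TRUE))[[1]]
vec1 <- m1[, 1]
vec2 <- m1[, 2]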
vec1
#[1] "number1" "number1" "number1"

vec2
#[1] "number2" "number2" "number2"

data

str1 <- 'some_text | number1 (number2) | number1 (number2) | number1 (number2)'
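The trailing [[1]] keeps only the first element of the lapply() result, which fits a single string like str1. If the input were instead a character vector of several such strings (a hypothetical case the question does not state), a sketch that keeps one matrix per string could look like this; `str_vec`, `mats`, `vec1_list`, and `vec2_list` are illustrative names.

```r
str_vec <- c("a | 1 (2) | 3 (4)", "b | 5 (6) | 7 (8) | 9 (10)")

# Same splitting as above, but keep the whole list: one two-column matrix per string
mats <- lapply(strsplit(str_vec, "[ |()]"), function(x)
  matrix(x[x != ""][-1], ncol = 2, byrow = TRUE))

# Column 1 holds the number before "(", column 2 the number inside "( )"
vec1_list <- lapply(mats, function(m) m[, 1])
vec2_list <- lapply(mats, function(m) m[, 2])

vec1_list[[2]]
#[1] "5" "7" "9"
```

This approach assumes no field contains spaces, pipes, or parentheses of its own, since all of those characters are treated as delimiters.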
akrun