I have the following vector:

a <- c("abc_lvl1", "def_lvl2")

I basically want to split it into two vectors: ("abc", "def") and ("lvl1", "lvl2"). I know how to substitute with sub:

sub(".*_", "", a)
[1] "lvl1" "lvl2"

I think this translates to "search for any number of any characters before "_" and replace them with nothing." Accordingly, I thought, this should give me the other desired vector:

sub("_*.", "", a)

But it removes just the leading character:

[1] "bc_lvl1" "ef_lvl2"

Where did I go wrong? This is essentially the equivalent of the "text-to-columns" function in Excel.

nouse
  • Just use `strsplit`? – A5C1D2H2I1M1N2O1R2T1 Mar 19 '16 at 17:00
  • That seems legit, but it creates a list of vectors, which I would need to split again: `strsplit(a, "_")` gives `[[1]] [1] "abc" "lvl1" [[2]] [1] "def" "lvl2"` – nouse Mar 19 '16 at 17:04
  • `*` means zero or more occurrences of the prior character and `.` means any character so `_*.` removes zero underscores followed by one character. You want `_.*` which will remove underscore followed by all further occurrences of any character. – G. Grothendieck Mar 19 '16 at 17:19
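
As a minimal sketch of the point made in the last comment, moving the `*` so that it follows the `.` should give both halves with `sub`. The variable names `prefix` and `suffix` are illustrative and not from the question.

```r
a <- c("abc_lvl1", "def_lvl2")

# "_.*" matches the first underscore and everything after it,
# so replacing it with "" keeps only the part before the underscore
prefix <- sub("_.*", "", a)

# ".*_" matches everything up to and including the last underscore,
# so replacing it with "" keeps only the part after the underscore
suffix <- sub(".*_", "", a)

print(prefix)
print(suffix)
```

With a single underscore per element, as in the question, the two patterns are mirror images of each other.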

3 Answers

There are several ways to do this. Here are a few, some using packages, and others with base R.

Given:

a <- c("abc_lvl1", "def_lvl2")

Here are some options:

do.call(rbind, strsplit(a, "_", TRUE))

matrix(scan(what = "", text = a, sep = "_"), ncol = 2, byrow = TRUE)

scan(text = a, sep = "_", what = list("", "")) ## a list

library(splitstackshape)
library(data.table)  # needed for data.table(); recent splitstackshape versions only import it
cSplit(data.table(a), "a", "_")

library(data.table)
setDT(tstrsplit(a, "_"))[]

library(dplyr)
library(tidyr)
tibble(a) %>%  # data_frame() is deprecated in favour of tibble()
  separate(a, into = c("this", "that"))

library(reshape2)
colsplit(a, "_", c("this", "that"))

library(stringi)
t(stri_split_fixed(a, "_", simplify = TRUE))

library(iotools)
mstrsplit(a, "_")  # Matrix
dstrsplit(a, col_types = c("character", "character"), "_") # data.frame

library(gsubfn)
read.pattern(text = a, pattern = "(.*)_(.*)")
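
Most of the options above return a matrix or a data frame rather than the two separate vectors asked for in the question. A hedged sketch of the remaining step, using the first base R option (the names `parts`, `first` and `second` are made up for illustration):

```r
a <- c("abc_lvl1", "def_lvl2")

# One row per input string, one column per piece
parts <- do.call(rbind, strsplit(a, "_", fixed = TRUE))

# Each column of the matrix is one of the desired vectors
first  <- parts[, 1]
second <- parts[, 2]

print(first)
print(second)
```

This assumes every element contains exactly one underscore; with a varying number of pieces, `rbind` would recycle values and the columns would no longer line up.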
A5C1D2H2I1M1N2O1R2T1

We can use read.csv/read.table and specify the sep="_". It will split the strings into two columns.

read.csv(text=a, sep="_", header=FALSE)
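
A small sketch of how the `read.csv` call above could be turned back into two character vectors. `V1` and `V2` are the default column names R assigns when `header=FALSE`; the `colClasses` argument and the name `parts` are additions for illustration, not part of the original answer.

```r
a <- c("abc_lvl1", "def_lvl2")

# colClasses = "character" keeps both columns as plain strings
# instead of letting read.csv guess the column types
parts <- read.csv(text = a, sep = "_", header = FALSE,
                  colClasses = "character")

# With header = FALSE the columns are named V1 and V2 by default
print(parts$V1)
print(parts$V2)
```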
akrun
  • I am really sorry for the excel-comment. This was just "free association". Those vectors are not related to excel-files. I am sorry. – nouse Mar 19 '16 at 17:02
  • @nouse I didn't really read your excel-files comment. This is a way to split up into columns – akrun Mar 19 '16 at 17:03
  • Which is what I misunderstood. It works! :> – nouse Mar 19 '16 at 17:06
Just to build on the initial comments:

a <- c("abc_lvl1", "def_lvl2")

a1 <- do.call(c, lapply(a, function(x){strsplit(x, "_")[[1]][1]}))
a2 <- do.call(c, lapply(a, function(x){strsplit(x, "_")[[1]][2]}))

a1
[1] "abc" "def"
a2
[1] "lvl1" "lvl2"
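
A possible variation on the same idea, sketched here as an alternative rather than a correction: split once and pick the pieces with `vapply`, which also checks that each result is a single string. The name `pieces` is illustrative.

```r
a <- c("abc_lvl1", "def_lvl2")

# Split every element once; the result is a list with one character vector per element
pieces <- strsplit(a, "_", fixed = TRUE)

# Take the first and second piece of each element;
# character(1) makes vapply fail loudly if a result is not a single string
a1 <- vapply(pieces, `[`, character(1), 1)
a2 <- vapply(pieces, `[`, character(1), 2)

print(a1)
print(a2)
```

Compared with calling `strsplit` inside each `lapply`, this splits the input only once.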