An incredibly basic question in R, yet the solution isn't clear.

How do I split a string (a length-one character vector) into its individual characters, i.e. the opposite of paste(..., sep='') or stringr::str_c()?

Anything less clunky than this:

sapply(1:26, function(i) { substr("ABCDEFGHIJKLMNOPQRSTUVWXYZ",i,i) } )
"A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S" "T" "U" "V" "W" "X" "Y" "Z"

Can it be done otherwise, e.g. with strsplit(), stringr::* or anything else?

smci
  • My purpose was to generate the contents for an iterator: `it = iter(sapply(1:26, function(i) { substr("ABCDEFGHIJKLMNOPQRSTUVWXYZ",i,i) } ))` ... `nextElem(it)` – smci Apr 12 '14 at 10:05
  • @Henrik thanks a lot, but this was just an example for something more generic. – smci Apr 12 '14 at 11:05

4 Answers

Yes, strsplit will do it. strsplit returns a list with one element per input string, so you can either use unlist to flatten the result into a single character vector, or use the list index [[1]] to access the first element.

x <- paste(LETTERS, collapse = "")

unlist(strsplit(x, split = ""))
# [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S"
#[20] "T" "U" "V" "W" "X" "Y" "Z"

OR (noting that it is not actually necessary to name the split argument)

strsplit(x, "")[[1]]
# [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S"
#[20] "T" "U" "V" "W" "X" "Y" "Z"

You can also split on NULL or character(0) for the same result.
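
As a quick check of the equivalence described above, here is a minimal base R sketch. The variable names (via_empty, via_null, via_char0, parts) are illustrative and not part of the original answer.

```r
x <- paste(LETTERS, collapse = "")

# Three ways to ask strsplit for a character-by-character split
via_empty <- strsplit(x, "")[[1]]
via_null  <- strsplit(x, NULL)[[1]]
via_char0 <- strsplit(x, character(0))[[1]]

# All three give the same 26-element character vector
identical(via_empty, via_null)
identical(via_empty, via_char0)

# With more than one input string, the result is a list with one
# element per string, which is why [[1]] or unlist() is needed above
parts <- strsplit(c("ab", "cde"), "")
parts
```

Note that unlist() flattens everything into one vector, so for several input strings [[i]] or lapply() is likely the better fit if the pieces should stay grouped by string.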

Rich Scriven

str_extract_all() from stringr offers a nice way to perform this operation:

library(stringr)
str_extract_all("ABCDEFGHIJKLMNOPQRSTUVWXYZ", boundary("character"))
# [[1]]
#  [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S" "T" "U"
# [22] "V" "W" "X" "Y" "Z"
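
One case where the two approaches may differ is text containing combining marks. The sketch below is an illustration rather than part of the original answer: it assumes that strsplit() with an empty split works per code point, while boundary("character") in stringr follows user-perceived characters (grapheme clusters). The stringr part only runs if the package is installed.

```r
# "e" followed by U+0301 (combining acute accent):
# two code points that display as a single accented letter
s <- "e\u0301"

# Base R: split per code point
base_parts <- strsplit(s, "")[[1]]
length(base_parts)

# stringr: split on character boundaries, which should keep the
# letter and its combining accent together in one piece
if (requireNamespace("stringr", quietly = TRUE)) {
  grapheme_parts <- stringr::str_extract_all(s, stringr::boundary("character"))[[1]]
  print(length(grapheme_parts))
}
```

For plain ASCII input such as the alphabet string in the question, both approaches give the same result.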
tmfmnk

Since stringr 1.5.0, you can use str_split_1, a version of str_split for single strings:

library(stringr)
x <- paste(LETTERS, collapse = "")
str_split_1(x, "")
# [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S"
#[20] "T" "U" "V" "W" "X" "Y" "Z"
Maël

This is rendered stepwise for clarity; in practice, it would be wrapped in a function.

To find which characters are repeated consecutively (i.e. occur in runs longer than one):

the_string <- "BaaaaaaH"
# split string into characters
the_runs <- strsplit(the_string, "")[[1]]
# find runs
result <- rle(the_runs)
# find values that are repeated
result$values[which(result$lengths > 1)]
#> [1] "a"
# retest with more runs
the_string <- "BaabbccH"
# split string into characters
the_runs <- strsplit(the_string, "")[[1]]
# find runs
result <- rle(the_runs)
# find values that are repeated
result$values[which(result$lengths > 1)]
#> [1] "a" "b" "c"
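
The steps above could be wrapped in a function along these lines. This is a minimal sketch: the name repeated_chars is hypothetical, and the input check is an added assumption that the function should accept exactly one string.

```r
# Hypothetical helper (name not from the original answer):
# returns the characters that occur in runs of length > 1
repeated_chars <- function(s) {
  # Assumption: exactly one string is expected
  stopifnot(is.character(s), length(s) == 1L)
  chars <- strsplit(s, "")[[1]]   # split string into characters
  runs <- rle(chars)              # run-length encode the characters
  runs$values[runs$lengths > 1]   # keep values whose run is longer than 1
}

repeated_chars("BaaaaaaH")
#> [1] "a"
repeated_chars("BaabbccH")
#> [1] "a" "b" "c"
```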
Richard Careaga
  • No I didn't ask for run-length encoding, I simply said "split char vector into its individual characters". So `"BaabbccH"` should give 'B', 'a', 'a', 'b', 'b', 'c', 'c', 'H'. – smci Nov 03 '21 at 01:54
  • @smci Yeah, don't know what I got distracted by. – Richard Careaga Nov 04 '21 at 06:22