38

I have the following string:

[1] "10012      ----      ----      ----      ----       CAB    UNCH                    CAB"

I want to split this string at the gaps, but the gaps contain a variable number of spaces. Is there a way to use the strsplit() function to split this string and return a vector of 8 elements with all of the gaps removed?

One line of code is preferred.

zx8754
Stu
  • `read.table(text = yourstring)`? – Henrik Jul 14 '14 at 16:51
  • @Henrik post as answer, please? I have used it a million times. – zx8754 Nov 22 '19 at 12:58
  • @zx8754 Thanks for the heads-up. I'm not quite sure though; OP wants to "return a _vector of 8 elements_", whereas `read.table` would result in a `data.frame` with 8 columns. So it doesn't seem like the right tool here? – Henrik Nov 22 '19 at 15:54
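
The point raised in the comments can be checked with a short sketch. It assumes the string from the question is stored in a variable named `x` (a name chosen here for illustration) and that every field should stay a character string, hence `colClasses = "character"`. `read.table` returns a one-row data frame, which `unlist` flattens into a vector:

```r
# The string from the question (the variable name x is an assumption)
x <- "10012      ----      ----      ----      ----       CAB    UNCH                    CAB"

# read.table splits on any run of whitespace by default;
# colClasses = "character" keeps "10012" as a string instead of an integer
df <- read.table(text = x, colClasses = "character")

# df is a data.frame with 1 row and 8 columns (V1..V8);
# unlist flattens it and unname drops the V1..V8 names
vec <- unname(unlist(df))
vec
```

This takes two steps rather than one, so it is probably only worth it when a data frame is wanted anyway.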

3 Answers

57

Just use strsplit with the regular expression \\s+ (one or more whitespace characters) as the split pattern:

x <- "10012      ----      ----      ----      ----       CAB    UNCH       CAB"
x
# [1] "10012      ----      ----      ----      ----       CAB    UNCH       CAB"
strsplit(x, "\\s+")[[1]]
# [1] "10012" "----"  "----"  "----"  "----"  "CAB"   "UNCH"  "CAB"  
length(.Last.value)
# [1] 8

Or, in this case, scan also works:

scan(text = x, what = "")
# Read 8 items
# [1] "10012" "----"  "----"  "----"  "----"  "CAB"   "UNCH"  "CAB"  
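
One caveat that may matter for similar inputs: if the string starts with whitespace, strsplit returns an empty string as the first element, whereas scan does not. A minimal sketch, using a made-up string `padded` that is not from the question:

```r
# Hypothetical input with leading and trailing spaces
padded <- "   10012      ----   CAB   "

# strsplit keeps an empty first element when the string starts with a match
with_empty <- strsplit(padded, "\\s+")[[1]]

# Trimming first avoids the empty element
trimmed <- strsplit(trimws(padded), "\\s+")[[1]]

# scan skips leading whitespace on its own; quiet = TRUE suppresses "Read n items"
scanned <- scan(text = padded, what = "", quiet = TRUE)
```

Here `with_empty` has four elements (the first is `""`), while `trimmed` and `scanned` both have three.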
A5C1D2H2I1M1N2O1R2T1
18

The strsplit function itself works; simply use strsplit(ss, " +"), which splits on one or more spaces:

ss = "10012      ----      ----      ----      ----       CAB    UNCH                    CAB"

strsplit(ss, " +")
[[1]]
[1] "10012" "----"  "----"  "----"  "----"  "CAB"   "UNCH"  "CAB"  
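
Note that " +" matches literal spaces only. If the gaps might also contain tabs, a pattern such as "\\s+" is needed instead. A small sketch with a made-up string `mixed` (not from the question) shows the difference:

```r
# Hypothetical input: a tab between the first two fields, spaces elsewhere
mixed <- "10012\t----   CAB"

# " +" splits only on runs of spaces, so the tab stays inside the first element
spaces_only <- strsplit(mixed, " +")[[1]]

# "\\s+" splits on any whitespace, including the tab
any_space <- strsplit(mixed, "\\s+")[[1]]
```

`spaces_only` has two elements here, while `any_space` has three.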

HTH

rnso
1

If you know the minimum number of spaces in each gap and the number of elements in the output vector, stringr::str_split_fixed() is another option.

I reproduced your example below.

test <- "10012      ----      ----      ----      ----       CAB    UNCH                    CAB"

stringr::str_split_fixed(test, " {2,}", 8) # at least two white spaces, eight elements 

This is the output (a matrix). To turn it into a character vector, pipe the result into as.character().

     [,1]    [,2]   [,3]   [,4]   [,5]   [,6]  [,7]  
[1,] "10012" "----" "----" "----" "----" "CAB" "UNCH"
     [,8] 
[1,] "CAB"
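
The conversion to a character vector can be sketched as follows, assuming the stringr package is installed. A nested call is used here instead of a pipe so that it does not depend on the R version or on magrittr:

```r
test <- "10012      ----      ----      ----      ----       CAB    UNCH                    CAB"

# str_split_fixed returns a 1 x 8 character matrix
m <- stringr::str_split_fixed(test, " {2,}", 8)

# as.character drops the dimensions, leaving a plain character vector
v <- as.character(m)
v
```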
jaeyeon