Unpack string and get vector of strings

Question

Here's my issue, starting with what I have.

A column of individual strings:

df <- as.data.frame(c("NI",
        "FA",
        "FI",
        "FST",
        "FA,NI",
        "IA,FI,IO",
        "NI,DI",
        "IA,NI,IO",
        "IA,FT,FI",
        "FA,FT,FI",
        "IA,FST,FI"))

names(df) <- "Column_of_strings"

NOTE: R's printed output won't show the quotes, but the values are strings. I included the quotes for clarity.

df                   
                         Column_of_strings

                         "NI"
                         "FA"
                         "FI"
                         "FST"
                         "FA,NI"
                         "IA,FI,IO"
                         "NI,DI"
                         "IA,NI,IO"
                         "IA,FT,FI"
                         "FA,FT,FI"
                         "IA,FST,FI"

What I would like:

                         Column_of_strings

                         "NI"
                         "FA"
                         "FI"
                         "FST"
                         "FA","NI"
                         "IA","FI","IO"
                         "NI","DI"
                         "IA","NI","IO"
                         "IA","FT","FI"
                         "FA","FT","FI"
                         "IA","FST","FI"

Or, even better, these groups of strings could be stored as vectors themselves:

                         Column_of_strings

                         c("NI")
                         c("FA")
                         c("FI")
                         c("FST")
                         c("FA","NI")
                         c("IA","FI","IO")
                         c("NI","DI")
                         c("IA","NI","IO")
                         c("IA","FT","FI")
                         c("FA","FT","FI")
                         c("IA","FST","FI")

In summary, does anyone have an idea of how to:

- unpack the column of comma-separated strings,
- split each string into its subgroups (commas separate each desired string), and
- ideally, store each row's subgroups as a vector?

All advice appreciated!

score 2 · Accepted Answer · answered Mar 16 '16 at 07:31


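We can split the strings with strsplit to get a list of character vectors. The as.character() call matters because strsplit only accepts character input, and before R 4.0.0 as.data.frame() turned the strings in the question into a factor by default.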

lst <- strsplit(as.character(df[,1]), ',')
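To get the "even better" layout from the question, one option (a sketch, not part of the original answer) is to assign the result of strsplit back into the data frame, which creates a list column where each cell holds a character vector. The data frame is rebuilt here with stringsAsFactors = FALSE so the snippet runs on its own.

```r
# Rebuild the example data from the question
df <- data.frame(Column_of_strings = c("NI", "FA", "FI", "FST", "FA,NI",
                                       "IA,FI,IO", "NI,DI", "IA,NI,IO",
                                       "IA,FT,FI", "FA,FT,FI", "IA,FST,FI"),
                 stringsAsFactors = FALSE)

# Assigning a list whose length equals nrow(df) turns the column into a
# list column: each cell now holds one character vector
df$Column_of_strings <- strsplit(as.character(df$Column_of_strings), ",")

# Each row's subgroups can be pulled out as an ordinary vector
df$Column_of_strings[[6]]
```

Replacing the column in place matches the question's desired output; assigning to a new column name instead would keep the original strings alongside the vectors.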

If we need to apply some operation to each element of the list, we can loop through it with lapply, sapply, vapply, etc. For example:

lapply(lst, table)
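As a further illustration (our own, not from the answer), vapply declares the type of each result, so it returns a plain vector with one value per row. The code "FI" below is just an example; any code from the data works.

```r
lst <- strsplit(c("NI", "FA", "FI", "FST", "FA,NI", "IA,FI,IO", "NI,DI",
                  "IA,NI,IO", "IA,FT,FI", "FA,FT,FI", "IA,FST,FI"), ",")

# Number of codes in each row (base R's lengths(lst) is a shortcut for this)
n_codes <- vapply(lst, length, integer(1))

# Does each row contain the code "FI"?
has_fi <- vapply(lst, function(x) "FI" %in% x, logical(1))

# Frequency of every code across all rows
code_counts <- table(unlist(lst))
```

If a function ever returned something other than one integer (or one logical), vapply would raise an error instead of silently returning a list, which sapply might do.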


akrun


Awesome, the lst created holds the vector of string subgroups. Thank you very much! – InfiniteFlash Mar 16 '16 at 07:42
The data frame right above this comment isn't quite what I would like. It's splitting each object in the vector up and assigning each of them into a separate column. Nevertheless, the cSplit function looks potentially useful for some other task. – InfiniteFlash Mar 16 '16 at 07:47
@InfiniteFlashChess You can also use `"long"` option to convert it to 'long' format – akrun Mar 16 '16 at 07:49
It looks like direction = "long" stretches c("IA,FST,FI") out into "IA", "FST", "FI" in a vertical column, with only one subgroup string per row (which is why there are 23 observations when I try your suggestion). I want to preserve the vector of strings in each row, which is what the list does nicely. Thanks a bunch for that! This link confirms what I am talking about haha http://stackoverflow.com/questions/11144519/store-vectors-as-data-frame-entries – InfiniteFlash Mar 16 '16 at 08:01
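For comparison, the "long" layout described in these comments can be built from the list in base R, and split() turns it back into one vector per row. This is a sketch, not the cSplit approach itself; the column names id and value are our own choices.

```r
lst <- strsplit(c("NI", "FA", "FI", "FST", "FA,NI", "IA,FI,IO", "NI,DI",
                  "IA,NI,IO", "IA,FT,FI", "FA,FT,FI", "IA,FST,FI"), ",")

# Long format: one code per row, with the original row number kept in id
long <- data.frame(id = rep(seq_along(lst), lengths(lst)),
                   value = unlist(lst),
                   stringsAsFactors = FALSE)
nrow(long)   # one row per code, 23 in total

# Going back: group the codes by id to recover one character vector per row
back <- unname(split(long$value, long$id))
identical(back, lst)
```

Because id records which row each code came from, the two layouts carry the same information; the list keeps one row per original string, while the long form has one row per code.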

score 2 · Answer 2 · answered Mar 16 '16 at 08:06

You can split strings into multiple columns with tidyr::separate, inserting NAs where necessary:

library(tidyr)
df2 <- separate(df, Column_of_strings, c('str1', 'str2', 'str3'), sep = ',', fill = 'right')
df2

#     str1 str2 str3
#  1    NI <NA> <NA>
#  2    FA <NA> <NA>
#  3    FI <NA> <NA>
#  4   FST <NA> <NA>
#  5    FA   NI <NA>
#  6    IA   FI   IO
#  7    NI   DI <NA>
#  8    IA   NI   IO
#  9    IA   FT   FI
#  10   FA   FT   FI
#  11   IA  FST   FI
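separate() needs the output column names up front ('str1' to 'str3' here). If the maximum number of codes isn't known in advance, one base-R sketch (not from the answer) is to compute it from the longest split and pad shorter rows with NA, mirroring fill = 'right'. The same n could also be passed to separate() as into = paste0("str", seq_len(n)).

```r
strings <- c("NI", "FA", "FI", "FST", "FA,NI", "IA,FI,IO", "NI,DI",
             "IA,NI,IO", "IA,FT,FI", "FA,FT,FI", "IA,FST,FI")
lst <- strsplit(strings, ",")

# The widest row decides how many columns are needed
n <- max(lengths(lst))

# Indexing past the end of a vector yields NA, which pads short rows on the right
wide <- do.call(rbind, lapply(lst, function(x) x[seq_len(n)]))
colnames(wide) <- paste0("str", seq_len(n))
df2 <- as.data.frame(wide, stringsAsFactors = FALSE)
```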

If we create an index column:

df2 <- cbind(id = seq_along(df2$str1), df2)

...then we can use reshape2::melt to put the data in long form (which is sometimes more useful than a list) and remove the NAs, while keeping all position information in id and variable columns:

library(reshape2)
melt(df2, id = 'id', na.rm = TRUE)

#     id variable value
#  1   1     str1    NI
#  2   2     str1    FA
#  3   3     str1    FI
#  4   4     str1   FST
#  5   5     str1    FA
#  6   6     str1    IA
#  7   7     str1    NI
#  8   8     str1    IA
#  9   9     str1    IA
#  10 10     str1    FA
#  11 11     str1    IA
#  16  5     str2    NI
#  17  6     str2    FI
#  18  7     str2    DI
#  19  8     str2    NI
#  20  9     str2    FT
#  21 10     str2    FT
#  22 11     str2   FST
#  28  6     str3    IO
#  30  8     str3    IO
#  31  9     str3    FI
#  32 10     str3    FI
#  33 11     str3    FI
