Split strings by commas into columns

Question

I have a single column (yes, just one column) with 200 rows whose elements are strings separated by commas.

Actual data in a dataframe:

"A, B, C, D"
"1, 10, 13, 4"
"0, 1, 6, 1"
"9, 3, 3, 0"
...

And from this single column I want to produce de following data frame:

A   B   C   D

1   10  13  4
0   1   6   1
9   3   3   0
     ...

Where "A", "B", "C", "D" are the column-headers for this data frame and the rows also split by comma to each of the created column respectively. How can I achieve this in R with read.table making the first row as headers?

@Kat If I have a data frame with the column, how could I use it in read.table? — bgly, Dec 28 '21 at 00:45
Is `"A, B, C, D"` a column header or row 1 in the data frame? — Kat, Dec 28 '21 at 01:14
If "A, B, C, D" is in row 1, this works: `read.table(col.names = LETTERS[1:4], skip = 1, sep = ",", text = unlist(df1, use.names = F))`. If it's the header, then remove `skip = 1`. — Kat, Dec 28 '21 at 01:25
Why did you post the same question twice? There's nothing substantively different between this and the post you made [an hour before](https://stackoverflow.com/q/70501131/5325862) — camille, Dec 28 '21 at 01:39
You need to correct the importing process rather than importing the data incorrectly and then fixing it in R. How did you get this single column dataframe in R? What did you use? — Ronak Shah, Dec 28 '21 at 02:25
@Kat I don't have "letters" as I used it as an example in my code, I have categories that would be the name of each column. How can I do it for those names? — bgly, Dec 28 '21 at 02:30
You can use the function `c()` and the string names. It would look something like this: `read.table(col.names = c("nameOne", "nameTwo", "andMore"), skip = 1, sep = ",", text = unlist(df1, use.names = F))`. — Kat, Dec 28 '21 at 20:15
@Kat but that explicitly mentions the names in col.names. I have a string and is not assured I know the names that are separated by commas beforehand — bgly, Dec 28 '21 at 20:56
Are the concatenated strings in the first row or are they a heading of the one-column data frame? — Kat, Dec 28 '21 at 21:07
@Kat It is a one-column dataframe, and that column has rows of a single string each. The string itself has the names separated by commas. I want to extract from that string the multiple column names which are the words separated by commas. For example, the first row of the column is "car1,car2,car3,car" and I want to read this string and extract the headers "car1", "car2", "car3", for my dataframe table. — bgly, Dec 28 '21 at 21:52

Uwe · Answer 1 · 2022-01-08T01:11:30.120

For the sake of completeness, the pretty handy fread() function from the data.table package can be used here conveniently in a one liner:

data.table::fread(text = paste0(df1$V1, collapse = "\n"))

       A     B     C     D
   <int> <int> <int> <int>
1:     1    10    13     4
2:     0     1     6     1
3:     9     3     3     0

The only preparation required is to collapse the one column data.frame df1 into a character vector with multiple lines by calling paste0(df1$V1, collapse = "\n").

fread() reads the column headers from the first line by default.

Also by default, fread() returns a data.table object but can be told to return a data.frame as well:

data.table::fread(text = paste0(df1$V1, collapse = "\n"), data.table = FALSE)

  A  B  C D
1 1 10 13 4
2 0  1  6 1
3 9  3  3 0

Data

df1 <- data.frame(V1 = c("A, B, C, D", 
                         "1, 10, 13, 4", 
                         "0, 1, 6, 1",
                         "9, 3, 3, 0")))

So, df1 contains only one character column V1.

score 1 · Accepted Answer · answered Dec 28 '21 at 22:43

Here are several different ways to extract and use the data sticking with read.table().

I started with two fake sets of data. In one with nothing of value in the column name (real column name is in row one).

df1 <- data.frame("V1" = c("A,B,C,D", 
                           "AA,D,E,F3", 
                           "Car1,Car2,Car3,Car4",
                           "a,b,c,d",
                           "a1,b1,c1,d1"))
#                    V1
# 1             A,B,C,D
# 2           AA,D,E,F3
# 3 Car1,Car2,Car3,Car4
# 4             a,b,c,d
# 5         a1,b1,c1,d1

In the other, the string that is listed as the column name is a list of the would-be names.

df2 <- data.frame("A,B,C,D" = c("AA,D,E,F3", 
                                "Car1,Car2,Car3,Car4",
                                "a,b,c,d",
                                "a1,b1,c1,d1"), 
                  check.names = F)
#               A,B,C,D
# 1           AA,D,E,F3
# 2 Car1,Car2,Car3,Car4
# 3             a,b,c,d
# 4         a1,b1,c1,d1

To extract the names and values delimited by a comma, where the would-be headings are in row 1 (using df1).

# single data.frame with headers concatenated in the first row
df.noHeader <- read.table(col.names = unlist(strsplit(df1[1,], 
                                                      split = "[,]")),
                          sep = ",",
                          skip = 1, # since the headers were in row 1
                          text = unlist(df1, use.names = F)) 
#      A    B    C    D
# 1   AA    D    E   F3
# 2 Car1 Car2 Car3 Car4
# 3    a    b    c    d
# 4   a1   b1   c1   d1

For clarity, this is what works for when the names are in the column name of the original data frame.

# splitting the original header when splitting the data
df.header <- read.table(col.names = unlist(strsplit(names(df2), 
                                                    split = "[,]")),
                        sep = ",", 
                        text = unlist(df2))
#      A    B    C    D
# 1   AA    D    E   F3
# 2 Car1 Car2 Car3 Car4
# 3    a    b    c    d
# 4   a1   b1   c1   d1

If you had the headings in some other row, you only need to change the value in the call to strsplit(), like this:

# if the headers were in row 2
df.noHeader <- read.table(col.names = unlist(strsplit(df1[2,], # <- see 2 here
                                                      split = "[,]")),
                          sep = ",",
                          skip = 2,  # since the headers were in row 2
                          text = unlist(df1, use.names = F))
#     AA    D    E   F3
# 1 Car1 Car2 Car3 Car4
# 2    a    b    c    d
# 3   a1   b1   c1   d1 

# if the headers were in row 3
df.noHeader <- read.table(col.names = unlist(strsplit(df1[3,], # <- see 3 here
                                                      split = "[,]")),
                          sep = ",",
                          skip = 3, # since the headers were in row 3
                          text = unlist(df1, use.names = F))
#   Car1 Car2 Car3 Car4
# 1    a    b    c    d
# 2   a1   b1   c1   d1

Split strings by commas into columns

2 Answers2

Data

Linked