1

I have a single column (yes, just one column) with 200 rows whose elements are strings separated by commas.

Actual data in a dataframe:

"A, B, C, D"
"1, 10, 13, 4"
"0, 1, 6, 1"
"9, 3, 3, 0"
...

And from this single column I want to produce de following data frame:

A   B   C   D

1   10  13  4
0   1   6   1
9   3   3   0
     ...

Where "A", "B", "C", "D" are the column-headers for this data frame and the rows also split by comma to each of the created column respectively. How can I achieve this in R with read.table making the first row as headers?

bgly
  • 157
  • 8
  • 1
    `read.table(header = T, sep = ",", file = whateveritis)` – Kat Dec 28 '21 at 00:42
  • @Kat If I have a data frame with the column, how could I use it in read.table? – bgly Dec 28 '21 at 00:45
  • Is `"A, B, C, D"` a column header or row 1 in the data frame? – Kat Dec 28 '21 at 01:14
  • If "A, B, C, D" is in row 1, this works: `read.table(col.names = LETTERS[1:4], skip = 1, sep = ",", text = unlist(df1, use.names = F))`. If it's the header, then remove `skip = 1`. – Kat Dec 28 '21 at 01:25
  • Why did you post the same question twice? There's nothing substantively different between this and the post you made [an hour before](https://stackoverflow.com/q/70501131/5325862) – camille Dec 28 '21 at 01:39
  • You need to correct the importing process rather than importing the data incorrectly and then fixing it in R. How did you get this single column dataframe in R? What did you use? – Ronak Shah Dec 28 '21 at 02:25
  • @Kat I don't have "letters" as I used it as an example in my code, I have categories that would be the name of each column. How can I do it for those names? – bgly Dec 28 '21 at 02:30
  • You can use the function `c()` and the string names. It would look something like this: `read.table(col.names = c("nameOne", "nameTwo", "andMore"), skip = 1, sep = ",", text = unlist(df1, use.names = F))`. – Kat Dec 28 '21 at 20:15
  • @Kat but that explicitly mentions the names in col.names. I have a string and is not assured I know the names that are separated by commas beforehand – bgly Dec 28 '21 at 20:56
  • Are the concatenated strings in the first row or are they a heading of the one-column data frame? – Kat Dec 28 '21 at 21:07
  • @Kat It is a one-column dataframe, and that column has rows of a single string each. The string itself has the names separated by commas. I want to extract from that string the multiple column names which are the words separated by commas. For example, the first row of the column is "car1,car2,car3,car" and I want to read this string and extract the headers "car1", "car2", "car3", for my dataframe table. – bgly Dec 28 '21 at 21:52

2 Answers2

2

For the sake of completeness, the pretty handy fread() function from the package can be used here conveniently in a one liner:

data.table::fread(text = paste0(df1$V1, collapse = "\n"))
       A     B     C     D
   <int> <int> <int> <int>
1:     1    10    13     4
2:     0     1     6     1
3:     9     3     3     0

The only preparation required is to collapse the one column data.frame df1 into a character vector with multiple lines by calling paste0(df1$V1, collapse = "\n").

fread() reads the column headers from the first line by default.

Also by default, fread() returns a data.table object but can be told to return a data.frame as well:

data.table::fread(text = paste0(df1$V1, collapse = "\n"), data.table = FALSE)
  A  B  C D
1 1 10 13 4
2 0  1  6 1
3 9  3  3 0

Data

df1 <- data.frame(V1 = c("A, B, C, D", 
                         "1, 10, 13, 4", 
                         "0, 1, 6, 1",
                         "9, 3, 3, 0")))

So, df1 contains only one character column V1.

Uwe
  • 41,420
  • 11
  • 90
  • 134
1

Here are several different ways to extract and use the data sticking with read.table().

I started with two fake sets of data. In one with nothing of value in the column name (real column name is in row one).

df1 <- data.frame("V1" = c("A,B,C,D", 
                           "AA,D,E,F3", 
                           "Car1,Car2,Car3,Car4",
                           "a,b,c,d",
                           "a1,b1,c1,d1"))
#                    V1
# 1             A,B,C,D
# 2           AA,D,E,F3
# 3 Car1,Car2,Car3,Car4
# 4             a,b,c,d
# 5         a1,b1,c1,d1 

In the other, the string that is listed as the column name is a list of the would-be names.

df2 <- data.frame("A,B,C,D" = c("AA,D,E,F3", 
                                "Car1,Car2,Car3,Car4",
                                "a,b,c,d",
                                "a1,b1,c1,d1"), 
                  check.names = F)
#               A,B,C,D
# 1           AA,D,E,F3
# 2 Car1,Car2,Car3,Car4
# 3             a,b,c,d
# 4         a1,b1,c1,d1 

To extract the names and values delimited by a comma, where the would-be headings are in row 1 (using df1).

# single data.frame with headers concatenated in the first row
df.noHeader <- read.table(col.names = unlist(strsplit(df1[1,], 
                                                      split = "[,]")),
                          sep = ",",
                          skip = 1, # since the headers were in row 1
                          text = unlist(df1, use.names = F)) 
#      A    B    C    D
# 1   AA    D    E   F3
# 2 Car1 Car2 Car3 Car4
# 3    a    b    c    d
# 4   a1   b1   c1   d1 

For clarity, this is what works for when the names are in the column name of the original data frame.

# splitting the original header when splitting the data
df.header <- read.table(col.names = unlist(strsplit(names(df2), 
                                                    split = "[,]")),
                        sep = ",", 
                        text = unlist(df2))
#      A    B    C    D
# 1   AA    D    E   F3
# 2 Car1 Car2 Car3 Car4
# 3    a    b    c    d
# 4   a1   b1   c1   d1 

If you had the headings in some other row, you only need to change the value in the call to strsplit(), like this:

# if the headers were in row 2
df.noHeader <- read.table(col.names = unlist(strsplit(df1[2,], # <- see 2 here
                                                      split = "[,]")),
                          sep = ",",
                          skip = 2,  # since the headers were in row 2
                          text = unlist(df1, use.names = F))
#     AA    D    E   F3
# 1 Car1 Car2 Car3 Car4
# 2    a    b    c    d
# 3   a1   b1   c1   d1 

# if the headers were in row 3
df.noHeader <- read.table(col.names = unlist(strsplit(df1[3,], # <- see 3 here
                                                      split = "[,]")),
                          sep = ",",
                          skip = 3, # since the headers were in row 3
                          text = unlist(df1, use.names = F))
#   Car1 Car2 Car3 Car4
# 1    a    b    c    d
# 2   a1   b1   c1   d1 
Kat
  • 15,669
  • 3
  • 18
  • 51