Split a column by group

Question

I have some data that looks a little bit like this:

test.frame <- read.table(text = "name   amounts   
                                JEAN  318.5,45
                             GREGORY 1518.5,67,8
                              WALTER  518.5
                               LARRY  518.5,55,1
                               HARRY  318.5,32
                         ",header = TRUE,sep = "")

I'd like it to look more like this ...

name   amount
JEAN  318.5
JEAN 45
GREGORY 1518.5
GREGORY 67
GREGORY 8
WALTER  518.5
LARRY  518.5
LARRY  55
LARRY  1
HARRY  318.5
HARRY  32

It seems like there should be a straightforward way to break out the "amounts" column, but I'm not coming up with it. Happy to take a "RTFM page for this particular command" answer. What's the command I'm looking for?

rawr · Accepted Answer · 2014-06-16T19:12:11.263

(test.frame <- read.table(text = "name   amounts   
                                JEAN  318.5,45
                             GREGORY 1518.5,67,8
                              WALTER  518.5
                               LARRY  518.5,55,1
                               HARRY  318.5,32
                         ",header = TRUE,sep = ""))


#      name     amounts
# 1    JEAN    318.5,45
# 2 GREGORY 1518.5,67,8
# 3  WALTER       518.5
# 4   LARRY  518.5,55,1
# 5   HARRY    318.5,32

tmp <- setNames(strsplit(as.character(test.frame$amounts), 
                split = ','), test.frame$name)

data.frame(name = rep(names(tmp), sapply(tmp, length)), 
           amounts = unlist(tmp), row.names = NULL)

#       name amounts
# 1     JEAN   318.5
# 2     JEAN      45
# 3  GREGORY  1518.5
# 4  GREGORY      67
# 5  GREGORY       8
# 6   WALTER   518.5
# 7    LARRY   518.5
# 8    LARRY      55
# 9    LARRY       1
# 10   HARRY   318.5
# 11   HARRY      32

+1! I had a very similar solution but you were faster :) mine was `x <- strsplit(as.character(test.frame$amounts), ","); data.frame(name = rep(test.frame$name, sapply(x, length)), amount = unlist(x))` — talat, Jun 16 '14 at 18:15
yeah the setNames isnt necessary, just for clarity. also, your way is better for factors which is what comes about by default in the example given. It'd be nice if stringsAsFactors = FALSE was the default, but setting that option in the rprofile isn't too much effort. Just a habit by now — rawr, Jun 16 '14 at 19:03

score 6 · Answer 2 · answered Jun 16 '14 at 18:16

The fastest way (probably) will be data.table

library(data.table)
setDT(test.frame)[, lapply(.SD, function(x) unlist(strsplit(as.character(x), ','))),
                  .SDcols = "amounts", by = name]

 ##       name amounts
 ## 1:    JEAN   318.5
 ## 2:    JEAN      45
 ## 3: GREGORY  1518.5
 ## 4: GREGORY      67
 ## 5: GREGORY       8
 ## 6:  WALTER   518.5
 ## 7:   LARRY   518.5
 ## 8:   LARRY      55
 ## 9:   LARRY       1
 ## 10:  HARRY   318.5
 ## 11:  HARRY      32

+1 My [`cSplit` function](http://stackoverflow.com/a/24255502/1270695) is a generalization of this approach. — A5C1D2H2I1M1N2O1R2T1, Jun 17 '14 at 04:01

score 4 · Answer 3 · answered Jun 17 '14 at 04:00

A generalization of David Arenburg's solution would be to use my cSplit function. Get it from the Git Hub Gist (https://gist.github.com/mrdwab/11380733) or load it with "devtools":

# library(devtools)
# source_gist(11380733)

The "long" format would be what you are looking for...

cSplit(test.frame, "amounts", ",", "long")
#        name amounts
#  1:    JEAN   318.5
#  2:    JEAN      45
#  3: GREGORY  1518.5
#  4: GREGORY      67
#  5: GREGORY       8
#  6:  WALTER   518.5
#  7:   LARRY   518.5
#  8:   LARRY      55
#  9:   LARRY       1
# 10:   HARRY   318.5
# 11:   HARRY      32

But the function can create wide output formats too:

cSplit(test.frame, "amounts", ",", "wide")
#       name amounts_1 amounts_2 amounts_3
# 1:    JEAN     318.5        45        NA
# 2: GREGORY    1518.5        67         8
# 3:  WALTER     518.5        NA        NA
# 4:   LARRY     518.5        55         1
# 5:   HARRY     318.5        32        NA

One advantage with this function is being able to split multiple columns at once.

@Amanda, glad you liked it. Check out my "splitstackshape" package, where this function will eventually be housed. — A5C1D2H2I1M1N2O1R2T1, Jun 17 '14 at 15:11

score 1 · Answer 4 · answered Jun 16 '14 at 18:02

This isn't a super standard format, but here is one way you can transform your data. First, I would use stringsAsFactors=F with your read.table to make sure everything is a character variable rather than a factor. Alternatively you can do as.character() on those columns.

First I split the values in the amounts using the comma then I combine values with the names column

md <- do.call(rbind, Map(cbind, test.frame$name, 
    strsplit(test.frame$amounts, ",")))

Then I paste everything back together and send it to read.table to do the variable conversion

read.table(text=apply(md,1,paste, collapse="\t"), 
    sep="\t", col.names=names(test.frame))

Alternatively you could just make a data.frame from the md matrix and do the class conversions yourself

data.frame(names=md[,1], amount=as.numeric(md[,2]))

rrs · Answer 5 · 2014-06-16T18:20:35.957

1

Here is a plyr solution:

Split.Amounts <- function(x) {
  amounts <- unlist(strsplit(as.character(x$amounts), ","))
  return(data.frame(name = x$name, amounts = amounts, stringsAsFactors=FALSE))
}

library(plyr)

ddply(test.frame, .(name), Split.Amounts)

Using dplyr:

library(dplyr)

test.frame %>%
  group_by(name) %>%
  do(Split.Amounts(.))

edited Jun 16 '14 at 18:20

answered Jun 16 '14 at 18:15

rrs

9,615
4
28
38

Split a column by group

5 Answers5

Linked

Related