tidyr or dplyr equivalent of JMP split table

Question

JMP has a "split table" platform:
http://www.jmp.com/support/help/Split_Columns.shtml

Here is the image for it:

The "split by" becomes part of the column headers.
The "split columns" are the columns spread out.
The "group" are retained columns.

I have looked at a few links/pages and can't seem to get this right in R. Right now I have to kluge it into a macro in JMP.

Links that didn't help me include:

I need to split a table of ~20k rows and ~30 columns, along one of the columns (integers between 0 and 13), to being ~1400 rows with ~25 split into 350.

An inelegant, but repeatable, example is splitting this cars table

according to this:

Yields this:

How do I do this and retain the ~5 non-split columns using an R library like tidyr or dplyr?

Could you provide a small (table) example of the input and the expected output? — CPak, Jun 12 '17 at 17:40
Even with your expected output, it's not clear to me how multiple values should be stored. For example, what is the desired output for `Model 70` when `Origin = Japanese` and `Cylinders = 4`? — bouncyball, Jun 12 '17 at 17:51
Sorry for inelegance. I have two columns in the "split apart" bin: model and displacement. These are split along Year, so for each unique value of year, there is a column created where the prefix is "model" and the suffix is year". The years range from 70 to 82, so there are 13 "model xx" columns. The same goes for "Engine Disp xx". For each "group by" the unique element is placed in the cell. My stuff has unique 1:1, this example doesn't. So for each of unique of origin, for each of unique of cylinders|origin, populate "model xx" and "engine disp xx". — EngrStudent, Jun 12 '17 at 17:58

ARobertson · Accepted Answer · 2017-06-12T21:34:32.140

Using reshape, it's not too terrible to do one split column at a time. You could then merge the model and engine.disp together. For your real example, you could just change the lists in aggregate and formula in cast.

x <- read.csv('http://web.pdx.edu/~gerbing/data/cars.csv',stringsAsFactors = F)
names(x) <- tolower(names(x))

agg <- aggregate(list(model = x$model),list(origin = x$origin,cylinders = x$cylinders,year = x$year),FUN = paste,collapse = ',')

require(reshape)
output <- cast(data = agg,formula = origin + cylinders ~ year,value = 'model')

Edit: I haven't checked all possible cases, but this function should work similar to the split tables, or at least give you a good start.

x <- read.csv('http://web.pdx.edu/~gerbing/data/cars.csv',stringsAsFactors = F)
names(x) <- tolower(names(x))

jmpsplitcol <- function(data,splitby,splitcols,group){

  require(reshape)
  require(tidyr)

  aggsplitlist <- data[ ,names(data) %in% c(splitby,group)]
  aggsplitlist <- lapply(aggsplitlist,`[`)

  agg <- aggregate(list(data[ ,names(data) %in% splitcols]),aggsplitlist,FUN = paste,collapse = ',')

  newgat <- gather_(data = agg,key = 'splitcolname','myval',splitcols)

  castformula <- as.formula(paste(paste(group,collapse = ' + '),'~','splitcolname','+',splitby))
  output <- cast(data = newgat,formula = castformula,value = 'myval')
  output
}
res <- jmpsplitcol(x,c('year'),c('engine.disp','model'),c('origin','cylinders'))
head(res2)

So my actual data is ~30krows by ~30 cols, and my output is ~2200 rows and ~350 cols. i have ~150 such files to split. I was hoping it would be much much much faster using the tidyr and "apply paradigm" but it is only much faster. :) — EngrStudent, Jul 05 '17 at 18:11

tidyr or dplyr equivalent of JMP split table

1 Answers1