Questions tagged [splitstackshape]

Use the splitstackshape R package to stack and reshape datasets after splitting concatenated values

Online data collection tools like Google Forms often export multiple-response questions with data concatenated in cells. The concat.split (cSplit) family of functions splits such data into separate cells. The package also includes functions to stack groups of columns and to reshape wide data, even when the data are "unbalanced", something that reshape (from base R) does not handle and that melt and dcast from reshape2 do not easily handle.

The package has data.table as a dependency, and some of its functions return data.tables.
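As a rough illustration of the splitting described above, here is a minimal sketch using cSplit. The survey data frame and its column names are made up for the example, and the package is assumed to be installed from CRAN.

```r
library(splitstackshape)

# Hypothetical survey export: one multiple-response column, comma-separated
survey <- data.frame(
  id = 1:3,
  fruits = c("apple,banana", "banana", "apple,cherry,banana"),
  stringsAsFactors = FALSE
)

# Wide: one new column per position in the split; shorter rows are padded with NA
wide <- cSplit(survey, "fruits", sep = ",", direction = "wide")

# Long: one row per id/value pair
long <- cSplit(survey, "fruits", sep = ",", direction = "long")

print(wide)
print(long)
```

Both calls return a data.table rather than a plain data frame, so wrap the result in as.data.frame() if downstream code expects one.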

CRAN Documentation

Main Website

60 questions
8
votes
5 answers

Splitting a single column into multiple observation using R

I am working on HCUP data, which has a range of values in a single column that needs to be split into multiple columns. Below is the HCUP data frame for reference: code label 61000-61003 excision of CNS 0169T-0169T ventricular…
x1carbon
  • 287
  • 1
  • 15
8
votes
2 answers

cSplit library(splitstackshape) is always dropping the column

I was searching for a way to split the column content by a separator and converting a table into a long format. I found cSplit from the splitstackshape package and it is almost doing what I was looking for. Problem is now with the drop option. I…
drmariod
  • 11,106
  • 16
  • 64
  • 110
8
votes
2 answers

Using sep = "." in `fread` from "data.table"

Can fread from "data.table" be forced to successfully use "." as a sep value? I'm trying to use fread to speed up my concat.split functions in "splitstackshape". See this Gist for the general approach I'm taking, and this question for why I want to…
A5C1D2H2I1M1N2O1R2T1
  • 190,393
  • 28
  • 405
  • 485
5
votes
2 answers

Stratified data splitting in R

I've been using caret::createDataPartition() in order to split the data in a stratified way. Now I'm trying another approach that I found here in stack, which is splitstackshape::stratified(), and the reason I'm interested in this is that it allows…
Programming Noob
  • 1,232
  • 3
  • 14
5
votes
2 answers

How to prevent data.table to force numeric variables into character variables without manually specifying these?

Consider the following dataset: dt <- structure(list(lllocatie = structure(c(1L, 6L, 2L, 4L, 3L), .Label = c("Assen", "Oosterwijtwerd", "Startenhuizen", "t-Zandt", "Tjuchem", "Winneweer"), class = "factor"), lat = c(52.992, 53.32,…
Jaap
  • 81,064
  • 34
  • 182
  • 193
4
votes
3 answers

Split multiple columns into rows

I'm working with a very raw set of data and need to shape it up in order to work with it. I am trying to split selected columns based on separator '|' d <- data.frame(id = c(022,565,893,415), name = c('c|e','m|q','w','w|s|e'), score =…
Davis
  • 466
  • 4
  • 20
4
votes
2 answers

R cSplit only using first delimiter in string

I had a long list with two columns where I had the same string in each column in multiple rows. So I used paste to concatenate using - and then used setDT to return the unique set of concats with their frequency. Now I want to reverse my…
Oli
  • 532
  • 1
  • 5
  • 26
4
votes
2 answers

split dataframe with multiple delimiters in R

df1 <- Gene GeneLocus CPA1|1357 chr7:130020290-130027948:+ GUCY2D|3000 chr17:7905988-7923658:+ UBC|7316 chr12:125396194-125399577:- C11orf95|65998 chr11:63527365-63536113:- …
Kryo
  • 921
  • 9
  • 24
4
votes
1 answer

Splitting text to words with R and cSplit()

I'm trying to split a series of sentences into separate words, that is, to tokenize the text. I have found an R package splitstackshape that is able to do what I want, well almost... it truncates the output to the first and last 5 rows. Anyway, this…
Joshua
  • 722
  • 12
  • 27
4
votes
2 answers

Project Euler #22, off by 158,055

I'm currently working through Project Euler problem 22 which has the following challenge: Using names.txt (right click and 'Save Link/Target As...'), a 46K text file containing over five-thousand first names, begin by sorting it into alphabetical…
rmbaughman
  • 921
  • 1
  • 7
  • 17
3
votes
4 answers

Splitting concatenated column and populating corresponding columns with values

I have a nasty data table that has a couple of different kinds of messiness, and I can't figure out how to combine some of the other answers that use the tidyr and splitstackshape packages. subject <- c("A", "B", "C") review <- c("Bill: [1.0]",…
bikeclub
  • 369
  • 2
  • 10
3
votes
2 answers

Modification of cSplit_e function to account for multiple values

I understand that "cSplit_e" in "splitstackshape" can be used to convert multiple values under one column to separate columns with binary values. I am dealing with a text problem for calculating tf-idf and it is not necessary to have all unique…
syebill
  • 543
  • 6
  • 23
3
votes
1 answer

How do I convert a 2x2 contingency table into a long format dataframe?

How do I convert a 2x2 contingency table into a long format data frame? I tried this: library(reshape2) Table <- matrix(c(7,67,19,71), 2, 2, byrow=TRUE) rownames(Table) <- c('Drug', 'No_Drug') colnames(Table) <- c('Comp', 'No_Comp') melt(Table) I…
FTF
  • 181
  • 3
  • 11
3
votes
1 answer

Combining irrelevant/similar observations into one (others)

After performing a survey on perceived problems per neighborhood I get this dataframe. Since the survey had different options to choose from + an open one, the results on the open question are frequently irrelevant (see…
ccamara
  • 1,141
  • 1
  • 12
  • 32
3
votes
1 answer

How can I reshape a data.table (long into wide) without doing a function like sum or mean?

How can I reshape a data.table (long into wide) without doing a function like sum or mean? I was looking at dcast/melt/reshape/etc. But I don't get the desired results. This is my data: DT <- data.table(id = c("1","1","2","3"), score = c("5", "4",…
peter
  • 93
  • 6