Merge multiple columns

Question

MY DATA
I have a data.frame Movement that contains X and Y column with positional data that belongs to a subject (ID). There are 300+ subjects in my dataset, imported in R from a .csv file. The following is a (small) example of my data.

X1 <- c(10.83, 11.91, 12.32)
Y1 <- c(25.26, 27.81, 27.96)
ID1 <- c("John", "John", "John")
X2 <- c(14.73, 15.95, 15.97)
Y2 <- c(21.29, 22.83, 23.36)
ID2 <- c("Tom", "Tom", "Tom")

WHAT I WISH FOR IT TO LOOK LIKE
Merge all the columns into three and have Movement be:

X <- c(10.83, 11.91, 12.32,14.73, 15.95, 15.97)
Y <- c(25.26, 27.81, 27.96, 21.29, 22.83, 23.36)
ID <- c("John", "John", "John", "Tom", "Tom", "Tom")

WHAT I HAVE TRIED
Merge and stack, although this combine my X and Y values into one Value which I do not want. I feel it is a simple question but I am stuck, any assistance please?

Thank you.

Why don't you just treat the first three columns as one data frame, and the last three columns as another, then just `rbind` them? i.e. something like `rbind(Movement[,1:3], Movement[,4:6])`, taking into consideration http://stackoverflow.com/questions/19297475/simplest-way-to-get-rbind-to-ignore-column-names — Alex, Mar 19 '15 at 03:49
Just to add to Alex's comment, if the names are not the same, change the names of the first 3 columns to match the other 3 columns or viceversa for `rbind` to work. or just use `rbindlist` which will work fine without changing the names. i.e. `library(data.table); rbindlist(list(Movement[1:3], Movement[4:6]))` — akrun, Mar 19 '15 at 03:52
The data frame is imported as a .csv file that contains 300+ subjects and their X and Y data so it would be a very long process to do it this way. I have updated the question to reflect this. — user2716568, Mar 19 '15 at 03:55

Alex · Answer 1 · 2015-03-19T04:05:45.253

Using both my comment and Akrun's

library(data.table)

Movement <- data.frame(X1 = c(10.83, 11.91, 12.32),
                       Y1 = c(25.26, 27.81, 27.96),
                       ID1 = c("John", "John", "John"), 
                       X2 = c(14.73, 15.95, 15.97), 
                       Y2 = c(21.29, 22.83, 23.36), 
                       ID2 = c("Tom", "Tom", "Tom"))

rbindlist(list(Movement[,1:3], Movement[,4:6]))

gives

      X1    Y1  ID1
1: 10.83 25.26 John
2: 11.91 27.81 John
3: 12.32 27.96 John
4: 14.73 21.29  Tom
5: 15.95 22.83  Tom
6: 15.97 23.36  Tom

It is easy enough to then rename the columns to be what you want.

Assuming that movement consists of 100 triplets (Xn, Yn, IDn), then an intermediate step is to construct a list of individual data frames. You can do this using

list_of_df <- lapply(1:100, function(j) Movement[,(3*j-2):(3*j)])

Then, rbindlist(list_of_df)

akrun · Accepted Answer · 2015-03-19T06:40:15.543

If there are 300+ columns, one option is to get the index of 'ID', 'X', and 'Y' columns using grep, unlist the subset of the dataset and create the three columns using the data.frame.

IndID <-  grep('^ID', names(Movement))
IndX <-   grep('^X', names(Movement))
IndY <-    grep('^Y', names(Movement))

res <-  data.frame(ID=unlist(Movement[IndID]), 
         X=unlist(Movement[IndX]), 
         Y=unlist(Movement[IndY]), 
            stringsAsFactors=FALSE)
row.names(res) <- NULL 
res
#    ID     X     Y
#1 John 10.83 25.26
#2 John 11.91 27.81
#3 John 12.32 27.96
#4  Tom 14.73 21.29
#5  Tom 15.95 22.83
#6  Tom 15.97 23.36

Or you could use melt from the devel version of data.table (i.e. v.1.9.5). It can be installed from here

melt(setDT(Movement), measure=list(IndX, IndY, IndID),
     value.name=c('X', 'Y', 'ID'))[,variable:= NULL][]
#      X     Y   ID
#1: 10.83 25.26 John
#2: 11.91 27.81 John
#3: 12.32 27.96 John
#4: 14.73 21.29  Tom
#5: 15.95 22.83  Tom
#6: 15.97 23.36  Tom

Or merged.stack from splitstackshape.

 library(splitstackshape)
 library(data.table)
 merged.stack(setDT(Movement, keep.rownames=TRUE),
     var.stubs=c('X', 'Y', 'ID'), sep='var.stubs')[, 1:2 := NULL][] 
 #      X     Y   ID
 #1: 10.83 25.26 John
 #2: 14.73 21.29  Tom
 #3: 11.91 27.81 John
 #4: 15.95 22.83  Tom
 #5: 12.32 27.96 John
 #6: 15.97 23.36  Tom

data

Movement <- structure(list(X1 = c(10.83, 11.91, 12.32), Y1 = c(25.26,
27.81, 27.96), ID1 = structure(c(1L, 1L, 1L), .Label = "John",
class = "factor"), 
X2 = c(14.73, 15.95, 15.97), Y2 = c(21.29, 22.83, 23.36), 
ID2 = structure(c(1L, 1L, 1L), .Label = "Tom", class = "factor")),
.Names = c("X1",  
"Y1", "ID1", "X2", "Y2", "ID2"), row.names = c(NA, -3L),
class = "data.frame")

Excellent, I successfully used `grep` to easily create the data frame. Thank you! — user2716568, Mar 20 '15 at 01:22
To clarify, do all columns need to be of the same row length? — user2716568, Feb 17 '16 at 05:51
@user2716568 If you need a `data.frame`, the columns should be of the same length. If it is not, you can store in a `list`. I am not sure I understand you properly. This question was asked some time back. — akrun, Feb 17 '16 at 05:53
Thank you for the quick response. Yes, a `data.frame` is what I am after. Currently, I am using your `grep` and `data.frame` example to compile a new dataset that is structured in the same manner as above (same column headers, multiple columns). The code is not working and it may be because there are a different number of rows for each entry. I can open this as a new question, if you consider appropriate. — user2716568, Feb 17 '16 at 05:58
@user2716568 It would be better to open as a new question with the expected output — akrun, Feb 17 '16 at 05:58

score 1 · Answer 3 · answered Mar 19 '15 at 04:05

Either of these would succeed:

> reshape(Moment, direction="long",  sep="", varying=list("X"=c(1,4),"Y"=c(2,5),"ID"=c(3,6)))
    PT time    X1    Y1  ID1 id
1.1  1    1 10.83 25.26 John  1
2.1  2    1 11.91 27.81 John  2
3.1  3    1 12.32 27.96 John  3
4.1  4    1 10.83 25.26 John  4
5.1  5    1 11.91 27.81 John  5
6.1  6    1 12.32 27.96 John  6
1.2  1    2 14.73 21.29  Tom  1
2.2  2    2 15.95 22.83  Tom  2
3.2  3    2 15.97 23.36  Tom  3
4.2  4    2 14.73 21.29  Tom  4
5.2  5    2 15.95 22.83  Tom  5
6.2  6    2 15.97 23.36  Tom  6

> reshape(Moment, direction="long",  sep="", varying=1:6, v.names=c("X","Y","ID"))
    PT time    X     Y    ID id
1.1  1    1 John 10.83 25.26  1
2.1  2    1 John 11.91 27.81  2
3.1  3    1 John 12.32 27.96  3
4.1  4    1 John 10.83 25.26  4
5.1  5    1 John 11.91 27.81  5
6.1  6    1 John 12.32 27.96  6
1.2  1    2  Tom 14.73 21.29  1
2.2  2    2  Tom 15.95 22.83  2
3.2  3    2  Tom 15.97 23.36  3
4.2  4    2  Tom 14.73 21.29  4
5.2  5    2  Tom 15.95 22.83  5
6.2  6    2  Tom 15.97 23.36  6

(I had put an id-variable in my test opbject but it wasn't needed:

Moment <- data.frame(X1 = c(10.83, 11.91, 12.32),
Y1  =c(25.26, 27.81, 27.96),
ID1= c("John", "John", "John"),
X2 =c(14.73, 15.95, 15.97),
Y2 = c(21.29, 22.83, 23.36),
ID2 = c("Tom", "Tom", "Tom") )

reshape(Moment, direction="long",  sep="", varying=1:6, v.names=c("X","Y","ID"))
    time    X     Y    ID id
1.1    1 John 10.83 25.26  1
2.1    1 John 11.91 27.81  2
3.1    1 John 12.32 27.96  3
1.2    2  Tom 14.73 21.29  1
2.2    2  Tom 15.95 22.83  2
3.2    2  Tom 15.97 23.36  3

Merge multiple columns

3 Answers3

data