
I have a data.table:

library(data.table)
p1 = data.table(a = c(10.34,25.87,53.2), b=c(15.3,183.2,34.8))
print(p1)

       a     b
1: 10.34  15.3
2: 25.87 183.2
3: 53.20  34.8

What I would like to get is a new data.table with the following structure:

       a     b     a1    b1     a2    b2     a3    b3     
1: 10.34  15.3  10.34  15.3  25.87 183.2   53.2  34.8
2: 25.87 183.2  10.34  15.3  25.87 183.2   53.2  34.8
3: 53.20  34.8  10.34  15.3  25.87 183.2   53.2  34.8

My current solution is:

p2 = cbind(p1, p1[1,], p1[2,], p1[3,])

How do I create a similar data.table p2 (without using for loops) when the input data.table p1 has 10,000 rows? (With two input columns, that result would have 2 + 2 × 10,000 = 20,002 columns.)

Any help is appreciated.

  • But why do you want to do that? It seems a bad idea, needlessly wasting memory on 10,000 duplicate columns that merely contain copies of `(a, b)`. **Tell us what you are actually trying to solve**: what happens next to this data.table p1? (This feels like an [XY problem](https://meta.stackoverflow.com/questions/tagged/xy-problem)) – smci Mar 24 '19 at 03:11
  • ...but fundamentally, since we know you merely want replicated copies of the columns, we can just roll that implicit knowledge into your next calculation (tell us what it is), e.g. pass the values in once as a vector. I suspect you don't need 10,000 duplicate columns at all. – smci Mar 24 '19 at 03:19
  • Hi, it is an input to a function. Maybe the faster way would be to create a data frame with predefined dimensions and then assign values to it? – eod Mar 25 '19 at 13:59
  • Then refactor your function to not require 1:5000 copies of the input columns, already. Which function is it? We really need to see that function code to understand why you think this duplication is necessary. Unless it's some external library/package/API. – smci Apr 10 '19 at 06:09
  • It is an external function, distHaversine(). I'm using it as in the first answer on this page: https://stackoverflow.com/questions/34213765/using-the-geosphere-distm-function-on-a-data-table-to-calculate-distances . – eod Apr 11 '19 at 13:59
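Following up on the comment thread: if the end goal is pairwise haversine distances, the replicated columns can be avoided entirely by computing the full distance matrix directly. Below is a minimal base-R sketch; the `haversine` helper and the sample `pts` data frame are illustrative assumptions, and `geosphere::distm(pts, fun = distHaversine)` achieves the same thing in one call without any column duplication.

```r
# Haversine great-circle distance in metres (radius matches geosphere's default)
haversine <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  h <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(sqrt(h))
}

# Hypothetical coordinate data, standing in for the real input
pts <- data.frame(lon = c(0, 0, 1), lat = c(0, 1, 0))

# outer() builds the full n x n distance matrix -- no replicated columns needed
d <- outer(seq_len(nrow(pts)), seq_len(nrow(pts)),
           function(i, j) haversine(pts$lon[i], pts$lat[i], pts$lon[j], pts$lat[j]))
```

For 10,000 points this allocates one 10,000 × 10,000 numeric matrix instead of a 20,002-column table, and the distance function only touches the original two columns.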

4 Answers


Here is another option using rbindlist and cbind: transpose p1 into a one-row data.table, rep that row once per row of p1, rbindlist the copies, and cbind the result onto p1.

library(data.table)

cbind(p1, rbindlist(rep(list(data.table(t(unlist(p1)))), times = nrow(p1))))
#        a     b    a1    a2   a3   b1    b2   b3
# 1: 10.34  15.3 10.34 25.87 53.2 15.3 183.2 34.8
# 2: 25.87 183.2 10.34 25.87 53.2 15.3 183.2 34.8
# 3: 53.20  34.8 10.34 25.87 53.2 15.3 183.2 34.8

Update

@Frank pointed out in the comments that cbind can combine two data frames with unequal numbers of rows; the one with fewer rows is recycled. So we don't need rep or rbindlist at all, and below is the updated code.

cbind(p1, data.table(t(unlist(p1))))
#        a     b    a1    a2   a3   b1    b2   b3
# 1: 10.34  15.3 10.34 25.87 53.2 15.3 183.2 34.8
# 2: 25.87 183.2 10.34 25.87 53.2 15.3 183.2 34.8
# 3: 53.20  34.8 10.34 25.87 53.2 15.3 183.2 34.8

To get the column order the OP asked for, one option is setcolorder:

cbind(p1, setcolorder(data.table(t(unlist(p1))), order(row(p1))) )    
#        a     b    a1   b1    a2    b2   a3   b3
# 1: 10.34  15.3 10.34 15.3 25.87 183.2 53.2 34.8
# 2: 25.87 183.2 10.34 15.3 25.87 183.2 53.2 34.8
# 3: 53.20  34.8 10.34 15.3 25.87 183.2 53.2 34.8
  • @Frank Good to know. If we don't need `rep`, then we don't need `rbindlist` either, since its purpose is to combine multiple data frames. So this also works: `cbind(p1, data.table(t(unlist(p1))))` – www Mar 22 '19 at 16:51
  • @Frank If you don't mind, I am going to fold our comments into my answer. It is good to know this behavior of `cbind`. Thanks! – www Mar 22 '19 at 16:54
  • @eod I've edited one way in. Not sure if there are others. – Frank Mar 22 '19 at 18:24
  • @Frank Good approach. Thanks for the help. – www Mar 22 '19 at 18:29

We can use shift

out <- cbind(p1, p1[, shift(.SD, type = 'lead',
             n = c(0, seq_len(.N-1)))][rep(1, nrow(p1))])
setnames(out, make.unique(c(names(p1), rep(names(p1), each = nrow(p1)))))

or with tidyverse

library(tidyverse)
pmap_dfc(p1, list) %>% 
             uncount(nrow(p1))

If we need the original data as well

pmap_dfc(p1, list) %>%
   rowr::cbind.fill(p1, .)
#     a     b     a    b    a1    b1   a2   b2
#1 10.34  15.3 10.34 15.3 25.87 183.2 53.2 34.8
#2 25.87 183.2 10.34 15.3 25.87 183.2 53.2 34.8
#3 53.20  34.8 10.34 15.3 25.87 183.2 53.2 34.8

Or with transpose and bind_cols

purrr::transpose(p1) %>% 
    bind_cols %>% 
    rowr::cbind.fill(p1, .)

Here's another option, similar to www's:

cbind(p1, matrix(rep(unlist(p1), nrow(p1)), nrow = nrow(p1), byrow = TRUE))
#        a     b    V1    V2   V3   V4    V5   V6
# 1: 10.34  15.3 10.34 25.87 53.2 15.3 183.2 34.8
# 2: 25.87 183.2 10.34 25.87 53.2 15.3 183.2 34.8
# 3: 53.20  34.8 10.34 25.87 53.2 15.3 183.2 34.8
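The matrix columns come out as V1..V6. If named columns are wanted, one hedged follow-up is the base-R sketch below (it uses a data.frame so it runs without data.table; note that unlist stacks column-wise, hence the a1, a2, a3, b1, b2, b3 order):

```r
p1 <- data.frame(a = c(10.34, 25.87, 53.2), b = c(15.3, 183.2, 34.8))
out <- cbind(p1, matrix(rep(unlist(p1), nrow(p1)), nrow = nrow(p1), byrow = TRUE))
# unlist() emits a1, a2, a3, b1, b2, b3, so name the new columns in that order
colnames(out) <- c(names(p1),
                   paste0(rep(names(p1), each = nrow(p1)), seq_len(nrow(p1))))
```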
cbind(p1, do.call(cbind, split(p1, 1:nrow(p1))))

#        a     b   1.a  1.b   2.a   2.b  3.a  3.b
# 1: 10.34  15.3 10.34 15.3 25.87 183.2 53.2 34.8
# 2: 25.87 183.2 10.34 15.3 25.87 183.2 53.2 34.8
# 3: 53.20  34.8 10.34 15.3 25.87 183.2 53.2 34.8
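If the 1.a-style names produced by split are undesirable, a small hedged follow-up (assuming p1 is the data.table from the question) can rename them with setnames to match the layout the OP showed:

```r
library(data.table)
p1 <- data.table(a = c(10.34, 25.87, 53.2), b = c(15.3, 183.2, 34.8))
out <- cbind(p1, do.call(cbind, split(p1, 1:nrow(p1))))
# rewrite "1.a" as "a1", "1.b" as "b1", etc.; "a" and "b" don't match and stay as-is
setnames(out, sub("^(\\d+)\\.(\\w+)$", "\\2\\1", names(out)))
```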