
I have a data.table:

library(data.table)
p1 = data.table(a = c(10.34,25.87,53.2), b=c(15.3,183.2,34.8))
print(p1)

       a     b
1: 10.34  15.3
2: 25.87 183.2
3: 53.20  34.8

What I would like to get is a new data.table with the following structure:

       a     b     a1    b1     a2    b2     a3    b3     
1: 10.34  15.3  10.34  15.3  25.87 183.2   53.2  34.8
2: 25.87 183.2  10.34  15.3  25.87 183.2   53.2  34.8
3: 53.20  34.8  10.34  15.3  25.87 183.2   53.2  34.8

My current solution is:

p2 = cbind(p1, p1[1,], p1[2,], p1[3,])

How do I create a similar data.table p2 (without using for loops) when the input data.table p1 has 10,000 rows? (With two input columns, that result would have 2 + 2 × 10,000 = 20,002 columns.)

Any help is appreciated.

  • But why do you want to do that? It seems a bad idea, needlessly wasting memory on 10,000 duplicate columns that merely contain copies of `(a, b)`. **Tell us what you are actually trying to solve**: what happens next to this data.table p1? (This feels like an [XY problem](https://meta.stackoverflow.com/questions/tagged/xy-problem)) – smci Mar 24 '19 at 03:11
  • ...but fundamentally, since we know you merely want replicated copies of the columns, we can just roll that implicit knowledge into your next calculation (tell us what it is), e.g. pass the values in once as a vector. I suspect you don't need 10,000 duplicate columns at all. – smci Mar 24 '19 at 03:19
  • Hi, it is an input to a function. Maybe the faster way would be to create a data frame with predefined dimensions and then assign values to it? – eod Mar 25 '19 at 13:59
  • Then refactor your function to not require 1:5000 copies of the input columns, already. Which function is it? We really need to see that function code to understand why you think this duplication is necessary. Unless it's some external library/package/API. – smci Apr 10 '19 at 06:09
  • It is an external function, distHaversine(). I'm using it as in the first answer on this page: https://stackoverflow.com/questions/34213765/using-the-geosphere-distm-function-on-a-data-table-to-calculate-distances . – eod Apr 11 '19 at 13:59
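Following up on the comment thread: if the end goal is pairwise haversine distances, the replicated columns can be avoided entirely by computing the full distance matrix directly. Below is a minimal base-R sketch; the `haversine` helper and the sample `pts` data frame are illustrative assumptions, and `geosphere::distm(pts, fun = distHaversine)` achieves the same thing in one call without any column duplication.

```r
# Haversine great-circle distance in metres (radius matches geosphere's default)
haversine <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  h <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(sqrt(h))
}

# Hypothetical coordinate data, standing in for the real input
pts <- data.frame(lon = c(0, 0, 1), lat = c(0, 1, 0))

# outer() builds the full n x n distance matrix -- no replicated columns needed
d <- outer(seq_len(nrow(pts)), seq_len(nrow(pts)),
           function(i, j) haversine(pts$lon[i], pts$lat[i], pts$lon[j], pts$lat[j]))
```

For 10,000 points this allocates one 10,000 × 10,000 numeric matrix instead of a 20,002-column table, and the distance function only touches the original two columns.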

4 Answers


Here is another option using rbindlist and cbind: transpose p1 into a one-row data.table, rep that row once per row of p1, rbindlist the copies, and cbind the result onto p1.

library(data.table)

cbind(p1, rbindlist(rep(list(data.table(t(unlist(p1)))), times = nrow(p1))))
#        a     b    a1    a2   a3   b1    b2   b3
# 1: 10.34  15.3 10.34 25.87 53.2 15.3 183.2 34.8
# 2: 25.87 183.2 10.34 25.87 53.2 15.3 183.2 34.8
# 3: 53.20  34.8 10.34 25.87 53.2 15.3 183.2 34.8

Update

@Frank pointed out in the comments that cbind can combine two data frames with unequal numbers of rows; the one with fewer rows is recycled. So we don't need rep or rbindlist at all, and below is the updated code.

cbind(p1, data.table(t(unlist(p1))))
#        a     b    a1    a2   a3   b1    b2   b3
# 1: 10.34  15.3 10.34 25.87 53.2 15.3 183.2 34.8
# 2: 25.87 183.2 10.34 25.87 53.2 15.3 183.2 34.8
# 3: 53.20  34.8 10.34 25.87 53.2 15.3 183.2 34.8

To get the column order the OP asked for, one option is setcolorder:

cbind(p1, setcolorder(data.table(t(unlist(p1))), order(row(p1))) )    
#        a     b    a1   b1    a2    b2   a3   b3
# 1: 10.34  15.3 10.34 15.3 25.87 183.2 53.2 34.8
# 2: 25.87 183.2 10.34 15.3 25.87 183.2 53.2 34.8
# 3: 53.20  34.8 10.34 15.3 25.87 183.2 53.2 34.8
  • @Frank Good to know. If we don't need `rep`, then we don't need `rbindlist` either, since its purpose is to combine multiple data frames. So this also works: `cbind(p1, data.table(t(unlist(p1))))` – www Mar 22 '19 at 16:51
  • @Frank If you don't mind, I am going to fold our comments into my answer. It is good to know this behavior of `cbind`. Thanks! – www Mar 22 '19 at 16:54
  • @eod I've edited one way in. Not sure if there are others. – Frank Mar 22 '19 at 18:24
  • @Frank Good approach. Thanks for the help. – www Mar 22 '19 at 18:29

We can use shift

out <- cbind(p1, p1[, shift(.SD, type = 'lead',
             n = c(0, seq_len(.N-1)))][rep(1, nrow(p1))])
setnames(out, make.unique(c(names(p1), rep(names(p1), each = nrow(p1)))))

or with tidyverse

library(tidyverse)
pmap_dfc(p1, list) %>% 
             uncount(nrow(p1))

If we need the original data as well

pmap_dfc(p1, list) %>%
   rowr::cbind.fill(p1, .)
#     a     b     a    b    a1    b1   a2   b2
#1 10.34  15.3 10.34 15.3 25.87 183.2 53.2 34.8
#2 25.87 183.2 10.34 15.3 25.87 183.2 53.2 34.8
#3 53.20  34.8 10.34 15.3 25.87 183.2 53.2 34.8

Or with transpose and bind_cols

purrr::transpose(p1) %>% 
    bind_cols %>% 
    rowr::cbind.fill(p1, .)

Here's another option, similar to www's:

cbind(p1, matrix(rep(unlist(p1), nrow(p1)), nrow = nrow(p1), byrow = TRUE))
#        a     b    V1    V2   V3   V4    V5   V6
# 1: 10.34  15.3 10.34 25.87 53.2 15.3 183.2 34.8
# 2: 25.87 183.2 10.34 25.87 53.2 15.3 183.2 34.8
# 3: 53.20  34.8 10.34 25.87 53.2 15.3 183.2 34.8
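The matrix columns come out as V1..V6. If named columns are wanted, one hedged follow-up is the base-R sketch below (it uses a data.frame so it runs without data.table; note that unlist stacks column-wise, hence the a1, a2, a3, b1, b2, b3 order):

```r
p1 <- data.frame(a = c(10.34, 25.87, 53.2), b = c(15.3, 183.2, 34.8))
out <- cbind(p1, matrix(rep(unlist(p1), nrow(p1)), nrow = nrow(p1), byrow = TRUE))
# unlist() emits a1, a2, a3, b1, b2, b3, so name the new columns in that order
colnames(out) <- c(names(p1),
                   paste0(rep(names(p1), each = nrow(p1)), seq_len(nrow(p1))))
```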
cbind(p1, do.call(cbind, split(p1, 1:nrow(p1))))

#        a     b   1.a  1.b   2.a   2.b  3.a  3.b
# 1: 10.34  15.3 10.34 15.3 25.87 183.2 53.2 34.8
# 2: 25.87 183.2 10.34 15.3 25.87 183.2 53.2 34.8
# 3: 53.20  34.8 10.34 15.3 25.87 183.2 53.2 34.8
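If the 1.a-style names produced by split are undesirable, a small hedged follow-up (assuming p1 is the data.table from the question) can rename them with setnames to match the layout the OP showed:

```r
library(data.table)
p1 <- data.table(a = c(10.34, 25.87, 53.2), b = c(15.3, 183.2, 34.8))
out <- cbind(p1, do.call(cbind, split(p1, 1:nrow(p1))))
# rewrite "1.a" as "a1", "1.b" as "b1", etc.; "a" and "b" don't match and stay as-is
setnames(out, sub("^(\\d+)\\.(\\w+)$", "\\2\\1", names(out)))
```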