
This question is similar to Expand Data Frame, but my issue is more complex. I have a data set that I converted to a data.table called "dt", as in that example.

> dt
    site year plot     ht count
 1: 0001   00    1 0.0833     3
 2: 0001   00    2 0.2        5
 3: 0001   02    2 0.75       1
 4: 0001   02    2 0.0833     3
 5: 0002   00    1 0.0833     1
----
# Truncated (it has over 200,000 rows)

> str(dt)
Classes ‘data.table’ and 'data.frame':  220116 obs. of  5 variables:
$ site : chr  "0001" "0001" "0001" "0001" ...
$ year : chr  "00" "00" "02" "02" ...
$ plot : int  1 2 2 2 1 1 2 2 3 ...
$ ht   : num  0.0833 0.0833 0.75 0.0833 0.0833 5.5 0.0833 0.0833 0.0833 0.0833 ...
$ count: num  3 15 1 3 1 3 1 1 1 2 ...
- attr(*, ".internal.selfref")=<externalptr> 
> 

I'd like to expand the data set so that every height has its own row, with the number of rows for each height determined by the count column. It should look something like this:

 site year plot     ht
 0001   00    1 0.0833
 0001   00    1 0.0833
 0001   00    1 0.0833
 0001   00    2 0.2
 0001   00    2 0.2
 0001   00    2 0.2
 0001   00    2 0.2
 0001   00    2 0.2
 0001   02    2 0.75 
 0001   02    2 0.0833
 ----

I've attempted using something similar to the function in the Expand Data Frame example:

f<-function(x,y,len=max(y)) {res<-numeric(len);res[y]<-x;res}
dt_expd<-dt[,list(ht=f(ht,count,count)),by=c(site,year,plot)]

And I get this error:

Error in eval(expr, envir, enclos) : object 'site' not found
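The error most likely comes from `by=c(site,year,plot)`: written that way, the unquoted names are looked up as ordinary R objects instead of as columns of `dt`. A minimal sketch of the same grouped idea, assuming the five sample rows shown above, passes the grouping columns with `.()` and swaps `f()` for `rep()`, which repeats each `ht` value `count` times within its group:

```r
library(data.table)

# The five sample rows from the question (site and year are character, as in str(dt))
dt <- data.table(
  site  = c("0001", "0001", "0001", "0001", "0002"),
  year  = c("00", "00", "02", "02", "00"),
  plot  = c(1L, 2L, 2L, 2L, 1L),
  ht    = c(0.0833, 0.2, 0.75, 0.0833, 0.0833),
  count = c(3, 5, 1, 3, 1)
)

# .(site, year, plot) tells data.table these are column names
# (c("site", "year", "plot") as a character vector works too).
# Inside each group, rep(ht, count) repeats every height 'count' times.
dt_expd <- dt[, .(ht = rep(ht, count)), by = .(site, year, plot)]
```

Because `by` gathers all rows of a group together, rows with the same site, year and plot that are not adjacent in `dt` end up next to each other in the output; indexing rows directly, as in the comments, keeps the original order.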

The challenges are:

  1. Expanding the rows so that each 'ht' keeps the correct site, year, and plot number

  2. Working on an old university computer

This is a small part of my graduate thesis. Any help is greatly appreciated!

-Lake Graboski

  • `dt[rep(seq_len(nrow(dt)),count)]`. – nicola Jan 05 '18 at 16:59
  • What a simple answer! You probably don't want to know how long I've been working at this. Thank you! – Lake Graboski Jan 05 '18 at 17:07
  • @h3rm4n It seems so, now that I understand a bit more about this rep-function-nested-within-the-index method. In this question, Max Ghenis's example further down uses `dt[rep(seq(.N), freq), !"freq", with=F]`. I personally am very new to this, read that post before asking this question, and still didn't understand it. That's why I asked. Now that I (kind of) know what's going on, should I delete this question since it's so similar to the other one? If so, how do I do that? – Lake Graboski Jan 05 '18 at 19:52
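The indexing trick from the comments can be spelled out step by step. This sketch assumes the same five sample rows as the question and additionally drops the `count` column, which is no longer meaningful after expansion:

```r
library(data.table)

# The five sample rows from the question
dt <- data.table(
  site  = c("0001", "0001", "0001", "0001", "0002"),
  year  = c("00", "00", "02", "02", "00"),
  plot  = c(1L, 2L, 2L, 2L, 1L),
  ht    = c(0.0833, 0.2, 0.75, 0.0833, 0.0833),
  count = c(3, 5, 1, 3, 1)
)

# seq_len(nrow(dt)) is 1, 2, ..., nrow(dt); rep(..., count) repeats each
# row number 'count' times. Using that vector as the row index duplicates
# whole rows, so site, year and plot stay attached to each height.
dt_expd <- dt[rep(seq_len(nrow(dt)), count)]

# Remove the now-redundant count column (by reference)
dt_expd[, count := NULL]
```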

1 Answer


There are a couple of ways to do this with data.table. An alternative to the approach in the comments uses .I (the vector of row numbers, i.e. 1:nrow(dt)) and .SD (the Subset of Data, which here, with no by, is the whole data.table):

dt[, .SD[rep(.I, count)]]
#    site year plot     ht count
# 1:    1    0    1 0.0833     3
# 2:    1    0    1 0.0833     3
# 3:    1    0    1 0.0833     3
# 4:    1    0    2 0.2000     5
# 5:    1    0    2 0.2000     5
# 6:    1    0    2 0.2000     5
# 7:    1    0    2 0.2000     5
# 8:    1    0    2 0.2000     5
# 9:    1    2    2 0.7500     1
#10:    1    2    2 0.0833     3
#11:    1    2    2 0.0833     3
#12:    1    2    2 0.0833     3
#13:    2    0    1 0.0833     1
LyzandeR
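Since the question mentions an old computer, it may be worth checking the size of the result before expanding: the expanded table has `sum(count)` rows. A sketch, again assuming the five sample rows from the question, that does this check and confirms that the answer's `.SD[rep(.I, count)]` selects the same rows as the `rep(seq_len(nrow(dt)), count)` index from the comments:

```r
library(data.table)

# The five sample rows from the question
dt <- data.table(
  site  = c("0001", "0001", "0001", "0001", "0002"),
  year  = c("00", "00", "02", "02", "00"),
  plot  = c(1L, 2L, 2L, 2L, 1L),
  ht    = c(0.0833, 0.2, 0.75, 0.0833, 0.0833),
  count = c(3, 5, 1, 3, 1)
)

# Number of rows the expanded table will have; on the full data
# (over 200,000 input rows) this shows how large the result will be
n_out <- sum(dt$count)

# The answer's approach: with no 'by', .I is 1:nrow(dt) and .SD holds every column
a <- dt[, .SD[rep(.I, count)]]

# The approach from the comments, for comparison
b <- dt[rep(seq_len(nrow(dt)), count)]

# Same number of rows and the same values column by column
stopifnot(nrow(a) == n_out, identical(a$ht, b$ht), identical(a$site, b$site))
```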