UPDATE
After consulting the data.table FAQs, specifically this part
1.2 Why does DT[,"region"] return a 1-column data.table rather than a vector?
See the answer above. Try DT$region instead. Or DT[["region"]].
1.3 Why does DT[, region] return a vector for the “region” column? I’d like a 1-column data.table.
Try DT[ , .(region)] instead. .() is an alias for list() and ensures a data.table is returned.
I realized that there was an even simpler solution.
In order to use cbind that preserves the column names, you need to pass two data tables. The issue with the column getting named V4
is because you're passing a vector to cbind.
But you can control whether data.table returns a vector or a 1-column data.table. Here's what that looks like in your case:
newest_gapminder2 <- cbind(gapminder[, country:year], gapminder[, 'pop'])
or
newest_gapminder3 <- cbind(gapminder[, country:year], gapminder[, .(pop)])
ORIGINAL RESPONSE
I found your question because I had the same one! I wanted to create a subset of a data table without needing to list every single column.
I tried a few different things and found something that I can live with...
## create a data table for this example
dt <- data.table("col1"=1:5, "col2"=2:6, "col3"=letters[2:6], "col4"=letters[1:5], "col5"=3:7)
dim(dt)
dt
## the goal is to create a subset of this data frame that contains col1, col3, col4, and col5
Sorry I didn't use your data. This should work the same though.
# method 1
## subset out a vector and give the column name
col1 <- dt[, col1]
## use cbind on the object and the data table subset
## the object name takes the place of the column name in the table
new_dt <- cbind(col1, dt[, col3:col5])
## check that the result is a data.table
class(new_dt)
dim(new_dt)
new_dt
That works out OK but it feels a little hack-y. The issue I was running into trying something like this:
dt_alt <- cbind(dt[, col1], dt[, col3:col5])
but dt[ , col1]
creates a vector not a data table, and when it gets coerced in cbind, the resulting name is V1
. So I thought, maybe it'd be easier to avoid binding single columns, and then just drop the unneeded columns after the fact.
# method 2
## take two different subsets/slices and cbind them
new_dt2 <- cbind(dt[, col1:col2], dt[, col3:col5])
## take out col2
new_dt2[, col2 := NULL]
class(new_dt2)
dim(new_dt2)
new_dt2
That was a little bit better but then I was wondering about something even more streamlined. I thought about chaining in data.table and wanted to combine it with method 2. I want to credit this post for the := NULL
technique.
# method 3
## thinking about how data.table works, can the := NULL be chained?
## spoiler: it can!
## this feels like kind of a hack but...
new_dt3 <-cbind(dt[,col1:col2][, col2:=NULL], dt[,col3:col5])
class(new_dt3)
dim(new_dt3)
new_dt3
OK, after all that, I felt badly that I didn't use the gapminder data from your question, so here is my method #3 applied to your data:
gapminder <- cbind(gapminder[, country:year]), gapminder[, pop:gdpPercap][, gdpPercap := NULL])
I timed it using the technique described here.
user system elapsed
0.00 0.00 0.02
All three techniques were comparable. I'm not sure how this would perform on a multi-GB dataset, though.