1

I'm starting to learn R using RStudio with data.table, so I'm sorry for asking something this basic. This is what I have (working in an R Markdown document):

Object 1:

ps.data <- fread("database.csv") 

I'm trying to create an object that is the same as "ps.data" but with 5 of the columns from "database.csv" removed (all at once), without altering "ps.data". So far, I've tried this:

First try: works, but extremely inefficient.

ps.data2<-ps.data[,"col1":=NULL]
ps.data3<-ps.data2[,"col2":=NULL]
...
ps.data6<-ps.data5[,"col5":=NULL]

Then remove all the objects that I don't need.

Second try: it creates the object without those columns, but the problem is that when I open "ps.data", the code has removed the columns from that one as well.

ps.data2<- ps.data[, c("col1","col2","col3","col4","col5"):=NULL]
camille
  • I can't test at the moment, but does `ps.data[, -c("col1","col2","col3","col4","col5")]` do it? (minus sign at the front) – thelatemail Oct 06 '20 at 01:00
  • I tried it but it keeps giving me ```Error in -c("Col1", "Col2", "Col3", "Col4", "Col5") : Invalid argument for a unitary operator.``` – Nicolás Rojas U Oct 06 '20 at 05:08

3 Answers

1

EDIT: I had it completely wrong originally. Here's the solution, using the data.table::copy command to force data.table to duplicate the data rather than just reference it.

ps.data2<- copy(ps.data)
ps.data2[, c("col1","col2","col3","col4","col5"):=NULL]

Here's the reason: Understanding exactly when a data.table is a reference to (vs a copy of) another data.table

Basically, when you write ps.data2 <- ps.data, no data is duplicated: ps.data2 is just a second reference to the original table, so using := on either name modifies the same data. The question linked above has a more detailed discussion of when objects are referenced and when they are actually duplicated.
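To make the difference concrete, here is a minimal sketch. It assumes a small made-up table in place of fread("database.csv"); the column values and the name alias are hypothetical and only there for illustration.

```r
library(data.table)

# Hypothetical stand-in for fread("database.csv")
ps.data <- data.table(col1 = 1:3, col2 = 4:6, col3 = 7:9)

# Plain assignment: both names refer to the same table
alias <- ps.data
alias[, col1 := NULL]        # := works in place
names(ps.data)               # col1 is gone from ps.data as well

# copy() duplicates the data first, so the original is left alone
ps.data2 <- copy(ps.data)
ps.data2[, col2 := NULL]
names(ps.data)               # col2 is still there
names(ps.data2)              # only col3 remains
```

After copy(), anything done with := on ps.data2 should stay local to ps.data2.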

Roger-123
  • Thanks! I tried your solution but it keeps modifying "ps.data" ```ps.data2 <- ps.data``` ```ps.data2[, c("Size","Genres","Current Ver","Android Ver","Last Updated"):=NULL]``` after the second command, ps.data is also modified – Nicolás Rojas U Oct 05 '20 at 20:38
  • I updated my answer. I screwed it up entirely. Check out the update. – Roger-123 Oct 06 '20 at 00:45
  • You were right about the reference between the second object and the original data. I asked for help, and the solution was to create the object by keeping the needed columns instead of deleting the others. ```ps.data2 <- ps.data[,.(col6,col7,col8,col9,col10,col11,col12,col13)]``` – Nicolás Rojas U Oct 06 '20 at 05:15
1

I'm starting to love this. I was thinking of creating the object by eliminating the unneeded columns, and I haven't come across a direct solution for that. But changing the logic of the question, and creating the object by keeping the needed columns instead of deleting the others, worked perfectly.

ps.data2 <- ps.data[,.(col6,col7,col8,col9,col10,col11,col12,col13)]

EDIT: Forgot to put the explanation, so here it goes. My teacher said: the := symbol always refers to the creation (or removal) of a column, so every time you use it, it changes your original data. With this solution you are not modifying ps.data; instead, you are creating a new object containing the specified columns (or variables). Hope it's useful.
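As a rough illustration of that explanation, the sketch below selects columns without := and checks that the original is untouched. The small table and the names kept, dropped and drop_cols are hypothetical stand-ins. The second form drops columns by name; with = FALSE is written out so that j is treated as a vector of column names.

```r
library(data.table)

# Hypothetical stand-in for fread("database.csv")
ps.data <- data.table(col1 = 1:2, col2 = 3:4, col6 = 5:6, col7 = 7:8)

# Keep the needed columns: returns a new data.table
kept <- ps.data[, .(col6, col7)]

# Drop the unneeded columns by name: also returns a new data.table
drop_cols <- c("col1", "col2")
dropped <- ps.data[, !drop_cols, with = FALSE]

names(kept)      # col6, col7
names(dropped)   # col6, col7
names(ps.data)   # all four columns are still present
```

Neither form uses :=, so ps.data should keep all of its columns.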

  • Data.table can be a little tricky at first, but it gets to be easy and screaming fast when working with large datasets. – Roger-123 Oct 06 '20 at 16:10
0

You can do this using a character vector of column names and the %in% operator.

The %in% operator returns a TRUE/FALSE index that is TRUE for the column names that match our vector and FALSE for the column names that do not.

To reverse this we use !, which negates the index, giving us one that is TRUE for names that are not in our vector and FALSE for names that are.

You can then use this index to subset the data and assign the result to ps.data2.

Example:

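ps.data <- fread("database.csv")

# ps.data is a data.table, so with = FALSE is needed for the logical
# index in j to be treated as a column selector
ps.data2 <- ps.data[, !(names(ps.data) %in% c("col1","col2","col3","col4","col5")), with = FALSE]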

mtcars example:

df <- mtcars
df <- df[, !(names(df) %in% c("mpg","cyl"))]
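
The intermediate steps described above can be looked at one by one. This is only a sketch on the built-in mtcars data frame; the names drop_cols, is_dropped, keep and df2 are made up for the illustration.

```r
# Names of the columns we want to leave out
drop_cols <- c("mpg", "cyl")

# TRUE where a column name is in drop_cols, FALSE elsewhere
is_dropped <- names(mtcars) %in% drop_cols

# Negate the index: TRUE for the columns we want to keep
keep <- !is_dropped

# Subset into a new object; mtcars itself is not changed
df2 <- mtcars[, keep]

ncol(mtcars)   # 11
ncol(df2)      # 9
```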
jomcgi