
I'm trying to delete a row of my data table based on the values of two columns, but with no luck.

I tried two snippets that I found in other threads:

my.data.table[!(my.data.table[,1]==6557 & my.data.table[,2]=="31-Dec-82"),] 
  ###var1 is in column 1 and var2 is in column 2

my.data.table %>% filter(var1!= 6557 & var2!="31-Dec-82")

but neither works. Note that var1 is numeric and var2 is character (not a date for now). The only approach I could make work is looking up the row number manually

my.data.table<-my.data.table[-rownumber] 

but this is not very convenient in a 1M row table, even if sorted.

Any idea why I cannot make it work?

Ronak Shah
Valentina Ruts
  • Your terminology is confusing. Do you have a data.frame or a data.table? If the latter, you have to subset differently than with base R: `my.data.table <- my.data.table[var1 != 6557]` – David Klotz Apr 26 '18 at 03:05
  • `DT[!.(6557, "31-Dec-82"), on=names(DT)[1:2]]` is the data.table syntax for an anti join. Or `df %>% anti_join(data_frame(var1 = 6557, var2 = "31-Dec-82"))` if going the dplyr route from data like @MauritsEvers' – Frank Apr 26 '18 at 14:53

2 Answers


Provided I understood you correctly, you want to remove rows where both var1 == 6557 and var2 == "31-Dec-82". This corresponds to negating the logical expression var1 == 6557 & var2 == "31-Dec-82".

Method using dplyr::filter

library(dplyr)  # provides filter() and %>%

# Sample data
df <- data.frame(
    var1 = 6556:6558,
    var2 = c("31-Dec-82", "31-Dec-82", "30-Dec-82")
)

df %>% filter(!(var1 == 6557 & var2 == "31-Dec-82"))
#  var1      var2
#1 6556 31-Dec-82
#2 6558 30-Dec-82

Method using base R's subset

subset(df, !(var1 == 6557 & var2 == "31-Dec-82"))
#  var1      var2
#1 6556 31-Dec-82
#3 6558 30-Dec-82
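
A likely reason the filter(var1 != 6557 & var2 != "31-Dec-82") attempt from the question does not behave as intended is that it is not the negation of the match: by De Morgan's law, !(A & B) equals !A | !B, not !A & !B. The following is a minimal base R sketch of the difference, reusing the sample data above; the names keep_question, keep_negated and keep_or are only illustrative.

```r
# Same sample data as above
df <- data.frame(
    var1 = 6556:6558,
    var2 = c("31-Dec-82", "31-Dec-82", "30-Dec-82"),
    stringsAsFactors = FALSE
)

# Condition from the question: keeps a row only if BOTH values differ,
# so every other row dated "31-Dec-82" is dropped as well
keep_question <- df$var1 != 6557 & df$var2 != "31-Dec-82"

# Negated match: drops a row only if BOTH values match
keep_negated <- !(df$var1 == 6557 & df$var2 == "31-Dec-82")

# Equivalent form via De Morgan's law: at least one value differs
keep_or <- df$var1 != 6557 | df$var2 != "31-Dec-82"

df[keep_question, ]  # only the 6558 row survives
df[keep_negated, ]   # the 6556 and 6558 rows survive
```

On a large table the first condition would silently remove every row with var1 == 6557 or var2 == "31-Dec-82", which is easy to miss.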
Maurits Evers

As it is a data.table,

my.data.table[,1]

won't extract the column as a vector the way it does for a data.frame. The column values can be extracted as a vector with [[

my.data.table[[1]]

i.e.

my.data.table[!(my.data.table[[1]]==6557 & my.data.table[[2]] =="31-Dec-82"),] 

Or refer to the columns by name

my.data.table[!(var1==6557 & var2 =="31-Dec-82")] 

Another option to subset the data.table columns would be to specify with = FALSE

my.data.table[, 1, with = FALSE]

but this will return a single-column data.table instead of a vector.
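
To see the difference on a small example, here is a sketch that assumes the data.table package is installed and uses a hypothetical three-row table with the same column layout as in the question (var1 numeric, var2 character); DT, kept and kept_by_position are illustrative names.

```r
library(data.table)

# Hypothetical sample data mirroring the question's column layout
DT <- data.table(
    var1 = c(6556, 6557, 6558),
    var2 = c("31-Dec-82", "31-Dec-82", "30-Dec-82")
)

# DT[, 1] stays a data.table (data.table >= 1.9.8); DT[[1]] is a plain vector
is.data.table(DT[, 1])
is.numeric(DT[[1]])

# Drop the matching row by referring to the columns by name inside i
kept <- DT[!(var1 == 6557 & var2 == "31-Dec-82")]

# Same filter, pulling each column out as a vector with [[
kept_by_position <- DT[!(DT[[1]] == 6557 & DT[[2]] == "31-Dec-82")]
```

Both calls return a new table, so the result has to be assigned back (for example my.data.table <- my.data.table[...]) for the row to be removed from the original object.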

akrun
  • Thank you, this works. Sometimes I still get confused as to what I can do with a data frame and a data table (obv I'm new to R) – Valentina Ruts Apr 26 '18 at 19:18