I am using the := operator in R to manipulate my data set, but my usage throws an error.

I tried other functions, such as c(), for creating subsets, but I need something more efficient, and apparently := should do the job. With the subset function I end up with many intermediate data frames, which are unnecessary.

# Preprocessing: set non-positive values to NA, then drop the rows that contain NA
df_data[Quantity<=0,Quantity:=NA]
df_data[UnitPrice<=0,UnitPrice:=NA]
df_data <- na.omit(df_data)

(from the console):

> df_data[Quantity<=0,Quantity:=NA]
Error in `:=`(Quantity, NA) : 
 Check that is.data.table(DT) == TRUE. Otherwise, := and `:=`(...) are defined for use in j, once only and in particular ways. See help(":=").
Maleeha
    Maleeha, you need to prove that `df_data` is indeed a `data.table`. @JamesBonkowski's answer shows that the code works fine when it is a `data.table`, and it mimics the error you have when it is *not* a `data.table`. (The fact that your "answer" using `data.table::fread` fixed the problem further suggests that it was not a `data.table`. However you read it in previously, consider doing `setDT(df_data)` before attempting any of this code.) – r2evans Jun 24 '19 at 22:45

2 Answers

:= only works on data.tables.

This should work:

library(data.table)
df_data <- data.table(Quantity = -5:5)
df_data[Quantity<=0,Quantity:=NA]
na.omit(df_data)

This will produce the error, because df_data is a data.frame rather than a data.table:

df_data <- data.frame(Quantity = -5:5)
df_data[Quantity<=0,Quantity:=NA]
na.omit(df_data)
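
If the data already exists as a data.frame (for example, because it was loaded with read.csv), one way to get the original code running is to convert it first with setDT(), as the comment under the question suggests. The following is a minimal sketch; the sample values are hypothetical and only stand in for the real data set.

```r
library(data.table)

# Hypothetical stand-in for a data set that was loaded as a plain data.frame
df_data <- data.frame(
  Quantity  = -5:5,
  UnitPrice = c(-1, 0, 2.5, 1, 3, 0, 4, 2, 1, 5, 6)
)
is.data.table(df_data)   # FALSE: := would fail here

setDT(df_data)           # converts to a data.table by reference
is.data.table(df_data)   # TRUE

# The preprocessing steps from the question now run without the error
df_data[Quantity <= 0, Quantity := NA]
df_data[UnitPrice <= 0, UnitPrice := NA]
df_data <- na.omit(df_data)
```

setDT() modifies the object in place, so no assignment is needed; as.data.table(df_data) would be the copying alternative.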

That said, if you're just filtering out rows with values less than or equal to 0, you could do

df_data <- df_data[Quantity > 0 & UnitPrice > 0]
James B
  • Thanks. But I want to get rid of the rows where the value is negative or NA. The code above is giving me a dataset with just one column (Quantity-with no negative values but with NAs) which is something I don't want. Thoughts? – Maleeha Jun 24 '19 at 21:30
  • df_data <- df_data[Quantity > 0 & UnitPrice > 0 & !is.na(Quantity) & !is.na(UnitPrice) ] – James B Jun 24 '19 at 23:22
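
As a quick check of the filter discussed in these comments, here is a small sketch with hypothetical sample values. It assumes df_data is a data.table, where an NA in the logical row filter is treated as FALSE, so rows with missing values are dropped even without the explicit !is.na() terms.

```r
library(data.table)

# Hypothetical sample with negative, zero and missing values
df_data <- data.table(
  Quantity  = c(-2L, 0L, 3L, NA, 5L, 4L),
  UnitPrice = c(1.5, 2.0, NA, 2.5, 0, 3.0)
)

# Short form: comparisons that evaluate to NA do not select the row
clean <- df_data[Quantity > 0 & UnitPrice > 0]

# Explicit form from the comment above
clean_explicit <- df_data[Quantity > 0 & UnitPrice > 0 &
                            !is.na(Quantity) & !is.na(UnitPrice)]

print(clean)
```

Both forms keep all columns and return only the rows where both values are positive and non-missing. On a plain data.frame the behaviour differs, so the explicit form is the safer habit if the class of the object is uncertain.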

I fixed the problem by loading the dataset with fread instead of read.csv. fread returns a data.table, so the := operator now works.

Also, here is a useful link for understanding fread and read.csv:

Reason behind speed of fread in data.table package in R
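
The difference that matters for := can be seen by comparing the class of what each reader returns. This is a minimal sketch that uses an inline, hypothetical CSV string in place of the real file.

```r
library(data.table)

# Hypothetical CSV content standing in for the real file
csv_text <- "Quantity,UnitPrice\n-1,2.5\n3,0\n4,1.25\n"

from_base  <- read.csv(text = csv_text)   # base R reader
from_fread <- fread(text = csv_text)      # data.table reader

class(from_base)    # "data.frame"
class(from_fread)   # "data.table" "data.frame"

# := works on the fread result because it is a data.table
from_fread[Quantity <= 0, Quantity := NA]
```

With a real file, the same comparison would use read.csv("file.csv") and fread("file.csv"), where the file name is a placeholder.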

Maleeha