Due to historical memory limitation issues, R used to read string data in as factors: before R 4.0.0, read.table defaulted to stringsAsFactors=TRUE. So when a column contains even one non-numeric entry, the whole column is treated as text and, under that default, converted into a factor. Now with RAM more easily available, you can just read the data in as strings first (stringsAsFactors=FALSE) so that it remains a character vector rather than a factor.
Then use as.numeric to convert the column into real-valued numbers before summing. Strings that cannot be converted into numbers are converted into NA instead, and na.rm=TRUE ignores those NAs in the sum.
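As a small illustration of the conversion step, here is a sketch in base R on a made-up vector (the values and variable names are hypothetical, not taken from the dataset in the question). It also shows why a factor column is awkward here: calling as.numeric directly on a factor returns the internal level codes rather than the original values.

```r
# Toy quantity column, as it might look after reading a CSV with a stray text entry
quantity_chr <- c("10", "2.5", "n/a", "7")

# Strings that cannot be parsed become NA (R emits a coercion warning, silenced here)
quantity_num <- suppressWarnings(as.numeric(quantity_chr))

# na.rm = TRUE drops the NA before summing
total <- sum(quantity_num, na.rm = TRUE)

# Pitfall if the column had been read in as a factor instead:
quantity_fct <- factor(quantity_chr)
codes <- as.numeric(quantity_fct)  # internal level codes, not the values
# Going through as.character first recovers the actual numbers
values <- suppressWarnings(as.numeric(as.character(quantity_fct)))
```

Reading the column as character in the first place avoids the extra as.character step.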
Putting all of the above together:
library(data.table)
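# You might want to check out data.table::fread to read the file directly as a data.table
# header=TRUE assumes the first row holds the column names (user, quantity) used below
x = read.table('C:/Users/user/Desktop/20180911_Dataset_b.csv', header=TRUE, encoding='UTF-8', sep=',', stringsAsFactors=FALSE)
# Convert quantity to numeric (unparseable strings become NA), then sum per user ignoring NAs
setDT(x)[, sum(as.numeric(quantity), na.rm=TRUE), by=.(user)]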
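For the fread route mentioned in the comment, a minimal sketch might look like the following. The inline csv_text is hypothetical sample data standing in for the real file, and it assumes columns named user and quantity. fread does not create factors by default, and colClasses is used here to make sure quantity comes in as character.

```r
library(data.table)

# Hypothetical inline data standing in for the CSV file
csv_text <- "user,quantity
alice,3
alice,abc
bob,4.5
bob,1"

# Read directly as a data.table, forcing quantity to character
dt <- fread(text = csv_text, colClasses = c(quantity = "character"))

# Sum per user; unparseable strings become NA and are dropped by na.rm = TRUE
totals <- dt[, .(total = sum(suppressWarnings(as.numeric(quantity)), na.rm = TRUE)),
             by = .(user)]
```

With a real file, the text argument would be replaced by the file path.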
Reference:
a useful comment from phiver on "Is there any good reason for columns to be characters instead of factors?", linking to a blog post by Roger Peng:
https://simplystatistics.org/2015/07/24/stringsasfactors-an-unauthorized-biography/