I am working with CSV data in R. A column called itemsDispensed has 315 rows, and I want to calculate the sum of its values.

I tried sum(as.numeric(as.character(....))) in R, but the result differs from the sum I get in Excel.

The code below shows the first 20 rows:

 head(select2014Chap6Sec1[ ,4], n = 20)
 [1] 11.615  0.001   0.023   0.026   56.101  7.127   8.572   0.004   0.001    45.98   225.525 0.526  
 [13] 119.999 0.004   0.522   4.781   31.473  0.001   2.338   0.712  
 6999 Levels: 0 0.001 0.002 0.003 0.004 0.005 0.006 0.007 0.008 0.009 0.01 0.011 0.012 0.013 ... 999.958

The method I am using is shown below:

> sum(select2014Chap6Sec1[ ,4])
  [1] 778211

The output of sum(as.numeric(as.character(....))) is shown below; it returns NA with a warning:

> sum(as.numeric(as.character(testFactorCol4)))
[1] NA
Warning message:
NAs introduced by coercion 

I can provide all the data using dput if needed. Thank you.

Machavity
Sandesh Rana
  • That's not an error, it's a warning. Try `sum(..., na.rm = TRUE)`. See `?sum` But you should really use `x <- select2014Chap6Sec1[4]; sum(as.numeric(levels(x))[x], na.rm = TRUE)` to convert numeric factors to their real numeric values – Rich Scriven Aug 13 '15 at 17:21
  • @RichardScriven, I am sorry to say that it does not give me the correct answer and it still gives me the warning. I tried `x <- select2014Chap6Sec1[ , 4]; sum(as.numeric(levels(x))[x], na.rm = TRUE)`. The variable x has value Factor w/ 6999 levels '0'... – Sandesh Rana Aug 13 '15 at 19:09
  • try `as.numeric(as.factor(x))` inside your function `sum`. – SabDeM Aug 13 '15 at 20:16
  • How did you get 6999 levels with only 350 rows? For the long run, it might make sense to try to figure out why the column is being read as characters/factors in the first place. For example, if it was because of missing value encoding then you could likely fix this when reading the dataset in. – aosmith Aug 13 '15 at 20:47
  • @SabDem, I have tried that but it didn't work. I have tried sum(as.numeric(as.factor(x)), na.rm = TRUE), sum(as.numeric(levels(x)[x])), sum(as.numeric(as.character(x))) and sum(as.numeric(levels(x))[x], na.rm = TRUE), but none of them worked. – Sandesh Rana Aug 13 '15 at 21:02
  • @aosmith, I will check the data. – Sandesh Rana Aug 13 '15 at 21:03
  • @aosmith, I have checked the data and there are no null values. However, there are some values such as 0.001 or 0.002; does that affect the result? – Sandesh Rana Aug 13 '15 at 21:24
  • I've found that if R reads something numeric as a factor then it means I've put a character somewhere in my numeric column and I like to figure out where the problem is. That being said, I can't guess why `sum(as.numeric(as.character(x)), na.rm = TRUE)` isn't working for you. – aosmith Aug 13 '15 at 21:37
  • @aosmith, I found a problem after running the code. The new data table has some NA values. After inspecting the data visually, I found that all the numbers greater than 999, such as 1,035.74 and 5,650.09, are converted to NA. Do you have any idea how to keep those numbers? – Sandesh Rana Aug 13 '15 at 22:06
  • See [this question/answers](http://stackoverflow.com/questions/13594223/number-values-include-comma-how-do-i-make-these-numeric). – aosmith Aug 13 '15 at 22:34
  • @aosmith, the code sum(as.numeric(gsub(",", "", x))) works for me, so I do not have to use as.character. Thank you very much, aosmith. – Sandesh Rana Aug 13 '15 at 22:47
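To make the thread's conclusion concrete, here is a minimal sketch of the diagnosis and the fix. The vector `x` is a hypothetical stand-in for the itemsDispensed column (the real data is not shown in full), and the names `converted`, `bad`, `cleaned` and `total` are illustrative only.

```r
# Hypothetical stand-in for the column: it was read as a factor because
# some entries contain a thousands separator.
x <- factor(c("11.615", "0.001", "1,035.74", "5,650.09", "0.526"))

# Direct conversion turns the comma-formatted entries into NA
# (the warning is suppressed here only to keep the output quiet).
converted <- suppressWarnings(as.numeric(as.character(x)))

# List the original entries that failed to convert.
bad <- as.character(x)[is.na(converted)]
print(bad)

# Strip the commas first, then convert and sum.
cleaned <- as.numeric(gsub(",", "", as.character(x), fixed = TRUE))
total <- sum(cleaned)
print(total)
```

Checking `bad` before summing with `na.rm = TRUE` helps avoid silently dropping the largest values, which is probably why the R total did not match Excel.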

1 Answer


When you do read.csv, try setting the argument stringsAsFactors=FALSE. Or, you can make use of the argument colClasses, wherein you would do something like read.csv("file.csv", colClasses=c("character", "integer","numeric")) if the first column was a character, second an integer, and third column numeric.

Specifically, it sounds like you have a column that R is guessing is a "factor", so you should set that column to be "numeric" in colClasses.

That may save you some trouble converting things once they've been read into R.
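A rough sketch of this approach, adapted to the comma-formatted values mentioned in the comments, might look like the following. The inline CSV text and the column name `practice` are made up for illustration. Note one assumption: to the best of my knowledge, `colClasses = "numeric"` cannot parse values such as 1,035.74, so the column is read as character and converted afterwards.

```r
# Hypothetical CSV content; the quoted value carries a thousands separator.
csv_text <- 'practice,itemsDispensed
A,11.615
B,"1,035.74"
C,0.526'

# Read both columns as character so nothing is turned into a factor.
dat <- read.csv(text = csv_text, stringsAsFactors = FALSE,
                colClasses = c("character", "character"))

# Remove the commas, then convert the column to numeric.
dat$itemsDispensed <- as.numeric(gsub(",", "", dat$itemsDispensed, fixed = TRUE))

total <- sum(dat$itemsDispensed)
print(total)
```

With a real file, `text = csv_text` would be replaced by the file path, and `colClasses` would need one entry per column in the file.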

rbatt