I have a data frame in R that I loaded from a CSV file. One of the variables is called "Amount" and is meant to contain positive and negative numbers.
When I inspected the data frame, this variable's type was listed as factor, but I need it in a numeric format (I'm not sure whether integer or numeric is the better choice). I tried converting it to each of those two types and saw some unexpected behavior.
Initial dataframe:
str(df)
Amount : Factor w/ 11837 levels "","-1","-10",..: 2 2 1664 4 6290 6290 6290 6290 6290 6290 ...
As I mentioned above, I saw something weird when I tried to convert it to either numeric or integer. To show this, I put together this comparison:
df2 <- data.frame(df$Amount, as.numeric(df$Amount), as.integer(df$Amount))
str(df2)
'data.frame': 2620276 obs. of 3 variables:
$ df.Amount : Factor w/ 11837 levels "","-1","-10",..: 2 2 1664 4 6290 6290 6290 6290 6290 6290 ...
$ as.numeric.df.Amount.: num 2 2 1664 4 6290 ...
$ as.integer.df.Amount.: int 2 2 1664 4 6290 6290 6290 6290 6290 6290 ...
head(df2, 20)
df.Amount as.numeric.df.Amount. as.integer.df.Amount.
1 -1 2 2
2 -1 2 2
3 -201 1664 1664
4 -100 4 4
5 1 6290 6290
6 1 6290 6290
7 1 6290 6290
8 1 6290 6290
9 1 6290 6290
10 1 6290 6290
11 1 6290 6290
12 1 6290 6290
13 1 6290 6290
14 1 6290 6290
15 1 6290 6290
16 1 6290 6290
17 1 6290 6290
18 2 7520 7520
19 2 7520 7520
20 2 7520 7520
The as.numeric and as.integer functions are taking the Amount variable and doing something to it, but I don't know what that is. My goal is to get the Amount variable into some sort of numeric data type so I can compute sum, mean, etc. on it.
What am I doing incorrectly that's causing the weird numbers, and what can I do to fix it?
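For reference, here is a minimal sketch that seems to reproduce the same pattern on a small scale. The values are made up for illustration, and `amount_chr`, `f`, and `amount_num` are hypothetical names, not part of my real data. It suggests that the integers returned are the positions of each value within `levels()`, and that converting through `as.character()` first is one way to get the actual numbers back, with the empty string presumably becoming `NA`.

```r
# Toy data (made up for illustration); the values arrive as text, as in a CSV.
amount_chr <- c("-1", "-1", "-201", "-100", "1", "2", "")
f <- factor(amount_chr)

# The distinct strings stored by the factor (sorted as text, not as numbers).
levels(f)

# Calling as.integer() directly returns each value's position in levels(f),
# not the number the label spells out.
as.integer(f)

# Indexing levels() by those positions gives back the original strings.
levels(f)[as.integer(f)]

# Converting the labels to character first, then to numeric, recovers the
# values; the empty string cannot be parsed and ends up as NA.
amount_num <- as.numeric(as.character(f))
amount_num

# Summaries then need na.rm = TRUE to skip the missing entry.
sum(amount_num, na.rm = TRUE)
```

If this sketch reflects what is happening in my real data, the empty-string level would also explain why the column was read in as a factor rather than as numbers, though that is an assumption on my part.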