I have the data frame below, in which all the columns are factors, and I want to use them as numeric columns. I tried several approaches, but the values change to something different when I try as.numeric(as.character(.)).
The data comes in a semicolon-separated format. A subset of the data that reproduces the problem is:
rawData <- "Date;Time;Global_active_power;Global_reactive_power;Voltage;Global_intensity;Sub_metering_1;Sub_metering_2;Sub_metering_3
21/12/2006;11:23:00;?;?;?;?;?;?;
21/12/2006;11:24:00;?;?;?;?;?;?;
16/12/2006;17:24:00;4.216;0.418;234.840;18.400;0.000;1.000;17.000
16/12/2006;17:25:00;5.360;0.436;233.630;23.000;0.000;1.000;16.000
16/12/2006;17:26:00;5.374;0.498;233.290;23.000;0.000;2.000;17.000
16/12/2006;17:27:00;5.388;0.502;233.740;23.000;0.000;1.000;17.000
16/12/2006;17:28:00;3.666;0.528;235.680;15.800;0.000;1.000;17.000
16/12/2006;17:29:00;3.520;0.522;235.020;15.000;0.000;2.000;17.000
16/12/2006;17:30:00;3.702;0.520;235.090;15.800;0.000;1.000;17.000
16/12/2006;17:31:00;3.700;0.520;235.220;15.800;0.000;1.000;17.000
16/12/2006;17:32:00;3.668;0.510;233.990;15.800;0.000;1.000;17.000
"
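# stringsAsFactors = TRUE reproduces the factor columns on R >= 4.0.0, where the default is FALSE
hpc <- read.csv(text = rawData, sep = ";", stringsAsFactors = TRUE)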
str(hpc)
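To illustrate the kind of value change I mean, here is a minimal sketch on a toy factor (the names f, codes, and values are made up for illustration and are not part of the data set). As far as I understand, as.numeric() applied directly to a factor returns the internal level codes, while going through as.character() first recovers the printed values and turns "?" into NA:

```r
# Toy factor shaped like the columns above; "?" marks a missing reading.
f <- factor(c("4.216", "?", "5.360"))

# as.numeric() on a factor returns the integer level codes, not the labels.
# The exact codes depend on how the levels sort in the current locale.
codes <- as.numeric(f)

# Converting to character first recovers the labels; "?" cannot be parsed
# as a number, so it becomes NA (the coercion warning is suppressed here).
values <- suppressWarnings(as.numeric(as.character(f)))

print(codes)
print(values)
```

The codes are only positions in levels(f), which is why they look unrelated to the measurements.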
When run against the full data file, after dropping the date and time variables, the output from str() looks like:
> str(hpc)
'data.frame': 2075259 obs. of 7 variables:
$ Global_active_power : Factor w/ 4187 levels "?","0.076","0.078",..: 2082 2654 2661 2668 1807 1734 1825 1824 1808 1805 ...
$ Global_reactive_power: Factor w/ 533 levels "?","0.000","0.046",..: 189 198 229 231 244 241 240 240 235 235 ...
$ Voltage : Factor w/ 2838 levels "?","223.200",..: 992 871 837 882 1076 1010 1017 1030 907 894 ...
$ Global_intensity : Factor w/ 222 levels "?","0.200","0.400",..: 53 81 81 81 40 36 40 40 40 40 ...
$ Sub_metering_1 : Factor w/ 89 levels "?","0.000","1.000",..: 2 2 2 2 2 2 2 2 2 2 ...
$ Sub_metering_2 : Factor w/ 82 levels "?","0.000","1.000",..: 3 3 14 3 3 14 3 3 3 14 ...
$ Sub_metering_3 : num 17 16 17 17 17 17 17 17 17 16 ...
Can anyone help me get the expected output?
Expected output:
> str(hpc)
'data.frame': 2075259 obs. of 7 variables:
$ Global_active_power : num "?","0.076","0.078",..: 2082 2654 2661 2668 1807 1734 1825 1824 1808 1805 ...
$ Global_reactive_power: num "?","0.000","0.046",..: 189 198 229 231 244 241 240 240 235 235 ...
$ Voltage : num "?","223.200",..: 992 871 837 882 1076 1010 1017 1030 907 894 ...
$ Global_intensity : num "?","0.200","0.400",..: 53 81 81 81 40 36 40 40 40 40 ...
$ Sub_metering_1 : num "?","0.000","1.000",..: 2 2 2 2 2 2 2 2 2 2 ...
$ Sub_metering_2 : num "?","0.000","1.000",..: 3 3 14 3 3 14 3 3 3 14 ...
$ Sub_metering_3 : num 17 16 17 17 17 17 17 17 17 16 ...
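For reference, a minimal sketch of two approaches that might produce numeric columns, assuming the "?" entries are meant to be missing values. The names sampleData, hpcNum, hpcFac, and isFac are hypothetical, and the excerpt keeps only three of the columns to stay short:

```r
# Small excerpt of the sample data, including one row of "?" placeholders.
sampleData <- "Global_active_power;Voltage;Sub_metering_3
?;?;
4.216;234.840;17.000
5.360;233.630;16.000
"

# Option 1: declare "?" as the missing-value marker at read time,
# so the columns are parsed as numeric from the start.
hpcNum <- read.csv(text = sampleData, sep = ";", na.strings = "?")

# Option 2: convert columns that were already read as factors,
# going through character so the labels (not the level codes) are used.
hpcFac <- read.csv(text = sampleData, sep = ";", stringsAsFactors = TRUE)
isFac <- sapply(hpcFac, is.factor)
hpcFac[isFac] <- lapply(hpcFac[isFac], function(x) {
  suppressWarnings(as.numeric(as.character(x)))
})

str(hpcNum)
str(hpcFac)
```

In both cases the "?" rows should end up as NA rather than as a number, so the row count stays the same and the missing readings can be handled later.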