I wrote a little program to import MSCI World data (which I can't find on Yahoo Finance) via OnVista:

library(fImport)
library(fBasics)

notation="3193857"
datestart=Sys.Date()-366
interval="Y1"

URL <- composeURL("www.onvista.de/onvista/boxes/historicalquote/export.csv?","notationId=", notation, "&dateStart=", datestart, "&interval=", interval )

data<-read.csv2(URL,header=TRUE,sep=";",dec=",",na.strings=c(""))

My problem is that the generated table in R contains only character or factor columns, regardless of the arguments I pass to read.csv2.

My guess is that this is caused by the empty cells imported in line 254. But even when I tell read.csv2 to treat empty cells as NA, this does not work for the whole line, and it does not change how the numeric columns are imported. They still appear as either factors or characters.

Can anybody help me?

arghtype
Gio84
  • Hi! Can you edit your message and add an example of what you are getting and what you want to get? That would help us answer; there are functions to convert factors to numeric and so on, but it is difficult to answer without a specific example. – R18 Apr 12 '17 at 13:18

1 Answer

Your problem is not the missing values but the fact that the numbers contain a thousands separator. You can either read in the data.frame and convert the relevant columns afterwards, or you can define a new class definition as suggested in one of the following links:

Here we define a new class whose conversion method first removes the periods (the thousands separator) and then converts the decimal comma to a period.

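# Placeholder class, used only to hook a custom conversion into read.csv2
setClass("MyNum")
# Conversion: strip the periods (thousands separator), then turn the decimal comma into a period
setAs("character", "MyNum", 
       function(from) as.numeric(gsub(",", ".", gsub("\\.", "", from) ) ))
# First column (Datum) stays character, the four price columns use MyNum, Volumen is plain numeric
indata <- read.csv2(URL, sep=";", dec=",", 
                    colClasses=c("character", rep("MyNum", 4), "numeric"))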

This results in

head(indata)
         Datum Eroeffnung    Hoch    Tief Schluss Volumen
1   11.04.2016    1632.14 1632.14 1632.14 1632.14       0
2   12.04.2016    1644.21 1644.21 1644.21 1644.21       0
3   13.04.2016    1666.16 1666.16 1666.16 1666.16       0
4   14.04.2016    1671.96 1671.96 1671.96 1671.96       0
5   15.04.2016    1670.46 1670.46 1670.46 1670.46       0
6   18.04.2016    1675.32 1675.32 1675.32 1675.32       0

and the classes are

sapply(indata, class)
      Datum  Eroeffnung        Hoch        Tief     Schluss 
"character"   "numeric"   "numeric"   "numeric"   "numeric" 
    Volumen 
  "numeric" 
ekstroem
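The other route mentioned in the answer, reading the columns as text and converting them afterwards, is not shown there. A minimal sketch of it follows, assuming the export uses a period as thousands separator and a comma as decimal mark. The inline sample only mimics the layout of the export (its values are taken from the output above and re-formatted by hand), and the names csv_text, raw, to_num and num_cols are illustrative, not part of any package.

```r
# Hypothetical two-row sample in the assumed export layout
csv_text <- "Datum;Eroeffnung;Hoch;Tief;Schluss;Volumen
11.04.2016;1.632,14;1.632,14;1.632,14;1.632,14;0
12.04.2016;1.644,21;1.644,21;1.644,21;1.644,21;0"

# Read every column as plain text so nothing is turned into a factor
raw <- read.csv2(text = csv_text, colClasses = "character")

# Illustrative helper: drop the thousands separator, then swap the decimal comma for a period
to_num <- function(x) {
  as.numeric(gsub(",", ".", gsub(".", "", x, fixed = TRUE), fixed = TRUE))
}

# Convert the price and volume columns in place
num_cols <- c("Eroeffnung", "Hoch", "Tief", "Schluss", "Volumen")
raw[num_cols] <- lapply(raw[num_cols], to_num)

# Optionally parse the German-style dates as well
raw$Datum <- as.Date(raw$Datum, format = "%d.%m.%Y")

str(raw)
```

With the real data, the same steps would apply to the data.frame returned by read.csv2(URL, colClasses = "character"); empty cells come through as empty strings, which as.numeric turns into NA.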
  • Well, now I get another error. I did as you suggested and added setAs("character", "MyNum", function(from) as.numeric( gsub("\\." , "" ,from) ) ), which now results in the warning message "In asMethod(object) : NAs introduced by coercion". I think I'm not catching the right character for the separator, but I tried both the comma and the period. – Gio84 Apr 14 '17 at 12:41
  • Using the comma in gsub removes the comma but leaves the numbers in the wrong format, because R treats the period as the decimal separator when converting them to numeric. Using the period in gsub results in NAs in all four columns. – Gio84 Apr 14 '17 at 12:42
  • Well ... that was because I forgot to copy all the lines in ><. Updated – ekstroem Apr 14 '17 at 15:53
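The behaviour described in the comments can be reproduced on a single string. This is only an illustration on a hand-written value in the assumed format (period as thousands separator, comma as decimal mark); the variable names are made up for the example.

```r
x <- "1.632,14"

# Removing only the comma leaves "1.63214", which R parses with "." as the decimal mark
only_comma_removed <- as.numeric(gsub(",", "", x, fixed = TRUE))

# Removing only the period leaves "1632,14", which as.numeric cannot parse: NA plus a coercion warning
only_period_removed <- suppressWarnings(as.numeric(gsub(".", "", x, fixed = TRUE)))

# Doing both steps in order (drop periods, then comma -> period) gives the intended value
both_steps <- as.numeric(gsub(",", ".", gsub(".", "", x, fixed = TRUE), fixed = TRUE))

print(c(only_comma_removed, only_period_removed, both_steps))
```

Using fixed = TRUE avoids having to escape the period as a regular expression, which is an easy place to slip when copying the pattern by hand.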