I am using statistical software for the first time and am struggling with R. I have collected data and saved them as a CSV file (which for some reason separates with ";" instead of ",") and imported it into R, which works fine. However, when I use the str function and look at the summary, I have factor variables as well as int variables, and I have no clue how to change that, since I formatted all the columns in Excel beforehand and they all say numeric. I am trying to run a multiple regression for my thesis but cannot even get the data loaded properly, so I would appreciate any help.

Furthermore, does anyone know how many explanatory variables I can include in R?

Thanks in advance.

Luisa
  • Luisa, welcome to SO. Please read [this](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) to learn how to ask a better question next time. – agstudy Jul 05 '13 at 14:05
  • You could use `read.csv( "myData.csv", stringsAsFactors = FALSE )` to preserve the format that Excel has converted your data into. Why Excel still saves the values as character even though you formatted everything as numbers is something you will need to find out by looking at the data in the csv file. – vaettchen Jul 05 '13 at 15:02
  • If missing observations are recorded with, for example, a '.' in your Excel data file, try including `na.strings = "."` in your `read.csv` statement. – Mark Miller Jul 05 '13 at 17:52
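
The two suggestions in the comments above can be combined in a single call. This is a minimal sketch, assuming a semicolon-separated file in which missing values are written as "."; the file contents and the column names `id`, `income` and `dummy` are made up for illustration.

```r
# Write a small semicolon-separated file to a temporary location (illustrative data).
tmp <- tempfile(fileext = ".csv")
writeLines(c("id;income;dummy",
             "a;1200;1",
             "b;.;0",
             "c;1850;1"), tmp)

# sep = ";" matches the file's separator; na.strings = "." turns "." into NA,
# so income can be parsed as a number instead of text.
dat <- read.csv(tmp, sep = ";", stringsAsFactors = FALSE, na.strings = ".")
str(dat)
```

Without `na.strings = "."`, the single "." would be enough to make the whole `income` column non-numeric.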

2 Answers

There are many options for converting your values to numeric, such as as.numeric, but the better approach is to use the colClasses option of read.csv. This ensures that your data are read in the right format from the start.

For example:

 read.csv(filename, sep=';',
          colClasses=c("character",        ## first column is a character
                       rep("numeric",4)))  ## followed by 4 numeric variables

In conjunction with this, you can also use the argument stringsAsFactors=FALSE if you have more than 5 variables and don't want strings converted to factors.
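
A self-contained version of the colClasses approach might look like the following. It is only a sketch: the temporary file, its contents and the column names (`name`, `x1` to `x4`) are invented for illustration.

```r
# Create a small semicolon-separated file with one text column and four numeric ones.
tmp <- tempfile(fileext = ".csv")
writeLines(c("name;x1;x2;x3;x4",
             "a;1;2.5;0;10",
             "b;2;3.5;1;20"), tmp)

# Declare the class of every column up front instead of letting read.csv guess.
dat <- read.csv(tmp, sep = ";",
                colClasses = c("character", rep("numeric", 4)))

# Check the resulting class of each column.
sapply(dat, class)
```

If a column declared as numeric contains an entry that cannot be parsed as a number, read.csv stops with an error rather than silently producing a factor, which helps locate the offending values.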

More details can be found in ?read.csv or, more generally, ?read.table.

As for your ambiguous question about "how many explanatory variables I can include in R?", I interpret it as how many columns/variables you can read. The only limit on reading or creating a matrix/data.frame is your RAM.
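
As a rough illustration that 42 variables is nowhere near a limit for R, the sketch below builds a data frame with 42 predictors and fits a multiple regression on all of them. The data are simulated, and the names `n`, `p`, `X` and `fit` are assumptions chosen for this example.

```r
set.seed(1)                                          # reproducible simulated data
n <- 200                                             # number of observations (assumed)
p <- 42                                              # number of explanatory variables
X <- as.data.frame(matrix(rnorm(n * p), nrow = n))   # columns V1 to V42
X$y <- rnorm(n)                                      # simulated response

fit <- lm(y ~ ., data = X)                           # regress y on all other columns
length(coef(fit))                                    # intercept plus one coefficient per predictor
```

Note that lm needs more observations than coefficients in order to estimate all of them.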

agstudy

If read.csv is importing some of your supposedly numeric variables as factors or strings, it's quite likely that those columns contain some values that are "NULL", "N/A" or some other non-numeric entry. Check the levels or values for non-numeric entries and either eliminate or handle them in Excel or in R itself. Once the columns are purely numeric they should read in fine, or you can post-process with as.integer() or as.numeric(). For a factor, convert via as.character() first, since as.numeric() applied directly to a factor returns the internal level codes rather than the original values.
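
One way to track down the offending entries in a column that was read as a factor is sketched below. The `income` vector and its values are invented for illustration; in practice it would be a column of the imported data frame.

```r
# Illustrative factor column, as read.csv might produce when one entry is not a number.
income <- factor(c("1200", "1850", "n/a", "990"))

# as.numeric() on a factor returns the internal level codes, not the values.
codes <- as.numeric(income)

# Converting via character first recovers the numbers; unparseable entries become NA.
vals <- suppressWarnings(as.numeric(as.character(income)))

# List the entries that failed to convert, to find what needs cleaning up.
bad <- unique(as.character(income)[is.na(vals)])
bad
```

Here `bad` holds the raw text of every entry that could not be converted, which shows what has to be fixed in Excel or declared via na.strings.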

Tommy Levi
  • Thank you. I do have a fair few dummies in my data; I guess I have to acknowledge that somehow in R? The dummies come up as "int" at the moment, but the other ones that are purely numeric, such as "income", show up as factors... I'll try as.numeric. – Luisa Jul 05 '13 at 15:52
  • Luisa, how many variables do you have? Why not explicitly define colClasses? – agstudy Jul 05 '13 at 15:53
  • Got 42 variables (I know that's a lot, but my professor was sure it would work). I have to read up on colClasses, I have never heard of it. – Luisa Jul 05 '13 at 16:13