The data was generated with a SQL query, and I want to process it in R. However, when I import it from a .csv or .xlsx file, R reads the numeric columns as characters, even after I change the data type in the built-in import tool. Basic arithmetic on those columns then fails with the following message:

    In Ops.factor((data$A), (data$B)) : ‘/’ not meaningful for factors

Is there a simple way to solve this?

What I have tried so far:

  • Inspected the data set with str(), which showed that R had imported the affected columns as factors.
  • Used the unfactor function from the varhandle package to convert the factors back.
  • Used as.numeric on the columns that were read as characters rather than factors.
  • Changed the data types in Excel before importing.

    library(varhandle)  # provides unfactor()

    data$A <- unfactor(data$A)
    data$B <- unfactor(data$B)

    data$PERCENTAGE <- data$B / data$A * 100

How can I make R import the data with the data types I specify?

Thank you for the help in advance!

marine8115
  • Can you provide a sample of your data (`dput(data)`)? – patL Feb 07 '19 at 11:17
  • You are probably using `read.csv` without specifying `stringsAsFactors=FALSE` – A. Suliman Feb 07 '19 at 11:18
  • It will help to provide [some example data](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Almost certainly, a column that you thought was numeric contains characters, which has caused conversion to factors because `stringsAsFactors = FALSE` was not specified. – neilfws Feb 07 '19 at 11:18
  • I tried with `stringsAsFactors = F`, but got the following error: `non-numeric argument to binary operator` – marine8115 Feb 07 '19 at 11:22
  • Again, will help to see the command used to read in the data. – neilfws Feb 07 '19 at 11:25
  • `Data1 <- read.csv("Z:/Data1.csv", stringsAsFactors=FALSE)`. Will `fread` from `data.table` make a difference? – marine8115 Feb 07 '19 at 11:26
  • Provide example data. Maybe set `sep=","` for `read.csv`? – zx8754 Feb 07 '19 at 11:29
  • Example data is a bit difficult to provide. I understand it would help, but some confidentiality clauses are restrictive – marine8115 Feb 07 '19 at 11:30
  • Does the file have a header? If so and you don't include `header = TRUE`, that will introduce characters into the columns on the first row. – neilfws Feb 07 '19 at 11:36
  • `data2 <- read.csv("Z:/Data1.csv", header = TRUE, sep = ",",as.is = !stringsAsFactors, colClasses = NA, na.string = "NA", skip = 0, strip.white = TRUE, fill = TRUE, comment.char = "#", stringsAsFactors = FALSE )` I used this code just now, but to no avail. – marine8115 Feb 07 '19 at 11:41
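The comments above point to the likely cause: at least one entry in each affected column cannot be parsed as a number, so `read.csv` falls back to character (or factor). Since the real file is confidential, the following is only a sketch on made-up data. The thousands separator and the literal `NULL` are assumptions about what a SQL export might contain, and `bad_values` and `to_number` are hypothetical helper names, not functions from any package.

```r
# Hypothetical stand-in for the confidential file: column A contains a
# thousands separator and column B a literal "NULL" (both are assumptions).
csv_text <- 'A,B
100,25
"1,200",300
400,NULL'

dat <- read.csv(text = csv_text, stringsAsFactors = FALSE)
str(dat)  # both columns arrive as character, not numeric

# List the distinct entries in a column that cannot be parsed as numbers.
# as.character() makes this work for factor columns as well.
bad_values <- function(x) {
  x <- as.character(x)
  parsed <- suppressWarnings(as.numeric(x))
  unique(x[is.na(parsed) & !is.na(x)])
}
bad <- lapply(dat, bad_values)
print(bad)

# Once the offending patterns are known, clean them and convert.
to_number <- function(x) {
  x <- gsub(",", "", as.character(x), fixed = TRUE)  # drop thousands separators
  x[x %in% c("NULL", "")] <- NA                      # treat these as missing
  as.numeric(x)
}
dat[] <- lapply(dat, to_number)

dat$PERCENTAGE <- dat$B / dat$A * 100
```

If `bad_values` returns nothing for a column that still refuses to convert, the cause is probably elsewhere, for example a wrong separator or a missing header row, as suggested in the comments.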

1 Answer

For .csv files I would recommend read_csv from the readr package, which is part of Hadley Wickham's excellent Tidyverse collection. It has intelligent defaults that cope with most things I throw at it.

For .xlsx, there is read_excel from the readxl package, which is also part of the Tidyverse (other packages are available too). Alternatively, just export a .csv from within Excel and use read_csv.

[Note: the Tidyverse functions import these files as a "tibble", which is essentially an enhanced data frame that avoids some of the usual headaches. It is easily converted to a plain data.frame with as.data.frame() if you prefer.]
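As a rough illustration of importing with explicit data types, here is a sketch that declares the column types up front with `col_types`. The file contents, the column names `A` and `B`, and the `NULL` marker are invented for the example and would need to be adapted to the real file.

```r
library(readr)

# Write a small made-up file so the example is self-contained.
path <- tempfile(fileext = ".csv")
writeLines(c("A,B", "100,25", "200,50", "400,NULL"), path)

dat <- read_csv(
  path,
  col_types = cols(A = col_double(), B = col_double()),  # force numeric columns
  na = c("", "NA", "NULL")                               # strings to read as missing
)

# Values that could not be parsed as the requested type are reported here
# instead of silently turning the whole column into text.
problems(dat)

dat$PERCENTAGE <- dat$B / dat$A * 100

# Convert the tibble to a plain data.frame if preferred.
df <- as.data.frame(dat)
```

`read_excel` accepts a `col_types` argument as well, although it takes type names such as `"numeric"` and `"text"` rather than `cols()` specifications.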

indubitably