
I am reading a csv file into R and trying to take the log of the data. The csv file has columns of data, with the first row holding text headers and the remaining rows holding numeric data.

data<-read.csv("rawdata.csv",header=T)
trans<-log(csv2)

I get the following error when I do this:

Error in Math.data.frame(list(Revenue = c(18766L, 20197L, 20777L, 23410L, : non-numeric variable in data frame: Costs

Output of str(data):

'data.frame': 167 obs. of 3 variables:
 $ X: int 18766 20197 20777 23410 23434 22100 22337 21511 22683 23151 ...
 $ Y: Factor w/ 163 levels "1,452.70","1,469.00",..: 22 9 55 109 158 82 131 112 119 137 ...
 $ Z: num 564 608 636 790 843 ...

How do I correct this?

IRTFM
J M
  • Could you show the output of `str(data)`? – Iterator Aug 06 '11 at 00:44
  • 'data.frame': 167 obs. of 3 variables: $ X: int 18766 20197 20777 23410 23434 22100 22337 21511 22683 23151 ... $ Y: Factor w/ 163 levels "1,452.70","1,469.00",..: 22 9 55 109 158 82 131 112 119 137 ... $ Z: num 564 608 636 790 843 ... – J M Aug 06 '11 at 05:28
    It's more convenient if you edit your question than to post in comments. – Roman Luštrik Aug 06 '11 at 06:32
  • Tada! `Y` is a factor - big problem. The commas shouldn't be in there. – Iterator Aug 06 '11 at 12:30

4 Answers


EDIT: removed speculation about structure given that it has now been offered.

Data frames are lists, so lapply will loop over their columns and return, as a list, the result of applying the math function to each one.
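A minimal sketch of that behaviour, using a small hypothetical data frame `df` rather than the asker's data: lapply() hands back a plain list, and assigning into `df[]` is one common way to keep the data frame structure and column names.

```r
# Hypothetical toy data frame standing in for the real data
df <- data.frame(a = c(1, 10, 100), b = c(2, 4, 8))

# lapply() iterates over the columns and returns a plain list
logged_list <- lapply(df, log)
class(logged_list)   # "list"

# Assigning into df_logged[] keeps the data.frame class and column names
df_logged <- df
df_logged[] <- lapply(df, log)
class(df_logged)     # "data.frame"
```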

If the column is a factor (here str(Costs) would tell you), then you could take the possibly inefficient approach of converting all columns as if they were factors:

Costs_logged <- lapply(Costs, function(x) log(as.numeric(as.character(x))) )
Costs_logged

(See R FAQ 7.10 about converting factors to numeric.)
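The point of that FAQ entry, shown on a small hypothetical factor `f`: calling as.numeric() directly on a factor returns the internal level codes, not the numbers printed in the labels, which is why the conversion above goes through as.character() first.

```r
# A factor whose labels look like numbers (hypothetical example values)
f <- factor(c("10", "5", "20"))

# as.numeric() on a factor returns the level codes, not the values;
# levels are sorted as strings: "10", "20", "5"
as.numeric(f)                 # 1 3 2

# Convert through character first to recover the actual numbers
as.numeric(as.character(f))   # 10 5 20

# Also works, and converts each distinct level only once
as.numeric(levels(f))[f]      # 10 5 20
```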

EDIT2: If you want to convert the factor variable with commas in its labels, use this method:

data$Y <- as.numeric( gsub("\\,", "", as.character(data$Y) ) )

The earlier version of this had only a single backslash, but since both regex and R strings use the backslash as an escape character, "special regex characters" (see ?regex for a list) need to be doubly escaped.
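The "NAs introduced by coercion" warnings mentioned in the comments on this answer are what as.numeric() produces for strings such as "1,452.70". A small sketch using two values copied from the str() output in the question; it passes `fixed = TRUE` so the comma is matched literally with no escaping, which is an alternative to the escaped pattern above rather than the answer's original code:

```r
# Values like those in column Y (taken from the str() output in the question)
y <- factor(c("1,452.70", "1,469.00"))

# Direct coercion fails: the comma makes the strings non-numeric,
# giving NA plus the "NAs introduced by coercion" warning (suppressed here)
suppressWarnings(as.numeric(as.character(y)))   # NA NA

# Strip the commas first (literal match), then convert
y_num <- as.numeric(gsub(",", "", as.character(y), fixed = TRUE))
y_num          # 1452.7 1469.0
log(y_num)
```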

IRTFM
  • I think this is on the right track but I get the following error though: Warning messages: 1: In FUN(X[[40L]], ...) : NAs introduced by coercion 2: In FUN(X[[40L]], ...) : NAs introduced by coercion 3: In FUN(X[[40L]], ...) : NAs introduced by coercion I am surprised this is this difficult. – J M Aug 06 '11 at 05:48
  • @J M Which "this" were you using? The difficulty was in not posting the results of str(head(Costs)). The second strategy _should_ have worked with a factor structure. – IRTFM Aug 06 '11 at 13:35
  • @J M Will add a fix for the commas in your data. – IRTFM Aug 06 '11 at 13:43
    Thanks to @GiladGreen for correcting an error in my earlier code. – IRTFM Feb 21 '19 at 16:43

Tada! Y is a factor - big problem. The commas shouldn't be in there.

Also, your original question has some anomalies: data is the loaded data.frame, yet the transformation is applied to csv2. Did you rename the columns? If so, you've not given a full summary of the steps involved. Anyway, the issue is that you have commas in your second column.

Iterator

Can you give us the first few values for the variable that is giving you trouble? If the "Costs" variable is the problem (which is what it looks like from your example), execute something like this:

data <- read.csv("rawdata.csv", header = TRUE)
data[1:5, "Costs"]

It sounds as though you have a column of values in the csv file -- column Y -- that has commas in the numbers. That is, it sounds like your csv file looks like this:

X,Y,Z
"18766","1,452.70","564"
"20197","1,469.00","608"

or

X,Y,Z
18766,"1,452.70",564
20197,"1,469.00",608

or something similar. If this is the case, the problem is that R cannot read column Y as numbers while it contains commas (even though the commas make it easier for us humans to read), so it is stored as text, which is why str() shows a factor. You need to get rid of those commas; that is, make your data file look like this:

X,Y,Z
18766,1452.70,564
20197,1469.00,608

(you can leave the quotes in -- just get rid of the commas in the numbers themselves).

There are a number of ways to do this. If you exported your data from Excel, format that column differently. Alternatively, open the csv in Excel, save it as a tab-delimited file, open the file in your favorite text editor, and find-and-delete the commas ("find and replace with nothing").

Then try to pull it back into R with your original command.
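If editing the file by hand is inconvenient, the same cleanup can also be done inside R. A minimal sketch, assuming the file looks like the second sample above; the inline `csv_text` string stands in for rawdata.csv, and `stringsAsFactors = FALSE` keeps Y as plain text so no factor conversion is needed:

```r
# A small CSV in the shape described above (sample rows from the question)
csv_text <- 'X,Y,Z
18766,"1,452.70",564
20197,"1,469.00",608'

# Read the comma-formatted column as character rather than a factor
dat <- read.csv(text = csv_text, stringsAsFactors = FALSE)
str(dat)   # X and Z are int, Y is chr

# Remove the thousands separators and convert Y to numeric
dat$Y <- as.numeric(gsub(",", "", dat$Y, fixed = TRUE))

# All columns are numeric now, so log() on the whole data frame works
logged <- log(dat)
```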

CompEcon

Clearly the columns are not all numeric, so just ensure that they are. You can do this by forcing the class of every column when read in:

data <- read.csv("rawdata.csv", colClasses = "numeric")

(read.csv is just a wrapper on read.table, and header = TRUE by default)

That will ensure all columns are of class numeric if that is in fact possible.

If they really are not numeric columns, exclude the ones you don't want to transform, or just work on the columns individually:

x <- data.frame(x = 1:10, y = runif(10, 2, 10), z = letters[1:10])

colClasses can be used to ignore columns by specifying "NULL" if that makes things simpler.
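A sketch of the "NULL" option, assuming data shaped like the question's sample rows (the inline `csv_text` is a stand-in for rawdata.csv): column Y, which holds the comma-formatted values, is skipped at read time, and the remaining columns are forced to numeric.

```r
csv_text <- 'X,Y,Z
18766,"1,452.70",564
20197,"1,469.00",608'

# "NULL" drops column Y entirely; X and Z are read as numeric
dat <- read.csv(text = csv_text,
                colClasses = c("numeric", "NULL", "numeric"))
names(dat)   # "X" "Z"
log(dat)
```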

These are equivalent since "x" and "y" are the first 2 columns:

log(x[ , 1:2])


log(x[ , c("x", "y")])

Individually:

log(x$x)

log(x$y)
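Rather than listing the numeric columns by hand, one option is to detect them with sapply() and is.numeric(). A sketch on the same kind of toy data frame (redefined here so the block is self-contained); `drop = FALSE` keeps the result a data frame even if only one column qualifies:

```r
x <- data.frame(x = 1:10, y = runif(10, 2, 10), z = letters[1:10])

# Which columns are numeric? z is character (or a factor in older R)
num_cols <- sapply(x, is.numeric)
num_cols   # x: TRUE, y: TRUE, z: FALSE

# Take the log of the numeric columns only
logged <- log(x[ , num_cols, drop = FALSE])
```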

It's always important to check assumptions about the data read from external sources. Basic checks like summary(x), head(x) and str(x) will show you what the data actually are.

mdsumner
  • I tried data <- read.csv("rawdata.csv", colClasses = "numeric") and I got the following error Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : scan() expected 'a real', got '"1' – J M Aug 06 '11 at 05:47
    Then it's certainly not possible that all the columns in the file are numeric. Exclude columns with "NULL" or just subset like I said. I think you should just work on the input basics and get familiar with ?read.table, ?summary, ?Extract and read the R Import/Export manual – mdsumner Aug 06 '11 at 05:58