barplot failure in R 3.1.0: read.csv converting what should be numerics to factors

Question

I have a small problem with the barplot() function in R 3.1.0 (it works fine in older versions).

nd_p_a<- read.csv("nd_p_a.csv")
barplot(nd_p_a$y, col="blue", names.arg=nd_p_a$x, xlab="k", ylab="P(k)")

This has worked without any warnings or errors, but in version 3.1.0 I get an error:

Error in barplot.default(nd_p_a$y, col = "blue", names.arg = nd_p_a2$x,  : 
  'height' must be a vector or a matrix

So why doesn't this work in this version? And how can I convert a factor to a numeric vector? I tried as.numeric() and similar functions, but did not get the right result.

The CSV file contains data like this:

"x","y"
1.0,48.947791826110596
2.0,6.317211620667564
3.0,14.982593438237588
4.0,3.4443873302013475
5.0,9.760934831763135
6.0,1.7191829918211519
7.0,3.9200958456693455
8.0,1.0765813450714172
9.0,2.290369697396343
10.0,0.6342337460169456
11.0,1.1210994624619959
12.0,0.5291701034830391

As requested, more information:

sessionInfo()

3.0.3

R version 3.0.3 (2014-03-06)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

3.1.0

R version 3.1.0 beta (2014-03-28 r65330)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] tools_3.1.0

str(nd_p_a)

3.0.3

'data.frame':   1449 obs. of  2 variables:
 $ x: num  1 2 3 4 5 6 7 8 9 10 ...
 $ y: num  48.95 6.32 14.98 3.44 9.76 ...

3.1.0

'data.frame':   1449 obs. of  2 variables:
 $ x: num  1 2 3 4 5 6 7 8 9 10 ...
 $ y: Factor w/ 221 levels "0.0010183159621912567",..: 194 201 171 184 220 173 187 167 178 166 ...

Here are a few tips on how to make a good small, reproducible example. http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example — Roman Luštrik, Apr 09 '14 at 12:59
structure(list(x = c(1, 2, 3, 4, 5, 6), y = structure(c(194L, 201L, 171L, 184L, 220L, 173L), .Label = c("0.0010183159621912567", "0.0010404532657171534",..//..,"9.961786586653596E-4"), class = "factor")), .Names = c("x", "y"), row.names = c(NA, 6L), class = "data.frame") — Thargor, Apr 09 '14 at 13:01
(I meant to update your post, but this works, too.) Your `y` is a `factor`. Do `nd_p_a$y <- as.numeric(as.character(nd_p_a$y))`, then `barplot()` should work. Read *An Introduction to R* on factors. — Stephan Kolassa, Apr 09 '14 at 13:03
Thanks @Stephan Kolassa, that works. But why did this work in R 3.0 and not in R 3.1? — Thargor, Apr 09 '14 at 13:05
I am having a lot of trouble (as I suspect are the other experienced users) believing that this really stopped working in 3.1.0; `barplot` is a **very** frequently used function, and very stable, so we're all guessing that you made some other mistake in reading in or processing your data. Can you please (1) make sure you are starting from a clean R session, (2) post the results of `sessionInfo()` and `str(nd_p_a)` from both 3.0.3 and 3.1.0 on your system? — Ben Bolker, Apr 09 '14 at 13:12
Added to my post. In both R 3.1.0 and R 3.0.3 I used the same call, read.csv("nd_p_a.csv"), to import the CSV. As str(nd_p_a) shows, the two versions produce different data structures. That might be the "problem" — Thargor, Apr 09 '14 at 13:19
Curious why the locale information is slightly different. These are on the same machine, right? You definitely used just `read.csv("nd_p_a.csv")` in both cases? — Ben Bolker, Apr 09 '14 at 13:23
Yes, the same machine. In both versions I started R, typed sessionInfo(), then read the CSV file and ran str(nd_p_a). — Thargor, Apr 09 '14 at 13:25
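Stephan Kolassa's fix can be sketched on a tiny example. The factor `y` below is built by hand from three values in the question's CSV, standing in for the column read.csv() produced in 3.1.0. It shows why as.numeric() alone "did not work": on a factor it returns the internal level codes, not the numbers. The last line uses the `as.numeric(levels(f))[f]` idiom from the R FAQ, which gives the same result while converting each distinct level only once.

```r
# Hand-built stand-in for the factor column read.csv() produced in R 3.1.0
y <- factor(c("48.947791826110596", "6.317211620667564", "14.982593438237588"))

# as.numeric() on a factor returns the level codes. Levels are sorted as
# strings ("14.98...", "48.94...", "6.31..."), so this prints 2 3 1
codes <- as.numeric(y)
print(codes)

# Going through the character labels recovers the actual values
y_num <- as.numeric(as.character(y))
print(y_num)

# R FAQ idiom: convert the levels once, then index them by the factor codes
y_num2 <- as.numeric(levels(y))[y]
```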

Tarantoga · Answer 1 · 2014-04-23T15:00:23.623

There seems to be an issue with the new version (3.1.0) of type.convert(), which is called by read.table(), which in turn is called by read.csv(). The new type.convert() assumes that the representation in your file is more accurate than R's internal numeric storage format (double-precision floating point values). When a value cannot be stored as a double without losing digits, it therefore converts the column to a FACTOR instead of a number. This behavior surprises a lot of people, so I would bet that it will either go away or that a new parameter will be added that can be passed through the chain to type.convert(). It is painful enough for those of us (myself included) who rely on the long-standing behavior of the automatic column type detection.
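To make the precision argument concrete, here is a rough sketch. It rests on an assumption: that a decimal string with more than 15 significant digits (the number of digits a double is guaranteed to preserve) is the kind of value the 3.1.0 type.convert() refuses to treat as numeric. This is an approximation for illustration, not R's actual type.convert() code, and `sig_digits` is a hypothetical helper. The y values in the question have 16 to 17 significant digits, while x values such as 1.0 have only two, which fits x staying numeric and y becoming a factor.

```r
# Hypothetical helper: count significant digits in decimal strings.
# Rough heuristic only; NOT R's type.convert() implementation.
sig_digits <- function(s) {
  mant <- sub("[eE].*$", "", s)       # drop an exponent such as "E-4"
  digits <- gsub("[^0-9]", "", mant)  # keep only the digit characters
  digits <- sub("^0+", "", digits)    # leading zeros are not significant
  nchar(digits)
}

vals <- c("1.0", "48.947791826110596", "0.5291701034830391", "9.961786586653596E-4")
n <- sig_digits(vals)    # 2 17 16 16
too_long <- n > 15       # FALSE TRUE TRUE TRUE (15 = digits a double always preserves)
print(n)
print(too_long)
```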

This question should be cross-linked somewhere upstream to something like "Why doesn't read.csv() work reliably with floating point values anymore?"

http://r.789695.n4.nabble.com/type-convert-and-doubles-td4688616.html

It is super annoying. https://bugs.r-project.org/bugzilla/show_bug.cgi?id=15751 I wrote it up as a bug, and they said it was intended behavior — Andrew Cassidy, Apr 23 '14 at 15:20

score 1 · Accepted Answer · answered Apr 23 '14 at 15:24


Here is a workaround; the new behavior is annoying:

read.csv("nd_p_a.csv", colClasses=c("numeric", "numeric"))
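A self-contained version of this workaround, for trying it without the original file. It uses the `text =` argument (passed by read.csv() through to read.table()) in place of a file name, with only the first three rows of the question's CSV; `csv` is just an illustrative variable name. Because colClasses declares the column types up front, the automatic type guessing is skipped, so `y` should come back numeric in both 3.0.3 and 3.1.0.

```r
# First rows of the question's CSV, embedded as a string for illustration
csv <- '"x","y"
1.0,48.947791826110596
2.0,6.317211620667564
3.0,14.982593438237588'

# colClasses forces both columns to numeric instead of letting
# read.csv() guess the type of each column
nd_p_a <- read.csv(text = csv, colClasses = c("numeric", "numeric"))
str(nd_p_a)

# The original plotting call then receives a numeric height vector
barplot(nd_p_a$y, col = "blue", names.arg = nd_p_a$x, xlab = "k", ylab = "P(k)")
```

Since an unnamed colClasses vector is recycled, `colClasses = "numeric"` would do the same for a file whose columns are all numeric.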


Andrew Cassidy

