
I'm reading a CSV file in R that includes a conversion ID column. The issue I'm running into is that my conversion ID is being rounded as an exponential number. Below is a snapshot of the CSV file (opened in Excel) that I'm reading into R. As you can see, the conversion ID is displayed in exponential format, but the actual value is 383305820480.

[Screenshot: the CSV file opened in Excel, with the Conversion ID column displayed in exponential notation]

When I read the data into R using the following lines, I get the output below, which looks like it is rounding the conversion IDs.

x<-read.csv("./Test2.csv")
options("scipen"=100, "digits"=15)
x

[Screenshot: the R console output, with the Conversion.ID values shown rounded in exponential form]

When I export the file as CSV, using the code

write.csv(x,"./Test3.csv")

I get the following output. As you can see, I no longer have a unique identifier because the numbers have been rounded.

[Screenshot: Test3.csv with the Conversion ID values rounded]

I also tried reading the column in as character, using the code below, but I get the same output with the numbers rounded. I need the Conversion.ID to be a unique identifier.

x<-read.csv("./Test2.csv", colClasses="character")

The only way I can get the Conversion ID column to stay as a unique identifier is to open the CSV file and write a ' in front of each conversion ID. That is not scalable because I have hundreds of files.
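For reference, here is the kind of batch workflow I would like to end up with instead (just a sketch; the folder path and file pattern are placeholders for my actual setup):

files <- list.files("./downloads", pattern = "\\.csv$", full.names = TRUE)
# read every file with all columns kept as text so no ID digits are lost
tables <- lapply(files, read.csv, colClasses = "character")
combined <- do.call(rbind, tables)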

ekim
  • How large is your data? Can you specify the `colClasses` argument in `read.table` and set that as a character column? – rawr May 12 '14 at 23:19
  • Possible duplicate of http://stackoverflow.com/questions/9397664/force-r-not-to-use-exponential-notation-e-g-e10; http://stackoverflow.com/questions/6706677/how-to-display-numeric-columns-in-an-r-dataframe-without-scientific-notation-e/6707630#6707630 or http://stackoverflow.com/questions/16356530/using-scientific-notation-in-r/16356730#16356730 – Henrik May 12 '14 at 23:21
  • @Henrik, that doesn't work. It rounds the column, which I don't want. – ekim May 13 '14 at 00:34
  • @rawr, I tried reading it as a character column, but no luck. – ekim May 13 '14 at 00:35
  • @ekim: what does "no luck" mean? Can we have a reproducible example please? – Ben Bolker May 13 '14 at 01:16
  • @Ben, sorry no luck means it didn't work. – ekim May 13 '14 at 01:18
  • @ekim: "didn't work" is way too vague for us to help you. @rawr's suggestion is the standard way to do this. To repeat, we will need more details (what code did you use? what are the results of applying `str()` to the results? what errors and/or warnings did you get?) Reproducible examples **strongly** preferred. – Ben Bolker May 13 '14 at 01:21
  • @Ben, thanks for the feedback. I wrote some of my code and provided snapshots. Hopefully that clarifies my problem. Sorry for the vagueness, I've been staring at the screen all day. – ekim May 13 '14 at 02:02
  • @BenBolker: When I apply a str(), I get the following: 'data.frame': 10 obs. of 1 variable: $ Conversion.ID: num 3.83e+11 3.96e+13 2.14e+14 1.02e+15 1.22e+15 ... – ekim May 13 '14 at 16:06
  • Thanks for your feedback, everyone. I just figured out the issue. It looks like Excel is converting the data, causing it to lose the digits. If I avoid opening the file in Excel after downloading it, it retains all the digits. I'm not sure if this is a known issue with newer versions; I'm using Excel Office Professional Plus 2013. – ekim May 13 '14 at 20:20
  • Wow. I knew there were issues ( http://dontuseexcel.wordpress.com/2013/02/07/dont-use-excel-for-biological-data/ , http://nsaunders.wordpress.com/2012/10/22/gene-name-errors-and-excel-lessons-not-learned/ , http://datapub.cdlib.org/2014/04/10/abandon-all-hope-ye-who-enter-dates-in-excel/ , http://www.burns-stat.com/documents/tutorials/spreadsheet-addiction/ ) but this is a new one! – Ben Bolker May 13 '14 at 20:29
  • Thanks, @BenBolker for looking into this for me. I was confused because I was troubleshooting/reproducing the problem using Excel OP 2013, which was the root of the problem. Now I know. – ekim May 14 '14 at 06:43

2 Answers


I can't replicate your experience.

(Update: OP reports that the problem is actually with Excel converting/rounding the data on import [!!!])

I created a file on disk with full precision (I don't know the least-significant digits of your data, since you only showed them for the first element, so I put a non-zero value in the units place for illustration):

writeLines(c(
  "Conversion ID",
  "    383305820480",
  "  39634500000002",
  " 213905000000002",
  "1016890000000002",
  "1220910000000002"),
  con="Test2.csv")

Read the file and print it with full precision (use check.names=FALSE for perfect "round trip" capability -- not something you want to do on a regular basis):

x <- read.csv("Test2.csv",check.names=FALSE)
options(scipen=100)
print(x,digits=20)
##      Conversion ID
## 1     383305820480
## 2   39634500000002
## 3  213905000000002
## 4 1016890000000002
## 5 1220910000000002

Looks OK.

Now write output (use row.names=FALSE to avoid adding row names/allow a clean round-trip):

write.csv(x,"Test3.csv",row.names=FALSE,quote=FALSE)

The most direct way to examine a file on disk from within R is file.show():

file.show("Test3.csv")
## Conversion ID
## 383305820480
## 39634500000002
## 213905000000002
## 1016890000000002
## 1220910000000002

x3 <- read.csv("Test3.csv",check.names=FALSE)
all.equal(x,x3)  ## TRUE

Use system tools to check that the files are the same (except for white space differences -- the original file was right-justified):

system("diff -w Test2.csv Test3.csv")  ## no difference

If you have even longer ID strings, you will need to read them as character to avoid loss of precision (a double-precision value only carries about 15-16 significant decimal digits):

read.csv("Test2.csv",colClasses="character")
##      Conversion.ID
## 1     383305820480
## 2   39634500000002
## 3  213905000000002
## 4 1016890000000002
## 5 1220910000000002

You could probably round-trip through Excel more safely (if you still think that's a good idea) by importing as character and exporting with quotation marks to protect the values.
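A minimal sketch of that idea, reusing the Test2.csv from above (untested against Excel itself, so try it on a copy of your data first):

x <- read.csv("Test2.csv", colClasses="character", check.names=FALSE)  ## IDs stay character
write.csv(x, "Test3.csv", row.names=FALSE, quote=TRUE)  ## quote the values on the way out

Note that Excel may still coerce quoted numeric strings back to (rounded) numbers when it opens the file, so verify the result before overwriting anything.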

Ben Bolker

I just figured out the issue. It looks like my version of Excel is converting the data, causing it to lose digits. If I avoid opening the file in Excel after downloading it, it retains all the digits. I'm not sure if this is a known issue with newer versions; I'm using Excel Office Professional Plus 2013.
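Since I have so many files, the practical fix is to keep the whole workflow in R and never open the files in Excel. A sketch of what I do now (the file name is a placeholder):

x <- read.csv("Test2.csv", colClasses="character")  # keep the full ID as text
file.show("Test2.csv")  # inspect the raw file from R instead of opening it in Excel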

ekim
  • Excel only saves the *displayed* precision to CSV. Simply let it show more digits and you won't lose precision. – Roland May 14 '14 at 06:54
  • @Roland The only way that worked was placing a ' before the numbers in Excel. However, I had thousands of files to change, so that was not feasible. You bring up a viable option, but I was thinking it shouldn't require manual intervention. – ekim May 14 '14 at 07:09