This is a follow-up to a question I posted yesterday. I can't seem to get floating point comparison right in R. Yesterday I was using >= to compare two floating point values, but that didn't seem to give the right results.

Today, I tried running all.equal element-wise on two vectors, but it returns a single TRUE or a mean difference, which does not work for this application: I need the comparison function to return a vector. Then I found identical and combined it with mapply. This was more accurate, but not 100% accurate. What am I doing wrong? Since this is financial data, should I be using a decimal data type? If so, how?
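One way to get a vector out of a tolerant comparison is to apply an absolute tolerance directly, since arithmetic and comparison operators in R are already vectorized. The sketch below is only an illustration: `ge_tol` and its default `tol` are hypothetical names and values (not part of base R), and the tolerance is an assumption based on the four decimal places in the sample prices.

```r
# Hypothetical helper: element-wise "greater than or equal" with an
# absolute tolerance. `tol` is an assumed value; it only needs to be
# far smaller than the smallest real price step (1e-4 in the sample).
ge_tol <- function(x, y, tol = 1e-8) {
  (x - y) >= -tol
}

# Open prices for 2002-03-12 .. 2002-03-14 from the sample file
open <- c(0.8749, 0.8753, 0.8753)

# Average of the previous day's High, Low and Close
avg <- c((0.877 + 0.873  + 0.8749) / 3,
         (0.876 + 0.8704 + 0.8754) / 3,
         (0.878 + 0.8725 + 0.8754) / 3)

result <- ge_tol(open, avg)
print(result)  # FALSE TRUE TRUE
```

The last element is the 3/14/2002 case: the two values agree to four decimal places, so the tolerant comparison treats them as equal regardless of how the average was rounded internally.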

From yesterday's post (with updated code, reflecting current frustration):

The goal is to read the data into a data.frame, take the average of yesterday's High, Low, and Close prices, and compare today's Open price with that average.

After running the scripts on large data, I found that my results in R didn't match a similar analysis run in Excel. I've scaled down the problem to its essential parts. My test file, test.csv, looks like this, including a newline at the end of the last row:

<TICKER>,<DATE>,<TIME>,<OPEN>,<LOW>,<HIGH>,<CLOSE>
EURUSD,20020311,0:00:00,0.8733,0.873,0.877,0.8749
EURUSD,20020312,0:00:00,0.8749,0.8704,0.876,0.8754
EURUSD,20020313,0:00:00,0.8753,0.8725,0.878,0.8754
EURUSD,20020314,0:00:00,0.8753,0.8752,0.8841,0.8823
EURUSD,20020315,0:00:00,0.8823,0.8808,0.8868,0.8823
EURUSD,20020318,0:00:00,0.8809,0.878,0.8828,0.8821
EURUSD,20020319,0:00:00,0.8821,0.8796,0.884,0.8816
EURUSD,20020320,0:00:00,0.8815,0.8786,0.8857,0.8855
EURUSD,20020321,0:00:00,0.8854,0.8806,0.8857,0.8823

My Code:

# Read in test file
raw <- read.csv('test.csv', header=TRUE, sep=",")

# Convert date and put data into a data frame
stripday <- strptime(raw$X.DATE, format="%Y%m%d")
data <- data.frame(stripday, raw)

# Drop unused data columns and name used columns
drops <- c("X.DATE.", "X.TIME.", "X.TICKER.")
data <- data[, !(names(data) %in% drops)]
colnames(data) <- c("Date", "Open", "Low", "High", "Close")

# Convert values from factors to numeric
data[,2] <- as.numeric(as.character(data[,2]))
data[,3] <- as.numeric(as.character(data[,3]))
data[,4] <- as.numeric(as.character(data[,4]))
data[,5] <- as.numeric(as.character(data[,5]))

# Take average of High, Low, and Close 
data[['Avg']] <- NA
data[['Avg']][2:9] <- (
    data[['High']][1:8] + 
    data[['Low']][1:8] + 
    data[['Close']][1:8]) / 3

# Is Open greater than or equal to Average
data[['OpenGreaterThanOrEqualAvg']] <- NA
data[['OpenGreaterThanOrEqualAvg']][2:9] <- 1 * (mapply(identical,data[['Open']][2:9], data[['Avg']][2:9]) | data[['Open']][2:9] > data[['Avg']][2:9])

# Write data to .csv
write.table(data, 'output.csv', quote=FALSE, sep=",", row.names=FALSE)

Note that there should be a 1, not 0, for 3/14/2002: the Open (0.8753) equals the previous day's average, (0.878 + 0.8725 + 0.8754) / 3 = 0.8753.
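On the decimal question: as far as I know, base R has no built-in decimal type, so a common workaround is to scale the prices to whole numbers and compare integers. The sketch below assumes every price has at most four decimal places, as in the sample file; `to_pips` and the other variable names are hypothetical. Multiplying both sides by 3 also avoids the division, so Open >= (High + Low + Close) / 3 becomes 3 * Open >= High + Low + Close.

```r
# Hypothetical helper: convert a price with at most 4 decimal places
# into an integer count of ten-thousandths.
to_pips <- function(x) as.integer(round(x * 1e4))

# Rows for 2002-03-11 .. 2002-03-14 from the sample file
open_i  <- to_pips(c(0.8733, 0.8749, 0.8753, 0.8753))
low_i   <- to_pips(c(0.873,  0.8704, 0.8725, 0.8752))
high_i  <- to_pips(c(0.877,  0.876,  0.878,  0.8841))
close_i <- to_pips(c(0.8749, 0.8754, 0.8754, 0.8823))

n <- length(open_i)

# Sum of the previous day's High, Low and Close (rows 1 .. n-1)
sum_prev <- high_i[-n] + low_i[-n] + close_i[-n]

# Exact integer comparison; the first day has no previous day, hence NA
flag <- c(NA, as.integer(3L * open_i[-1] >= sum_prev))
print(flag)  # NA 0 1 1
```

The fourth element corresponds to 3/14/2002 and comes out as 1, because 3 * 8753 and 8780 + 8725 + 8754 are both exactly 26259.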

Brian
  • This has the same answer as your prior question. It's a result of floating point arithmetic. And if you weren't confused enough, there's a 1 on 2002-03-14 when I run your code. – Joshua Ulrich Mar 18 '13 at 20:25
  • Just to clarify: `all.equal` will result in TRUE if the comparison succeeds. Else, it'll return *not* FALSE, but the *mean absolute difference* **or** *mean relative difference*. [**See here for more info on `all.equal`**](http://stackoverflow.com/questions/15334701/how-does-the-tolerance-parameter-of-all-equal-work/15334737#15334737) – Arun Mar 18 '13 at 20:33
  • Arun, you are correct; however, I needed a function that returns a vector, not a single TRUE. – Brian Mar 18 '13 at 20:38
  • Joshua Ulrich, I think I may cry. Is it my environment? I'm working in RStudio 0.97.168 on Windows 7. – Brian Mar 18 '13 at 20:40
  • @Brian, I cut down the post a bit; it was too long and might have put some readers off finishing it... – Arun Mar 18 '13 at 20:46
  • @Brian, it gives me a `0` for `3/14/2002`. I'm trying to go through your code... – Arun Mar 18 '13 at 20:48
  • @JoshuaUlrich, I tried it without `as.numeric(as.character(.))`. – Arun Mar 18 '13 at 21:15
  • @Brian, what's the answer you expect for that entire column? – Arun Mar 18 '13 at 21:15
  • @Arun: `with(data, identical(Open[4], Avg[4]))` is `TRUE` if you calculate `Avg` as `data$Avg <- c(NA, rowMeans(data[1:8,3:5]))`. But that's just lucky. It's generally futile to try and figure out how to get *exactly* the same floating point numbers. – Joshua Ulrich Mar 18 '13 at 21:16
  • aha.. that's trickier than I imagined. – Arun Mar 18 '13 at 21:23
  • @Arun: it's not tricky... it was sheer luck. I simply didn't like how the OP calculated the average, so I did it "my" way and the numbers happened to be identical. – Joshua Ulrich Mar 18 '13 at 21:24
  • Thanks everyone for your responses. I appreciate it. I still do not see how the article posted brings me closer to a solution. As I mentioned in my original question, all.equal does not return a vector, but a single TRUE or a mean relative difference. – Brian Mar 18 '13 at 22:22
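For what it's worth, `all.equal` can still be made to return a vector by calling it once per pair and wrapping each call in `isTRUE`. This is only a sketch: `near_equal` is a hypothetical helper name, and the default `tolerance` below simply mirrors the documented default of `all.equal` for numeric input.

```r
# Hypothetical helper: element-wise all.equal() that always returns
# a logical vector (TRUE / FALSE), never a difference message.
near_equal <- function(x, y, tolerance = 1.5e-8) {
  mapply(function(a, b) isTRUE(all.equal(a, b, tolerance = tolerance)),
         x, y)
}

# Open prices for 2002-03-12 .. 2002-03-14 and the previous day's averages
open <- c(0.8749, 0.8753, 0.8753)
avg  <- c((0.877 + 0.873  + 0.8749) / 3,
          (0.876 + 0.8704 + 0.8754) / 3,
          (0.878 + 0.8725 + 0.8754) / 3)

eq <- near_equal(open, avg)   # FALSE FALSE TRUE
ge <- (open > avg) | eq       # FALSE TRUE  TRUE
print(1 * ge)
```

Calling `all.equal` per element is slower than a plain vectorized subtraction, so on large data a direct tolerance check may be preferable.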
