
I am having trouble deleting duplicated rows in an xts object. I have an R script that downloads tick financial data for a currency and converts it to an xts object in OHLC format. The script also pulls new data every 15 minutes, covering the period from today's first trade to today's last recorded trade. The previously downloaded data is stored in .Rdata format and loaded at the start of each run. The new data is then appended to the old data, and the result overwrites the old .Rdata file.

Here is an example of what my data looks like:

                      .Open   .High    .Low  .Close   .Volume .Adjusted
2012-01-07 00:00:11 6.69683 7.01556 6.38000 6.81000  48387.58   6.81000
2012-01-08 00:00:09 6.78660 7.20000 6.73357 7.11358  57193.53   7.11358
2012-01-09 00:00:57 7.08362 7.19100 5.81000 6.32570 148406.85   6.32570
2012-01-10 00:01:01 6.32687 6.89000 6.00100 6.36000 110210.25   6.36000
2012-01-11 00:00:07 6.44904 7.13800 6.41266 6.90000  99442.07   6.90000
2012-01-12 00:01:02 6.90000 6.99700 6.33700 6.79999 140116.52   6.79999
2012-01-13 00:02:01 6.78211 6.80400 6.40000 6.41000  60228.77   6.41000
2012-01-14 00:00:23 6.42000 6.50000 6.23150 6.31894  25392.98   6.31894

Now, if I run the script again, the new data is added to the xts object:

                      .Open   .High    .Low  .Close   .Volume .Adjusted
2012-01-07 00:00:11 6.69683 7.01556 6.38000 6.81000  48387.58   6.81000
2012-01-08 00:00:09 6.78660 7.20000 6.73357 7.11358  57193.53   7.11358
2012-01-09 00:00:57 7.08362 7.19100 5.81000 6.32570 148406.85   6.32570
2012-01-10 00:01:01 6.32687 6.89000 6.00100 6.36000 110210.25   6.36000
2012-01-11 00:00:07 6.44904 7.13800 6.41266 6.90000  99442.07   6.90000
2012-01-12 00:01:02 6.90000 6.99700 6.33700 6.79999 140116.52   6.79999
2012-01-13 00:02:01 6.78211 6.80400 6.40000 6.41000  60228.77   6.41000
2012-01-14 00:00:23 6.42000 6.50000 6.23150 6.31894  25392.98   6.31894
2012-01-14 00:00:23 6.42000 6.75000 6.22010 6.57157  75952.01   6.57157

As you can see, the last line has the same date as the second-to-last line. I want to keep the last row for that date and delete the second-to-last row. When I try the following code to delete duplicated rows, it does not work; the duplicated rows are not removed.

xx <- mt.xts[!duplicated(mt.xts$Index),]
xx
.Open .High .Low .Close .Volume .Adjusted

I do not get any result. How can I delete duplicate data entries in an xts object using the Index as the indicator of duplication?

Kevin
  • Perhaps you meant `!duplicated(mt.xts)`? – joran Jan 14 '12 at 20:57
  • I was thinking I need to either find a way to delete based on row.names, or use both the .Open and the .Adjusted as indicators of duplicate rows. Using the index would be the best as there may be a chance in the future that the Open and Adjusted values are the same for different dates. – Kevin Jan 14 '12 at 21:14
  • @joran When I do xx = !duplicated(mt.xts) I only get a logical vector. In a previous use of what I did before it seemed to work but their object was not xts. – Kevin Jan 14 '12 at 21:19
  • Sorry, don't know what I was thinking. I can't use the .Adjusted to determine duplicate rows. Since this is currency data it is the same as .Close – Kevin Jan 14 '12 at 21:33

2 Answers


Shouldn't it be `index(mt.xts)` rather than `mt.xts$Index`? The following seems to work.

# Sample data
library(xts)
x <- xts( 
  1:10, 
  rep( seq.Date( Sys.Date(), by="day", length.out=5 ), each=2 ) 
)

# Remove rows with a duplicated timestamp
y <- x[ ! duplicated( index(x) ),  ]

# Remove rows with a duplicated timestamp, but keep the latest one
z <- x[ ! duplicated( index(x), fromLast = TRUE ),  ]
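
To tie this back to the question, here is a minimal sketch on data shaped like the asker's. It rebuilds only the last three rows and two of the columns, and it assumes UTC timestamps; the name `latest` is made up for illustration. As far as I can tell, `mt.xts$Index` looks for a column called `Index`, which does not exist because the timestamps are not stored as a column, so the lookup yields `NULL` and the subset comes back with no rows.

```r
library(xts)

# Last three rows of the question's data, two columns only (for brevity).
# The UTC time zone is an assumption; the question does not state one.
idx <- as.POSIXct(c("2012-01-13 00:02:01",
                    "2012-01-14 00:00:23",
                    "2012-01-14 00:00:23"), tz = "UTC")
vals <- matrix(c(6.78211, 6.42000, 6.42000,   # .Open
                 6.41000, 6.31894, 6.57157),  # .Close
               ncol = 2,
               dimnames = list(NULL, c(".Open", ".Close")))
mt.xts <- xts(vals, order.by = idx)

# There is no column named "Index", so this is not the vector of timestamps.
no_such_column <- mt.xts$Index

# Use index() to get the timestamps, and fromLast = TRUE so that the
# most recent row wins when a timestamp occurs more than once.
latest <- mt.xts[!duplicated(index(mt.xts), fromLast = TRUE), ]
latest
```

With `fromLast = TRUE` the earlier of the two 2012-01-14 rows is the one treated as the duplicate, which is the behaviour the question asks for.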
Vincent Zoonekynd

In my case,

x <- x[! duplicated( index(x) ),]

did not work as intended, because the date-time somehow ended up unique in each row, so no index value was flagged as a duplicate.

x <- x[! duplicated( coredata(x) ),]

This compares the data values instead of the timestamps, and may work if the previous solution did not help.
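
If the timestamps differ only in the time of day, another option is to compare calendar dates rather than full date-times. The following is only a sketch under assumptions: the second 2012-01-14 timestamp is hypothetical, the time zone is assumed to be UTC, and the names `days` and `last_per_day` are made up for illustration.

```r
library(xts)

# Two rows fall on 2012-01-14 but at different times of day.
# The 00:15:47 timestamp is hypothetical; UTC is an assumption.
idx <- as.POSIXct(c("2012-01-13 00:02:01",
                    "2012-01-14 00:00:23",
                    "2012-01-14 00:15:47"), tz = "UTC")
x <- xts(matrix(c(6.41000, 6.31894, 6.57157), ncol = 1,
                dimnames = list(NULL, ".Close")),
         order.by = idx)

# Every full timestamp is distinct, so duplicated(index(x)) flags nothing.
any_dup_timestamps <- any(duplicated(index(x)))

# Reduce each timestamp to its calendar date, then keep the last row per date.
days <- as.Date(index(x), tz = "UTC")
last_per_day <- x[!duplicated(days, fromLast = TRUE), ]
last_per_day
```

Unlike comparing `coredata(x)`, this never drops a row just because its values happen to repeat on a different date, which is the concern raised in the comments on the question.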