I have a CSV file containing around 200,000 rows of transactions. Here is the import and a little preprocessing of the data:
data <- read.csv("bitfinex_data/trades.csv", header = TRUE)
# the raw date column is a Unix timestamp with more than 10 digits
# (presumably milliseconds), so keep the first 10 digits (seconds)
# and convert to POSIXct
data$date <- as.character(data$date)
data$date <- substr(data$date, 1, 10)
data$date <- as.numeric(data$date)
data$date <- as.POSIXct(data$date, origin="1970-01-01", tz = "GMT")
head(data)
id exchange symbol date price amount sell
1 24892563 bf btcusd 2018-01-02 00:00:00 13375 0.05743154 False
2 24892564 bf btcusd 2018-01-02 00:00:01 13374 0.12226129 False
3 24892565 bf btcusd 2018-01-02 00:00:02 13373 0.00489140 False
4 24892566 bf btcusd 2018-01-02 00:00:02 13373 0.07510860 False
5 24892567 bf btcusd 2018-01-02 00:00:02 13373 0.11606086 False
6 24892568 bf btcusd 2018-01-02 00:00:03 13373 0.47000000 False
My goal is to obtain hourly sums of the amount of tokens being traded. For this I need to split my data by hour, which I did in the following way:
tmp <- split(data, cut(data$date,"hour"))
However, this is taking way too long (up to 1 hour), and I wonder whether this is normal behaviour for functions such as split() and cut(). Is there any alternative to using those two functions?
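For scale: even a simple base-R aggregation on an hour key would be fine if it is faster than split()/cut(); an untested sketch of the kind of result I am after:

hour_key <- format(data$date, "%Y-%m-%d %H")      # e.g. "2018-01-02 05"
hourly_sums <- tapply(data$amount, hour_key, sum)  # named vector of hourly sums
head(hourly_sums)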
UPDATE:
After using the great suggestion by @Maurits Evers, I aggregate per hour roughly as follows (my paraphrase of the suggested dplyr approach; the exact code may differ):
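library(dplyr)

data %>%
    mutate(date_hour = format(date, "%Y-%m-%d %H")) %>%  # hour-level key
    group_by(date_hour) %>%
    summarise(amount.sum = sum(amount))                   # total amount per hour

The resulting output table looks like this: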
# A tibble: 25 x 2
date_hour amount.sum
<chr> <dbl>
1 1970-01-01 00 48.2
2 2018-01-02 00 2746.
3 2018-01-02 01 1552.
4 2018-01-02 02 2010.
5 2018-01-02 03 2171.
6 2018-01-02 04 3640.
7 2018-01-02 05 1399.
8 2018-01-02 06 836.
9 2018-01-02 07 856.
10 2018-01-02 08 819.
# ... with 15 more rows
This is exactly what I wanted, except for the first row, where the date is from the year 1970. Any suggestion on what might be causing this? I tried to change the origin parameter of the as.POSIXct() function, but that did not solve the problem.
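In case it helps with diagnosing this, a check along these lines (an untested sketch) should show the offending rows; my guess is that their raw timestamps are 0, NA, or shorter than 10 digits, so the substr()/as.numeric() step maps them to the epoch:

bad_rows <- data[is.na(data$date) | format(data$date, "%Y") == "1970", ]
head(bad_rows)   # inspect the rows behind the 1970 date
nrow(bad_rows)   # how many rows are affected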