I have a dataset that is organized like so:

> head(crypto_data)
                 time btc_price  btc_change btc_change_label eth_price block_size difficulty estimated_btc_sent estimated_transaction_volume_usd
1 2017-09-02 21:54:00  4537.834 -0.06630663              buy   330.727  142521291   8.88e+11           2.04e+13                        923315360
2 2017-09-02 22:29:00  4577.605 -0.05629429              buy   337.804  136524566   8.88e+11           2.03e+13                        918188067
3 2017-09-02 23:04:00  4566.360 -0.05971624              buy   336.938  134845546   8.88e+11           2.01e+13                        910440916
4 2017-09-02 23:39:00  4590.031 -0.05624237              buy   342.929  133910638   8.88e+11           1.99e+13                        901565930
5 2017-09-03 00:14:00  4676.193 -0.03585697             hold   354.171  130678099   8.88e+11           2.01e+13                        922422228
6 2017-09-03 00:49:00  4699.936 -0.03358492             hold   352.299  127557140   8.88e+11           1.99e+13                        910457430
   hash_rate miners_revenue_btc miners_revenue_usd minutes_between_blocks n_blocks_mined n_blocks_total n_btc_mined   n_tx nextretarget
1 7417412092               2395           10839520                   8.00            168         483207    2.10e+11 241558       483839
2 7152504517               2317           10482320                   8.33            162         483208    2.03e+11 236661       483839
3 7240807042               2342           10596900                   8.22            164         483216    2.05e+11 238682       483839
4 7284958305               2352           10642439                   8.14            165         483220    2.06e+11 237159       483839
5 7152504517               2316           10611798                   8.38            162         483223    2.03e+11 237464       483839
6 7064201992               2288           10481960                   8.41            160         483226    2.00e+11 234472       483839
  total_btc_sent total_fees_btc totalbtc trade_volume_btc trade_volume_usd
1       1.62e+14    29597881711 1.65e+15        102451.92        463497285
2       1.60e+14    29202300823 1.65e+15        102451.92        463497285
3       1.60e+14    29234981721 1.65e+15        102451.92        463497285
4       1.58e+14    28991577368 1.65e+15        102451.92        463497285
5       1.58e+14    29179041967 1.65e+15         96216.78        440710136
6       1.57e+14    28844391629 1.65e+15         96216.78        440710136
> str(crypto_data)
'data.frame':   895 obs. of  23 variables:
 $ time                            : POSIXct, format: "2017-09-02 21:54:00" "2017-09-02 22:29:00" "2017-09-02 23:04:00" "2017-09-02 23:39:00" ...
 $ btc_price                       : num  4538 4578 4566 4590 4676 ...
 $ btc_change                      : num  -0.0663 -0.0563 -0.0597 -0.0562 -0.0359 ...
 $ btc_change_label                : Factor w/ 3 levels "buy","hold","sell": 1 1 1 1 2 2 2 2 2 2 ...
 $ eth_price                       : num  331 338 337 343 354 ...
 $ block_size                      : num  1.43e+08 1.37e+08 1.35e+08 1.34e+08 1.31e+08 ...
 $ difficulty                      : num  8.88e+11 8.88e+11 8.88e+11 8.88e+11 8.88e+11 ...
 $ estimated_btc_sent              : num  2.04e+13 2.03e+13 2.01e+13 1.99e+13 2.01e+13 ...
 $ estimated_transaction_volume_usd: num  9.23e+08 9.18e+08 9.10e+08 9.02e+08 9.22e+08 ...
 $ hash_rate                       : num  7.42e+09 7.15e+09 7.24e+09 7.28e+09 7.15e+09 ...
 $ miners_revenue_btc              : num  2395 2317 2342 2352 2316 ...
 $ miners_revenue_usd              : num  10839520 10482320 10596900 10642439 10611798 ...
 $ minutes_between_blocks          : num  8 8.33 8.22 8.14 8.38 8.41 8.26 8.33 8.5 8.69 ...
 $ n_blocks_mined                  : num  168 162 164 165 162 160 157 161 159 156 ...
 $ n_blocks_total                  : num  483207 483208 483216 483220 483223 ...
 $ n_btc_mined                     : num  2.10e+11 2.03e+11 2.05e+11 2.06e+11 2.03e+11 ...
 $ n_tx                            : num  241558 236661 238682 237159 237464 ...
 $ nextretarget                    : num  483839 483839 483839 483839 483839 ...
 $ total_btc_sent                  : num  1.62e+14 1.60e+14 1.60e+14 1.58e+14 1.58e+14 ...
 $ total_fees_btc                  : num  2.96e+10 2.92e+10 2.92e+10 2.90e+10 2.92e+10 ...
 $ totalbtc                        : num  1.65e+15 1.65e+15 1.65e+15 1.65e+15 1.65e+15 ...
 $ trade_volume_btc                : num  102452 102452 102452 102452 96217 ...
 $ trade_volume_usd                : num  4.63e+08 4.63e+08 4.63e+08 4.63e+08 4.41e+08 ...

I then ran an SVM and tried to plot an ROC curve:

crypto_linear_svm <- svm(btc_change_label ~ ., data = crypto_trainingDS, method = "C-classification", kernel = "linear")
crypto_linear_svm_pred <- predict(crypto_linear_svm, crypto_testDS[,-3])
linear_crypto_conf_mat <- table(pred = crypto_linear_svm_pred, true = crypto_testDS[,3])
linear_svm_crypto_roc <- plot(multiclass.roc(crypto_testDS$btc_change_label, crypto_linear_svm_pred, direction="<"),
     col="yellow", lwd=3, main="Linear Kernal SVM results, Cryptocurrency Data")

However, the last line gives me the following error:

Error in roc.default(response, predictor, levels = X, percent = percent, : Predictor must be numeric or ordered.

What am I doing wrong and how can I fix this? I have two datasets with different structure and organization: the one shown is multiclass and the other is binary (yes or no). I have run an SVM on both, but I get the same error for each when I try to plot the ROC curve.

EDIT: Here is the output of the predictions:

> crypto_linear_svm_pred
   3    4    5    6    7    8   14   16   17   19   21   26   29   32   34   36   38   39   45   47   49   53   54   57   59   60   61   63   65 
 buy  buy hold hold hold hold hold hold hold hold hold hold hold hold hold hold hold hold hold hold hold hold hold  buy  buy  buy  buy  buy  buy 
  67   69   71   74   78   86   89   91   92   95   96   97   98  105  111  113  115  116  122  123  124  127  132  135  140  141  156  160  161 
 buy  buy hold hold hold hold hold hold hold hold hold hold sell sell  buy  buy  buy  buy  buy  buy  buy  buy  buy hold hold hold hold  buy hold 
 164  166  170  173  174  175  179  184  188  190  196  208  210  212  214  217  218  219  224  225  227  229  238  240  245  249  259  263  267 
hold hold hold hold hold hold hold hold hold hold hold hold hold hold hold hold hold hold hold hold hold hold hold hold hold hold hold hold  buy 
 273  274  281  282  284  306  307  311  313  315  320  323  324  328  330  332  333  334  336  340  342  343  346  347  349  353  358  361  365 
hold hold  buy  buy  buy  buy  buy  buy  buy  buy  buy  buy  buy  buy  buy  buy  buy  buy  buy  buy  buy  buy  buy  buy  buy  buy  buy  buy  buy 
 374  380  381  382  383  390  392  393  396  399  403  406  407  408  410  435  440  441  444  445  449  453  457  459  460  464  467  468  473 
sell sell sell sell sell sell sell sell sell sell sell sell sell hold hold  buy  buy  buy hold hold hold hold hold hold hold hold hold hold hold 
 483  489  490  492  499  503  511  520  521  530  534  536  538  546  548  553  555  557  558  559  567  571  573  579  581  583  584  586  587 
hold hold hold hold hold hold hold hold hold hold hold hold  buy  buy  buy  buy  buy  buy  buy  buy  buy  buy  buy  buy  buy hold hold hold hold 
 593  595  597  602  603  608  609  614  616  618  628  630  636  639  642  643  645  646  647  648  649  655  660  661  665  668  669  674  675 
hold hold hold hold hold hold hold hold hold hold hold hold hold hold hold hold hold hold hold hold hold hold hold hold hold hold hold hold hold 
 676  680  685  687  688  695  698  703  704  713  715  719  720  722  725  729  737  738  740  744  745  746  752  757  760  762  764  768  771 
hold hold hold hold hold hold sell sell sell sell hold hold hold hold hold hold hold hold hold hold hold hold hold hold hold hold hold hold hold 
 776  778  781  783  784  790  792  805  811  813  814  815  821  822  824  828  829  833  836  837  838  839  843  846  847  848  852  859  861 
hold sell hold hold sell sell sell sell hold sell hold sell hold hold hold hold hold hold hold hold hold hold hold hold hold hold hold hold hold 
 862  865  869  873  879  881  886  895 
hold hold hold hold hold hold hold sell 
Levels: buy hold sell
zsad512
  • If you are using `e1071` library try `predict(crypto_linear_svm, crypto_testDS[,-3], probability = TRUE)` – missuse Oct 04 '17 at 21:17
  • Can you show the output of `crypto_linear_svm_pred`? – Calimo Oct 05 '17 at 05:53
  • @missuse your suggestion produces the following error `Warning message: In predict.svm(crypto_linear_svm, crypto_testDS[, -3], probability = TRUE) : SVM has not been trained using `probability = TRUE`, probabilities not available for predictions.` – zsad512 Oct 05 '17 at 15:49
  • @Calimo I added the output for `crypto_linear_svm_pred` please see my edit – zsad512 Oct 05 '17 at 15:51
  • @zsad512 have you tried following the error code? Training the model with `probability = TRUE`: `svm(btc_change_label ~ ., data = crypto_trainingDS, method = "C-classification", kernel = "linear", probability = T)` – missuse Oct 05 '17 at 15:58
  • A ROC curve assesses the sensitivity and specificity of a binary classifier when the decision threshold is changed. As the error message suggests you need numbers (or at the very least an ordered factor) to vary the threshold. You can't vary the threshold of strings. – Calimo Oct 05 '17 at 19:43
  • Besides, you have 3 classes. ROC curves are defined for binary classification (2 classes). I'm not sure what you are expecting here. Therefore I'm voting to close this question. – Calimo Oct 05 '17 at 19:43
  • @Calimo if that were true then why does the `pROC` package specifically have a function `multiclass.roc`? ALSO, if you read the code I have provided, `btc_change_label` is a factor with 3 levels, 0,1,2 (which are not strings fyi). I believe you are gravely mistaken and that there is a way of creating an ROC for the data. – zsad512 Oct 05 '17 at 22:49
  • @missuse I tried your suggestion and changed the `svm` call but it still produces the same error when I try the `multiclass.roc` – zsad512 Oct 05 '17 at 22:51

1 Answer

Here is an example with iris data:

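library(e1071)
library(pROC)  # provides multiclass.roc(), plot.roc() and lines.roc()

data(iris)
svm_model <- svm(Species ~ ., data = iris)
# predict() returns the predicted class labels (a factor), not probabilities
pred_svm <- predict(svm_model, iris)

# multiclass.roc() needs a numeric predictor, so coerce the factor to its integer codes
m.roc <- multiclass.roc(iris$Species, as.numeric(pred_svm))

# Draw the first pairwise curve, then overlay the remaining ones
rs <- m.roc[['rocs']]
plot.roc(rs[[1]], lty = 4)
invisible(sapply(2:length(rs), function(i) lines.roc(rs[[i]], col = i, lty = i)))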

This approach computes three ROC curves (setosa : versicolor, setosa : virginica and versicolor : virginica) and averages their AUC.
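To make the averaging concrete, here is a minimal sketch that compares the multiclass AUC with the plain mean of the pairwise AUCs. It assumes the `rocs` and `auc` elements of the object returned by `multiclass.roc` behave as they do in the example here; the variable names (`fit`, `pred_class`, `mc`, `pair_auc`) are my own.

```r
library(e1071)
library(pROC)

data(iris)
fit <- svm(Species ~ ., data = iris)
pred_class <- predict(fit, iris)

# Predicted labels coerced to integer codes, as in the iris example
mc <- suppressMessages(multiclass.roc(iris$Species, as.numeric(pred_class)))

# One AUC per pair of classes (three pairs for three classes)
pair_auc <- sapply(mc$rocs, function(r) as.numeric(auc(r)))

# The two numbers printed here should agree
print(c(multiclass = as.numeric(mc$auc), mean_pairwise = mean(pair_auc)))
```

If the two values match, the single number reported by `multiclass.roc` is just a summary of the pairwise curves and says nothing about any one class on its own.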

This approach has several flaws. Converting the predicted class to numeric is one: the integer codes simply follow the order of the factor levels, which imposes an arbitrary ranking on the classes. A better approach would be to use the predicted probabilities, but pROC does not support this behavior (I tried). And as Calimo pointed out, ROC is made for binary classifiers and should be used with care when more than two classes are present.
I predicted on the training data only as an example; this should not be done when evaluating a classifier, since it overestimates model accuracy.
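One way around both problems is a one-vs-rest setup: train with `probability = TRUE`, predict on held-out rows, and draw one ordinary binary ROC curve per class from that class's probability column. The sketch below is only an illustration under assumptions of mine, not something the question's data was run through: the stratified split, the seed, and the names `test_idx`, `probs` and `ovr` are all hypothetical choices.

```r
library(e1071)
library(pROC)

data(iris)
set.seed(42)  # arbitrary seed, only for reproducibility

# Hold out 15 rows per class so every class appears in the test set
by_class <- split(seq_len(nrow(iris)), iris$Species)
test_idx <- unlist(lapply(by_class, function(i) sample(i, 15)))
train <- iris[-test_idx, ]
test  <- iris[test_idx, ]

# probability = TRUE is needed both when training and when predicting
fit   <- svm(Species ~ ., data = train, probability = TRUE)
pred  <- predict(fit, test, probability = TRUE)
probs <- attr(pred, "probabilities")  # matrix with one column per class

# One binary ROC per class: this class against all the others
ovr <- lapply(levels(test$Species), function(cls) {
  truth <- factor(ifelse(test$Species == cls, cls, "other"),
                  levels = c("other", cls))
  roc(truth, probs[, cls], levels = c("other", cls), direction = "<")
})
names(ovr) <- levels(test$Species)

plot(ovr[[1]], col = 1)
for (i in 2:length(ovr)) lines(ovr[[i]], col = i)
legend("bottomright", legend = names(ovr), col = seq_along(ovr), lwd = 2)

# Per-class AUC on the held-out rows
print(sapply(ovr, function(r) as.numeric(auc(r))))
```

For the binary (yes/no) dataset the same idea reduces to a single `roc()` call on the probability column of the positive class.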

missuse