I recently learned about Poisson regression and want to apply this new-to-me statistical method to real-world problems. After thinking about it for a while, I decided to try to predict stock volumes of Fortune 500 companies from financial information on a random sample of companies.

The problem I am encountering is that, although the model accounts for a massive amount of variance and contains only significant predictors, the predict function returns predictions that have virtually no variance and are way off the actual values.

The dataset I am playing with is not fully filled in, but I decided to take a peek at the results with a small sample size. I did this because I read something online suggesting that the power needed for Poisson regression is lower for large numbers, and stock market volume includes some massive numbers. The dataset can be accessed here:

https://drive.google.com/file/d/1qvkwWSfUSodfceyNLvPjA4jqnWTDTeSo/view?usp=sharing

The code I used is presented below:

Stock<-read.csv("C:/FilePath/StockPrices.csv")
head(Stock)
summary(StockTest <- step(glm(formula = X2018.Volume ~ X2017.Stock.Price + X2017.Volume+Total.Revenue+Cost.of.Revenue+Research...Development+Selling.General...Administrative+Interest.Expense+Total.Other.Income...Expenses.Net+Income.Before.Tax+Income.Tax.Expense+Income.From.Continuing.Operation+Net.Income+Enviornment+Social+Governance, family = "poisson", data = Stock)))

1-StockTest$deviance/StockTest$null.deviance
predict(StockTest)

The model has a great Pseudo R-squared, but its predicted values are way off the actual values. See for yourself:

predict(StockTest)
15.47486 15.00441 15.00881 14.01175 15.01126 16.24620 15.99307 15.68193 15.67123
14.98932 14.77741 15.43363 12.07001 13.84586 15.83090 14.28052 15.16039 13.83686

Versus

Stock[,"X2018.Volume"]
 [1] 5160000 110853500 3310000 3310000 1200000 876000 3310000 11400000 8830000 6380000 6410000
[12] 820000 3500000 2620000 4860000 199000 741000 7680000 1287769 3810000 1460000 2310000

What am I doing wrong? Are there special considerations when using the predict function on a Poisson model? Is Poisson regression not the appropriate analysis for the data I am playing with?

You need to do `predict(StockTest, type="response")`. See [this answer](https://stackoverflow.com/a/12201502/6574038). Also, you should reconsider the `0` values in your table: are they really `0`, or actually missing, i.e. `NA`? – jay.sf Nov 02 '19 at 04:59

1 Answer

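First, read the manual page for predict.glm. By default it returns predictions on the scale of the linear predictor, which for a Poisson model is the log of the expected count, so you need type="response" to get predicted counts (Stock.glm below is the fitted model, named StockTest in the question):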

predict(Stock.glm, type="response")
#          1          3          4          5          7          8          9 
#  5255867.7  3283450.0  3297945.2  1216812.4  3306021.9 11366695.1  8824739.9 
#         10         11         13         14         15         16         17 
#  6465084.7  6396289.7  3234293.9  2616649.3  5043601.7   174557.7  1030814.3 
#         18         19         20         21 
#  7503622.7  1592024.5  3837723.8  1021574.3 
Stock.glm$model$X2018.Volume
#  [1]  5160000  3310000  3310000  1200000  3310000 11400000  8830000  6380000  6410000
# [10]  3500000  2620000  4860000   199000   741000  7680000  1287769  3810000  1460000

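You cannot compare the predictions directly with the original data because the data contain missing values. glm drops incomplete rows, so 4 rows of the original data are absent from the data used in the analysis. Compare against Stock.glm$model$X2018.Volume instead, which holds only the rows that were actually fitted.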
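The row-dropping can be reproduced on a small made-up data set. The sketch below is only an illustration: `toy`, `fit`, `used` and `comparison` are hypothetical names and the numbers are invented, not taken from the question's file. It shows one way to line up observed and predicted values by the row names that survive the fit.

```r
# Toy data (hypothetical, NOT the question's dataset); rows 3 and 6 are incomplete.
toy <- data.frame(x = c(0.1, 0.5, NA, 1.2, 0.8, NA, 1.5, 0.3),
                  y = c(3, 5, 4, 11, 7, 2, 14, 4))

# na.omit (the usual default) removes incomplete rows before fitting.
fit <- glm(y ~ x, family = poisson, data = toy, na.action = na.omit)

length(fitted(fit))   # 6 fitted values for 8 original rows

# The fitted values keep the original row names, so they identify the rows used.
used <- as.integer(names(fitted(fit)))

# Compare like with like: observed values for the fitted rows only.
comparison <- data.frame(row       = used,
                         observed  = toy$y[used],
                         predicted = predict(fit, type = "response"))
comparison
```

If predictions padded with NA to the original length are preferred, fitting with na.action = na.exclude should give that behaviour from predict() and fitted().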

cor(Stock.glm$model$X2018.Volume, predict(Stock.glm, type="response"))
# [1] 0.9983086
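The difference between the two prediction scales explains the numbers in the question: exp(15.47486) is about 5.26 million, which matches the first response-scale prediction above. A minimal sketch on invented data (the names `toy`, `fit`, `eta` and `mu` are illustrative assumptions) shows the relationship for a Poisson model with its default log link:

```r
# Toy data (hypothetical, NOT the question's dataset).
toy <- data.frame(x = c(0.1, 0.5, 0.9, 1.2, 0.8, 1.5, 0.3),
                  y = c(3, 5, 6, 11, 7, 14, 4))

fit <- glm(y ~ x, family = poisson, data = toy)

eta <- predict(fit)                     # default type = "link": log of the expected count
mu  <- predict(fit, type = "response")  # expected count on the original scale

# With a log link, exponentiating the linear predictor recovers the response scale.
all.equal(exp(eta), mu)
```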
dcarlson