I recently learned about Poisson regression and want to apply this new-to-me statistical method to real-world problems. After thinking about it for a while, I decided to try to predict the stock volumes of Fortune 500 companies from financial information on a random sample of those companies.
The problem I am encountering is this: the model accounts for a massive amount of variance and contains only significant predictors, but when I use the predict function to get predictions from the Poisson model, it returns values with virtually no variance that are way off the actual values.
The dataset I am playing with is not fully filled in, but I decided to take a peek at the results with a small sample size. I did this because I read something online suggesting that the power needed for Poisson regression is lower when the counts are large, and stock market volume includes some massive numbers. The dataset can be accessed here:
https://drive.google.com/file/d/1qvkwWSfUSodfceyNLvPjA4jqnWTDTeSo/view?usp=sharing
The code I used is presented below:
Stock<-read.csv("C:/FilePath/StockPrices.csv")
head(Stock)
StockTest <- step(glm(
  X2018.Volume ~ X2017.Stock.Price + X2017.Volume + Total.Revenue +
    Cost.of.Revenue + Research...Development +
    Selling.General...Administrative + Interest.Expense +
    Total.Other.Income...Expenses.Net + Income.Before.Tax +
    Income.Tax.Expense + Income.From.Continuing.Operation + Net.Income +
    Enviornment + Social + Governance,
  family = "poisson", data = Stock
))
summary(StockTest)
1-StockTest$deviance/StockTest$null.deviance
predict(StockTest)
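For anyone who wants to check how I am computing the pseudo R-squared, it is the deviance-based version: one minus the ratio of the residual deviance to the null deviance. Here is a minimal self-contained sketch of that calculation on simulated data. The data and the names x, y, sim, fit and pseudo_r2 are made up for illustration and have nothing to do with my stock dataset.

```r
# Simulated data only: x, y, sim, fit and pseudo_r2 are hypothetical names
set.seed(1)
n <- 200
x <- rnorm(n)
y <- rpois(n, lambda = exp(1 + 0.5 * x))  # counts whose log-mean is linear in x
sim <- data.frame(x = x, y = y)

# Poisson regression with the default log link
fit <- glm(y ~ x, family = poisson, data = sim)

# Deviance-based pseudo R-squared: share of the null deviance the model explains
pseudo_r2 <- 1 - fit$deviance / fit$null.deviance
print(pseudo_r2)
```

This is the same calculation as the 1-StockTest$deviance/StockTest$null.deviance line in my code, just on a toy model.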
The model has a great Pseudo R-squared, but its predicted values are way off the actual values. See for yourself:
predict(StockTest)
15.47486 15.00441 15.00881 14.01175 15.01126 16.24620 15.99307 15.68193 15.67123
14.98932 14.77741 15.43363 12.07001 13.84586 15.83090 14.28052 15.16039 13.83686
Versus
Stock[,"X2018.Volume"]
 [1] 5160000 110853500 3310000 3310000 1200000 876000 3310000 11400000 8830000 6380000 6410000
[12] 820000 3500000 2620000 4860000 199000 741000 7680000 1287769 3810000 1460000 2310000
What am I doing wrong here? Are there special considerations when using the predict function on a Poisson model? Or is Poisson regression simply not the appropriate analysis for the data I am playing with?