
I'm trying to build a model that will predict how many deals will be done by one of our offices for a given month.

I started trying to learn how to build a model like this using this article: https://medium.com/@davidsb/datascience-for-developers-build-your-first-predictive-model-with-r-a798f684752f

However, the model in that article appears to use only one predictor. Ideally I'd like to be able to select month = January and office = Atlanta, and have the output be an estimate of the number of deals the Atlanta office could expect to do in January.

My dataset is organized as follows:

Office   DealMonth   DealYear   CountDeals 
Atlanta  1           2015       10
Atlanta  2           2016       35

Is there an easy way to tweak the basic model outlined in the article to get my desired outcome?

Edit: My code as it stands is below:

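library(rpart)  # provides rpart()

# Load the monthly deal counts and inspect them
dat = read.csv("RawDataDealCountSummary.csv")
head(dat)

str(dat)

# Treat month as a categorical predictor
dat$DealMonth = factor(dat$DealMonth)

# Train on years before 2017, test on 2017
train_data = dat[dat$DealYear < 2017,]
test_data = dat[dat$DealYear == 2017,]

head(train_data)
head(test_data)

test_counts <- test_data$DealCount

plot(dat$ï..DealOffice, dat$DealCount)

# Regression tree on office and month
model = rpart(DealCount ~ ï..DealOffice + DealMonth, data = train_data)

# Prediction errors on the held-out year
p = predict(model, test_data)
plot(p - test_counts)

# Predicted count for Atlanta in December
predict(model, data.frame(ï..DealOffice = factor('Atlanta'), DealMonth = factor(12)))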
walkery
  • So what exactly is your question? When asking for help, you should include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Show the code you tried and describe exactly where you are getting stuck. – MrFlick May 15 '18 at 16:39
  • The link seems to explain things just fine. Would be `lm(CountDeals ~ Office + DealMonth + DealYear, ...)` – spinodal May 15 '18 at 16:49
  • @MrFlick This is the code I used to try to build the model: `model=rpart(dat$Count.of.Deal.ID ~ dat$Deal.Month + dat$ï..Office, train_data,)` followed by `predict(model, data.frame(office = factor('Atlanta'), Deal.Month = factor(1)))`. However, the output is the same for every office location I try to input. I'm still very much a beginner, so for all I know I'm doing it correctly and just interpreting the results incorrectly. Thanks! – walkery May 15 '18 at 18:25
  • Please edit code samples into your question (and format them) rather than leaving them down in comments. – Gregor Thomas May 15 '18 at 18:33
  • Also, don't use `dat$` in your model formula, that's what the `data` argument is for. In your model as defined do you even know if you are using data from `dat` or from `train_data`? – Gregor Thomas May 15 '18 at 18:33
  • @Gregor Sorry, I've added the code samples to the main question with appropriate formatting. When I don't use `dat$`, it says it can't find the function. How do I use the `data` argument? And I'm realizing that the model should be using "test_data" rather than "dat". – walkery May 15 '18 at 18:39
  • @Gregor Thanks, I was able to get it to work a little better when I didn't include dat$ in the model formula. However, I'm running into an issue where no matter what month I input, it produces the same output for every office. Is this a problem with the way the model is tuned? – walkery May 15 '18 at 18:53
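
To make the comments above concrete, here is a minimal sketch of the `data` argument in use, with both the `lm` formula suggested by spinodal and an `rpart` tree. The toy data frame `deals` and its counts are invented for illustration; only the column names follow the table in the question. When the formula contains `dat$...`, its terms refer to the global `dat` rather than to columns of `newdata`, so `predict` cannot use the new office and month. Separately, rpart's defaults (`minsplit = 20`, `cp = 0.01`) can leave a small training set with few or no splits, so many inputs end up sharing one prediction. The relaxed `rpart.control` values below are illustrative assumptions, not tuned settings.

```r
library(rpart)

# Hypothetical toy data: column names follow the question's table,
# the counts are simulated purely for illustration.
set.seed(1)
deals <- expand.grid(
  Office    = c("Atlanta", "Boston"),
  DealMonth = 1:12,
  DealYear  = 2015:2017,
  stringsAsFactors = TRUE
)
deals$DealMonth  <- factor(deals$DealMonth)
deals$CountDeals <- rpois(nrow(deals),
                          lambda = ifelse(deals$Office == "Atlanta", 30, 10))

train_data <- deals[deals$DealYear < 2017, ]

# Bare column names in the formula; the data frame goes in `data`.
fit_lm <- lm(CountDeals ~ Office + DealMonth, data = train_data)

# New data must use the same column names and factor levels as the training data.
new_case <- data.frame(
  Office    = factor("Atlanta", levels = levels(train_data$Office)),
  DealMonth = factor(1, levels = levels(train_data$DealMonth))
)
predict(fit_lm, newdata = new_case)

# rpart uses the same formula/data interface. The control values are
# assumptions chosen to allow splits on a small data set.
fit_tree <- rpart(CountDeals ~ Office + DealMonth, data = train_data,
                  control = rpart.control(minsplit = 5, cp = 0.001))
predict(fit_tree, newdata = new_case)
```

Calling `print(fit_tree)` shows which variables the tree actually split on. If month never appears there, every month within an office will get the same prediction.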

0 Answers