I'm trying to build a model that will predict how many deals will be done by one of our offices for a given month.
I started trying to learn how to build a model like this using this article: https://medium.com/@davidsb/datascience-for-developers-build-your-first-predictive-model-with-r-a798f684752f
However, the model in that article seems to use only a single predictor. Ideally, I'd like to be able to specify month = January and office = Atlanta,
and get back an estimate of the number of deals that the Atlanta office could expect to do in January.
My dataset is organized as follows:
Office   DealMonth  DealYear  CountDeals
Atlanta  1          2015      10
Atlanta  2          2016      35
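In R terms, I assume this layout corresponds to a data frame like the sketch below. The rows are copied from the sample above; the name `deals` and the choice to treat Office and DealMonth as factors are my assumptions about how they should enter a model, not something taken from the article.

```r
# Rows copied from the sample above; column names follow the table header
deals <- data.frame(
  Office = c("Atlanta", "Atlanta"),
  DealMonth = c(1, 2),
  DealYear = c(2015, 2016),
  CountDeals = c(10, 35),
  stringsAsFactors = TRUE  # Office becomes a factor
)

# Assumption: month is a category (1-12), not a numeric trend
deals$DealMonth <- factor(deals$DealMonth, levels = 1:12)

str(deals)
```

With both columns stored as factors, a formula such as `CountDeals ~ Office + DealMonth` would treat each office and each month as its own category.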
Is there an easy way to tweak the basic model outlined in the article to get my desired outcome?
Edit: My code as it stands is below:
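library(rpart)  # rpart() lives in the rpart package, which is not attached by default

# Assuming the CSV starts with a UTF-8 byte-order mark: fileEncoding = "UTF-8-BOM"
# stops the first column from being named "ï..DealOffice"
dat <- read.csv("RawDataDealCountSummary.csv", fileEncoding = "UTF-8-BOM",
                stringsAsFactors = TRUE)
head(dat)
str(dat)
dat$DealMonth <- factor(dat$DealMonth)

# Train on years before 2017, test on 2017
train_data <- dat[dat$DealYear < 2017, ]
test_data <- dat[dat$DealYear == 2017, ]
head(train_data)
head(test_data)
test_counts <- test_data$DealCount

plot(dat$DealOffice, dat$DealCount)
model <- rpart(DealCount ~ DealOffice + DealMonth, data = train_data)
p <- predict(model, test_data)
plot(p - test_counts)  # prediction error on the 2017 data

# Predict for one office/month, reusing the factor levels from the data
predict(model, data.frame(
  DealOffice = factor("Atlanta", levels = levels(dat$DealOffice)),
  DealMonth = factor(12, levels = levels(dat$DealMonth))))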
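To make this reproducible without my CSV, here is a minimal sketch on made-up data. The second office, the simulated counts, and the `predict_deals` helper are all hypothetical and only illustrate the kind of two-predictor model and single office/month lookup I'm after; it is not meant as the right way to do it.

```r
library(rpart)

# Made-up data: 2 offices x 12 months x 3 years (values are illustrative only)
set.seed(1)
toy <- expand.grid(
  DealOffice = c("Atlanta", "Boston"),
  DealMonth = 1:12,
  DealYear = 2014:2016,
  stringsAsFactors = TRUE
)
toy$DealMonth <- factor(toy$DealMonth)

# Hypothetical counts: Atlanta averages about 30 deals, Boston about 10
toy$DealCount <- rpois(nrow(toy),
                       lambda = ifelse(toy$DealOffice == "Atlanta", 30, 10))

# Regression tree with two predictors: office and month
toy_model <- rpart(DealCount ~ DealOffice + DealMonth,
                   data = toy, method = "anova")

# Hypothetical helper: estimate for one office/month,
# reusing the factor levels the model was trained on
predict_deals <- function(model, data, office, month) {
  newdata <- data.frame(
    DealOffice = factor(office, levels = levels(data$DealOffice)),
    DealMonth = factor(month, levels = levels(data$DealMonth))
  )
  unname(predict(model, newdata))
}

est_atlanta_jan <- predict_deals(toy_model, toy, "Atlanta", 1)
est_boston_jan <- predict_deals(toy_model, toy, "Boston", 1)
print(c(Atlanta = est_atlanta_jan, Boston = est_boston_jan))
```

Building the new row with the same factor levels as the training data is deliberate: it keeps `predict()` from seeing office or month values the model has never encountered.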