
Let me ask the question in detail with an example:

I have a historical data set with columns (a,b,c,d,e,f,g)

Now I have to predict (b,c,d,e,f,g) based on the value of 'a'.

To make it concrete, here is the same setup with a,b,c,d,e,f,g replaced by a real-world example.

Consider a data set that contains the daily revenue of a bike rental store, based on the number of rentals and the rental cost per hour.

Now my goal is to predict the number of rentals per month and the cost per hour needed to reach my revenue goal of $50k.
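For illustration only, here is a minimal sketch of what "reaching the goal" could mean, under the hypothetical assumption that monthly revenue is simply rentals × hours per rental × hourly cost (the question does not specify the actual relationship, and all numbers below are made up):

```python
# Hypothetical revenue model for the bike-rental example.
# Assumes revenue = rentals_per_month * hours_per_rental * cost_per_hour.
GOAL = 50_000            # revenue target in dollars
AVG_HOURS_PER_RENTAL = 2 # assumed average rental duration

def revenue(rentals_per_month, cost_per_hour):
    return rentals_per_month * AVG_HOURS_PER_RENTAL * cost_per_hour

# Enumerate candidate (rentals, cost) pairs that reach the goal.
candidates = [
    (rentals, cost)
    for rentals in range(1000, 6000, 1000)
    for cost in (5, 10, 15, 20)
    if revenue(rentals, cost) >= GOAL
]
print(candidates[:3])
```

A learned model would replace the hand-written `revenue` function with one fitted to the historical data.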

Can this be done? I just need some direction on how to do it.

  • Yes, it can be done. This question is likely going to be closed/deleted as too broad. One way is multinomial logistic regression. Perhaps try posting on CrossValidated. – JasonAizkalns Jul 06 '15 at 17:58
  • Please give us an example data, as well as what you've tried thus far. [Here's](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) an overview of a good post. – Andrew Taylor Jul 06 '15 at 17:59
  • @JasonAizkalns is multinomial logistic regression really a fit here? Assuming b/c/d/e/f/g are independent variables and not categories of one variable. I don't think there's anything that can be done mathematically outside of just predicting each variable individually, right? Just thinking intuitively – ila Jul 06 '15 at 18:03
  • Consider, you have a dataset which says the revenue of bike rental store on a day based on number of rentals, cost per hour for rent. Now my goal is to predict the number of rentals per month and cost per hour to reach my revenue goal of $50k? – rrsa Jul 06 '15 at 18:35
  • This can't be done. $50k is too much! – user443854 Jul 06 '15 at 18:50

2 Answers


You basically want to maximize:

P(B|A)*P(C|A,B)*P(D|A,B,C)*P(E|A,B,C,D)*P(F|A,B,C,D,E)*P(G|A,B,C,D,E,F)

If B,C,D,E,F,G are all independent of one another given A (i.e., conditionally independent, though each depends on A), you are basically trying to maximize:

P = P(B|A)*P(C|A)*P(D|A)*P(E|A)*P(F|A)*P(G|A)

One approach to solving it is supervised learning.

Train a set of classifiers (or regressors, depending on the values of B,C,D,E,F,G): A->B, A->C ... A->G with your historical data, and when given a query of some value a, use all classifiers/regressors to predict the values of b,c,d,e,f,g.

The "trick" is to use multiple learners for multiple outputs. Note that with conditionally independent targets, there is no loss in doing this, since maximizing each P(Z|A) separately also maximizes P.
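The "one learner per output" idea above can be sketched as follows. This is a minimal illustration with made-up data and a hand-rolled closed-form least-squares fit (so the sketch stays dependency-free); in practice you would reach for something like R's `lm` or scikit-learn's `MultiOutputRegressor`:

```python
# Train one regressor A -> Z per target column Z, then apply all of
# them to the same query value a. Data below is hypothetical.

def fit_simple_ols(xs, ys):
    """Closed-form 1-D least squares: returns (slope, intercept)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def fit_multi_output(a_values, target_columns):
    """One independent model per target, keyed by column name."""
    return {name: fit_simple_ols(a_values, col)
            for name, col in target_columns.items()}

def predict_all(models, a):
    """Predict every target for a single query value a."""
    return {name: slope * a + intercept
            for name, (slope, intercept) in models.items()}

# Hypothetical historical data: column a plus two of the six targets.
a_hist = [1.0, 2.0, 3.0, 4.0]
targets = {"b": [2.1, 3.9, 6.2, 8.0],   # roughly b = 2a
           "c": [0.9, 2.1, 2.9, 4.1]}   # roughly c = a

models = fit_multi_output(a_hist, targets)
print(predict_all(models, 5.0))         # independent predictions for b and c
```

Each model is fit in isolation, which is exactly the "no loss under conditional independence" point: nothing in the fit for b ever looks at c.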


If the targets are not conditionally independent, the problem is to maximize P(B|A)*P(C|A,B)*P(D|A,B,C)*P(E|A,B,C,D)*P(F|A,B,C,D,E)*P(G|A,B,C,D,E,F), which is NP-hard by reduction from the Hamiltonian path problem (set P(X_i|X_{i-1},...,X_1) > 0 iff there is an edge (X_{i-1}, X_i); you are then looking for a non-zero path).

amit
  • Thank you Amit, i have edited my question with an example! Could you please have a look at it! – rrsa Jul 06 '15 at 18:39

This is a typical classification problem. A trivial example: "if a > 0.5, then b = 1 and c = 0 ...; if a <= 0.5, then b = 0 and c = 1 ..." You may want to look at nnet or h2o.
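The trivial threshold rule above can be written out literally, treating the targets as class labels (only b and c are shown, and the threshold 0.5 is just the answer's example):

```python
# A literal encoding of the single-threshold rule from the answer.
def classify(a):
    """Predict class labels (b, c) from a with one threshold on a."""
    if a > 0.5:
        return {"b": 1, "c": 0}
    return {"b": 0, "c": 1}

print(classify(0.7))  # {'b': 1, 'c': 0}
print(classify(0.2))  # {'b': 0, 'c': 1}
```

A real classifier (e.g. a neural net from nnet or h2o) would learn a decision boundary like this from the historical data instead of hard-coding it.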

kimman