5

I am working currently on a problem where I have to solve either an L2-regularized logistic regression or L2-reg linear SVM problem, where I have an added affine term.

So my problem for example is:

min_ w {C*sum_i max(1-w*x_i*y_i,0) + 0.5*||w||^2_2 + w * v }

where v is a constant vector.

Of course this is a convex problem and can be solved with the usual methods, but I have to solve many large problems of this type, so I would very much like to use a standard library such as liblinear.

My question is, is there a way to transform the data x, the labels y, or the weighing factor C (perhaps into a different C_i for each instance), such that this problem will be equivalent to a standard hinge-loss SVM or logistic regression problem?

ualex
  • 51
  • 5

2 Answers2

5

I can't think of a way to turn it into something which can be processed by something like liblinear. However, you could easily solve this optimization problem with one of the general purpose cutting plane optimization libraries. All you have to do is write code to compute an element of the subgradient (which is just w + v - Csum_i x_iy_i in your case) and the value of the objective. Then a cutting plane routine can find the optimal w.

There is a CPA optimizer in Shogun and also one in dlib. I haven't used Shogun's version but I have used the one in dlib for a lot of problems (I'm also the author of dlib).

Davis King
  • 4,731
  • 1
  • 25
  • 26
0

It's possible if your off-the-shelf training algorithm lets you to bias the hinge loss or logistic regression for each data point. Here's how.

Complete the square on the last two terms:

  0.5 ||w||^2 + w'v 
= 0.5 ||w+v/2||^2 - v'v/2

then introduce the change of variable

u = w+v/2

Your optimization is then equivalent to

min_u C*sum_i max(1-(u-v/2)*x_i*y_i,0) + 0.5*||u||^2_2

which, with b_i = 1+v'x_i*y_i/2, is equivalent to

min_u   C*sum_i max(b_i - u*x_i*y_i ,0) + 0.5*||u||^2_2

So, if your training algorithm lets you replace 1 with a b_i of your choosing for each data point, it can solve this problem.

Almost every package will accommodate b_i in one way or another. For example, the above is equivalent to

min_u   C*sum_i b_i  max(1 - u*x_i*y_i/b_i ,0) + 0.5*||u||^2_2

(assuming b_i>0) so if your package allows you weight each point differently, you can solve the above problem.

moos
  • 2,444
  • 2
  • 15
  • 14