
I have a set of vectors. I'm looking for ways to reduce an n-dimensional vector to a single scalar (1-d) value, say

(x1, x2, ..., xn) ------> y

This single value needs to be the characteristic value of the vector. Each unique vector produces a unique output value. Which of the following methods is appropriate:

1- norm of the vector - the square root of the sum of squares, which measures Euclidean distance from the origin

2- compute a hash of the vector, using some hashing technique that avoids collisions

3- use linear regression to compute y = w1*x1 + w2*x2 + ... + wn*xn - unlikely to work well if the output has no good dependence on the input values

4- a feature extraction technique like PCA, which assigns weights to each of x1, x2, ..., xn based on the set of input vectors
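For concreteness, the first three candidates might be sketched as follows (the weights in method 3 are made up for illustration; method 4 would additionally need the whole set of vectors to fit PCA):

```python
import hashlib

import numpy as np

x = np.array([1.0, 2.0, 3.0])

# 1. Norm: Euclidean distance from the origin.
y1 = np.linalg.norm(x)                    # sqrt(1 + 4 + 9)

# 2. Hash of the vector's raw bytes (collision-resistant, not collision-free).
y2 = hashlib.sha256(x.tobytes()).hexdigest()

# 3. Fixed linear combination; the weights here are hypothetical.
w = np.array([0.2, 0.5, 0.3])
y3 = float(np.dot(w, x))                  # 0.2*1 + 0.5*2 + 0.3*3 = 2.1
```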

marc
  • What's the objective of the dimensionality reduction? What are you trying to do with the vectors? If it's a machine learning problem, PCA would be best. – Alptigin Jalayr Apr 15 '13 at 16:06
  • It kinda depends on what you want to do with the unique values. Could you elaborate? – d.j.sheldrick Apr 15 '13 at 16:10
  • @d.j.sheldrick: I would require these unique values to ease computation on the vectors. – marc Apr 15 '13 at 16:35
  • @AlptiginJalayr: I'm not quite sure if PCA gives unique values – marc Apr 15 '13 at 16:36
  • What kind of computations do you need to do on these vectors? – torquestomp Apr 15 '13 at 16:54
  • What are the sets X1, X2,..., Xn that x1, x2,..., xn belong to? Are the sets finite? – stuhlo Apr 15 '13 at 17:42
  • It depends on what computation you need this for. In general, you can't map a set to a smaller set and preserve uniqueness. – Don Reba Apr 16 '13 at 08:11
  • "I would require these unique values to ease computation on the vectors" Keep in mind that any dimensionality reduction would likely imply loss of information; can you elaborate on which computation you are going to perform? I can't think right now of a case where that reduction would make computation easier. – Pedrom Apr 16 '13 at 13:21

1 Answer


It's unclear from the question what properties you need this transform to have, so I'm guessing that you don't need it to preserve anything other than uniqueness, and possibly invertibility.

None of the techniques you suggest can in general avoid collisions:

  1. Norm - two vectors pointing in opposite directions have the same norm.

  2. Hash - unless the inputs are known a priori, this can't work: any hash function in the usual sense has a finite image, while there are infinitely many possible vectors - no good.

  3. It's easy to find two vectors which give the same result for any linear combination: for n > 1 the weight vector w has a nontrivial null space, and adding any null-space vector to x leaves w1*x1 + ... + wn*xn unchanged.

  4. PCA is a specific kind of linear transformation - hence the same problem as with linear regression.
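The collisions in points 1 and 3/4 are easy to exhibit concretely (illustrative values only; the weight vector w is hypothetical):

```python
import numpy as np

# Point 1: opposite vectors share a norm.
u = np.array([1.0, 2.0, 3.0])
v = -u
assert np.linalg.norm(u) == np.linalg.norm(v)

# Points 3/4: any z in the null space of w is invisible to the map x -> w.x,
# so x and x + z collide.
w = np.array([1.0, 1.0, 1.0])       # hypothetical weights
z = np.array([1.0, -1.0, 0.0])      # w.z == 0
x = np.array([4.0, 5.0, 6.0])
assert np.dot(w, x) == np.dot(w, x + z)
```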

So - if you're just looking for uniqueness, you could "stringify" your vectors. One way to do it is to write them down as text strings, with the different coordinates separated by a special character (underscore, for example). Then take the binary value of this string as your representation.
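A minimal sketch of this stringification (the function name is my own):

```python
def stringify(vec):
    """Map a vector to a unique string: coordinates joined by underscores.

    Distinct vectors give distinct strings, because '_' never occurs
    inside repr() of a number.
    """
    return "_".join(repr(x) for x in vec)

key = stringify((1.5, -2.0, 3.25))   # '1.5_-2.0_3.25'
raw = key.encode("ascii")            # the "binary value" of the string
```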

If space is important and you need a more efficient representation, you could think of a more efficient bit encoding: each character in the set 0,1,...,9,'.','_' can be represented by 4 bits - a hexadecimal digit (map '.' to A and '_' to B). Now encode this string as a hexadecimal number, roughly halving the space.
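That packing might look like this (a sketch assuming non-negative decimal coordinates - a sign character would need a 13th symbol):

```python
# 4-bit codes for the alphabet 0-9, '.' and '_'.
NIBBLE = {c: i for i, c in enumerate("0123456789")}
NIBBLE["."] = 0xA
NIBBLE["_"] = 0xB

def pack(s):
    """Pack two characters per byte, padding with '_' if the length is odd."""
    if len(s) % 2:
        s += "_"
    return bytes((NIBBLE[s[i]] << 4) | NIBBLE[s[i + 1]]
                 for i in range(0, len(s), 2))

pack("1.5_2")   # three bytes: 0x1A, 0x5B, 0x2B
```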

Guy Adini