Normalizing Vectors with Negative values

Question

I want to represent each text-based item in my system as a vector in a vector space model. The value for each term can be negative or positive, reflecting the frequency of that term in the negative or positive class. A value of zero means neutral. For example:

Item1 (-1,0,-5,4.5,2)

Item2 (2,6,0,-4,0.5)

My questions are:

1- How can I normalize my vectors to the range [0, 1], where:

0.5 corresponds to zero before normalization,

values above 0.5 correspond to positive values, and

values below 0.5 correspond to negative values?

I want to know if there is a mathematical formula to do such a thing.

2- Will the choice of similarity measure be different after the normalization? For example, can I still use cosine similarity?

3- Will it be difficult to perform dimensionality reduction after the normalization?

Thanks in advance

score 3 · Answer 1 · answered Mar 02 '16 at 00:18

One solution could be to use MinMaxScaler, which rescales each feature (column) to the [0, 1] range, and then divide each row by its sum. In Python, using sklearn, you can do something like this:

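import numpy as np
from sklearn.preprocessing import MinMaxScaler, normalize

# Rows are items, columns are terms (the two example items from the question)
X = np.array([[-1, 0, -5, 4.5, 2], [2, 6, 0, -4, 0.5]])
scaler = MinMaxScaler()  # rescales each column to [0, 1]
scaled_X = scaler.fit_transform(X)
normalized_X = normalize(scaled_X, norm='l1', axis=1, copy=True)  # each row now sums to 1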
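Note that min-max scaling does not, in general, send an original value of zero to 0.5. If that property matters, one possible alternative is to divide by the largest absolute value and then shift, i.e. x' = 0.5 + x / (2 * max|x|). The sketch below is only an illustration of that formula: `scale_around_half` is a hypothetical helper name, and using a single global maximum (rather than one per row or per column) is an assumption.

```python
import numpy as np

def scale_around_half(X):
    """Map values into [0, 1] so that 0 -> 0.5 and signs stay on their side of 0.5.

    Hypothetical helper: uses one global scale factor, max|x| over all of X.
    """
    X = np.asarray(X, dtype=float)
    max_abs = np.abs(X).max()
    if max_abs == 0:
        # All values are zero (neutral), so everything maps to 0.5
        return np.full_like(X, 0.5)
    return 0.5 + X / (2.0 * max_abs)

# The two example items from the question
X = np.array([[-1, 0, -5, 4.5, 2],
              [2, 6, 0, -4, 0.5]])
scaled = scale_around_half(X)
```

Because this mapping shifts every vector by a constant, the angles between vectors change, so cosine similarity computed on the shifted vectors will generally differ from cosine similarity computed on the original signed vectors.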
