
I'm working on a clustering program, and have a dataset of doubles that I need to normalize in order to make sure that every double (variable) has the same influence.

I would like to use min-max normalization where for every variable the min and max value are determined, but I'm not sure how I could implement this on my dataset in Java. Does anyone have any suggestions?

Ortomala Lokni
user3470173
  • Is there any sample code you could add here? As it stands, your question is a little too vague to answer. – Donald_W Jun 06 '15 at 21:25

2 Answers


The Encog Project wiki gives a utility class that does range normalization.

The constructor takes the high and low values for input and normalized data.

/**
     * Construct the normalization utility, allowing the normalization range to be specified.
     * @param dataHigh The high value for the input data.
     * @param dataLow The low value for the input data.
     * @param normalizedHigh The high value for the normalized data.
     * @param normalizedLow The low value for the normalized data.
     */
    public NormUtil(double dataHigh, double dataLow, double normalizedHigh, double normalizedLow) {
        this.dataHigh = dataHigh;
        this.dataLow = dataLow;
        this.normalizedHigh = normalizedHigh;
        this.normalizedLow = normalizedLow;
    }

You can then use the normalize method on a sample.

/**
 * Normalize x.
 * @param x The value to be normalized.
 * @return The result of the normalization.
 */
public double normalize(double x) {
    return ((x - dataLow) 
            / (dataHigh - dataLow))
            * (normalizedHigh - normalizedLow) + normalizedLow;
}
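To see the two snippets working together, here is a minimal, self-contained sketch. The class name NormUtilDemo and the sample ranges are assumptions made for illustration; only the constructor parameters and the normalize formula come from the code above.

```java
// Hypothetical demo class; it mirrors the constructor and normalize method shown above.
public class NormUtilDemo {
    private final double dataHigh;
    private final double dataLow;
    private final double normalizedHigh;
    private final double normalizedLow;

    public NormUtilDemo(double dataHigh, double dataLow,
                        double normalizedHigh, double normalizedLow) {
        this.dataHigh = dataHigh;
        this.dataLow = dataLow;
        this.normalizedHigh = normalizedHigh;
        this.normalizedLow = normalizedLow;
    }

    /** Maps x from [dataLow, dataHigh] to [normalizedLow, normalizedHigh]. */
    public double normalize(double x) {
        return ((x - dataLow) / (dataHigh - dataLow))
                * (normalizedHigh - normalizedLow) + normalizedLow;
    }

    public static void main(String[] args) {
        // Input data assumed to lie in [0, 10], rescaled to [0, 1].
        NormUtilDemo toUnit = new NormUtilDemo(10.0, 0.0, 1.0, 0.0);
        System.out.println(toUnit.normalize(5.0));   // prints 0.5

        // Same input range, rescaled to [-1, 1] instead.
        NormUtilDemo toSigned = new NormUtilDemo(10.0, 0.0, 1.0, -1.0);
        System.out.println(toSigned.normalize(0.0)); // prints -1.0
    }
}
```

Note that the formula divides by (dataHigh - dataLow), so a variable whose minimum equals its maximum needs separate handling.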

To find the minimum and maximum of your dataset, use one of the answers to this question: Finding the max/min value in an array of primitives using Java.
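For a clustering dataset, the minimum and maximum have to be found per variable. The following is a rough sketch under some assumptions that are not in the question: the data is held in a rectangular double[][] with one row per sample and one column per variable, the target range is [0, 1], and a constant column is mapped to 0.0. The names MinMaxColumns and normalizeColumns are made up for this example.

```java
import java.util.Arrays;

// Hypothetical helper: per-column min-max normalization to [0, 1].
public class MinMaxColumns {

    /** Returns a new array in which every column of data is rescaled to [0, 1]. */
    public static double[][] normalizeColumns(double[][] data) {
        int rows = data.length;
        int cols = rows == 0 ? 0 : data[0].length; // assumes a rectangular array
        double[][] out = new double[rows][cols];

        for (int c = 0; c < cols; c++) {
            // First pass: find the min and max of this variable.
            double min = Double.POSITIVE_INFINITY;
            double max = Double.NEGATIVE_INFINITY;
            for (int r = 0; r < rows; r++) {
                min = Math.min(min, data[r][c]);
                max = Math.max(max, data[r][c]);
            }

            // Second pass: rescale. A constant column is mapped to 0.0
            // (an assumption) to avoid dividing by zero.
            double range = max - min;
            for (int r = 0; r < rows; r++) {
                out[r][c] = range == 0.0 ? 0.0 : (data[r][c] - min) / range;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] data = { {1.0, 100.0}, {2.0, 300.0}, {3.0, 200.0} };
        System.out.println(Arrays.deepToString(normalizeColumns(data)));
        // prints [[0.0, 0.0], [0.5, 1.0], [1.0, 0.5]]
    }
}
```

After this step both variables span the same range, so neither dominates a distance computation simply because of its scale.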

Ortomala Lokni
  • Thank you @OrtomalaLokni, it helped me out – Diogo Antunes Sep 10 '15 at 11:03
  • @OrtomalaLokni - I know its been a while since that answer but i dont understand how you can obtain normalizeHigh and normalizeLow and use that for normalising? Surely you will only know those values after you normalise? Thanks. – MTA Jan 14 '18 at 11:41
  • You have to define these values yourself. By default you would choose 0 and 1, but depending on your application you can choose different values. – Ortomala Lokni Jan 14 '18 at 13:53

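You can also use the StatUtils.normalize method from the Apache Commons Math library (org.apache.commons.math3). Note that this method standardizes the sample to a mean of 0 and a standard deviation of 1 (z-scores); it does not rescale the values to a fixed min-max range.

Refer to the following documentation: https://commons.apache.org/proper/commons-math/javadocs/api-3.4/org/apache/commons/math3/stat/StatUtils.html#normalize(double[])

The Gradle dependency is as follows:

implementation 'org.apache.commons:commons-math3:3.6.1'

The Maven dependency: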

<!-- https://mvnrepository.com/artifact/org.apache.commons/commons-math3 -->
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-math3</artifactId>
    <version>3.6.1</version>
</dependency>

Example

import org.apache.commons.math3.stat.StatUtils;

public class NormalizeExample {
    public static void main(String[] args) {
        double[] arr = {900.68, 900.63, 900.74, 900.59, 900.49, 900.65, 900.81, 900.82, 901.03, 900.74, 900.66, 900.49, 900.52, 900.63, 900.45};
        // Standardize the sample to mean 0 and standard deviation 1.
        double[] normArr = StatUtils.normalize(arr);
        for (int i = 0; i < normArr.length; i++) {
            System.out.print(normArr[i] + ", ");
        }
    }
}

This prints the following values: 0.11787856446848383, -0.20956189238965656, 0.5108071126989968, -0.47151425787616885, -1.1263951715931941, -0.0785857096464004, 0.9692237523003934, 1.034711843672766, 2.4099617624777, 0.5108071126989968, -0.013097618274772323, -1.1263951715931941, -0.9299308974783099, -0.20956189238965656, -1.3883475370797065

Madhu Tomy