0

I have a text file with several hundred doubles separated by line breaks:

Ex:

-0.020000000000010232
0.09500000000001307
-0.05500000000000682
0.1599999999999966
0.07000000000000739
-0.0799999999999983
-0.07000000000000739
0.060000000000002274
0.04000000000000625
0.04999999999999716
-0.10000000000000853 ...

For further context, I'm trying a super simple version of the application of machine learning to stocks, where derivative series of differences in price are dumped in a text file. The program, whilst constantly updating the "top three" patterns, will attempt to reference a pattern whenever it sees similar derivative values to predict future changes. Less relevantly, several other heuristics considering both volatility and volumes are also included in the refinement of these patterns.

The first part of this is finding/guessing basic patterns from the text file. I have two questions here:

  1. How do I round off these values to the fourth decimal point? (0.0001)
  2. How do I find the three most commonly repeated sets of rounded doubles and write them to a separate text file?
riteshtch
Sid G.

3 Answers

0

For the first part, you can take a look at the DecimalFormat class. It should allow you to format numbers accordingly.

Secondly, you could use a Map to store the (rounded) number and its occurrence in the file.

Once finished, you would go through the map and see which 3 numbers have the highest occurrence (value for a given key).
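
A minimal sketch of that approach, assuming the values have already been parsed into a list. The class `RoundedFrequency` and its method names are hypothetical, and `BigDecimal` with `HALF_UP` rounding is swapped in for `DecimalFormat` here so that the map keys stay numeric rather than being formatted strings.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, for illustration only.
final class RoundedFrequency {

    // Round to four decimal places. Every result has scale 4, so equal
    // rounded values are also equal as HashMap keys.
    static BigDecimal round4(double value) {
        return BigDecimal.valueOf(value).setScale(4, RoundingMode.HALF_UP);
    }

    // Map each rounded value to the number of times it occurs.
    static Map<BigDecimal, Integer> count(List<Double> values) {
        Map<BigDecimal, Integer> counts = new HashMap<>();
        for (double v : values) {
            counts.merge(round4(v), 1, Integer::sum);
        }
        return counts;
    }

    // Return up to three keys with the highest counts, most frequent first.
    static List<BigDecimal> topThree(Map<BigDecimal, Integer> counts) {
        List<Map.Entry<BigDecimal, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> b.getValue().compareTo(a.getValue()));
        List<BigDecimal> top = new ArrayList<>();
        for (int i = 0; i < Math.min(3, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }

    public static void main(String[] args) {
        List<Double> values = List.of(
                0.07000000000000739, 0.0700001, 0.07,
                0.09500000000001307, 0.0950000001, -0.020000000000010232);
        System.out.println(topThree(count(values)));
    }
}
```

Note that this counts single rounded values, not runs of consecutive values.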

npinti
  • How would I implement the Map such that it would find repeated sets of occurrences rather than repeated occurrences of a single double? Example: if 0.345 is repeated 8 times and 0.233 is repeated 7 times, they are not guaranteed to be adjacent to each other on the list. – Sid G. Mar 24 '16 at 18:59
  • @user3532216 so you want a frequency count, see my answer. – Peter Lawrey Mar 24 '16 at 21:15
0
  1. For the fourth decimal point part.

You can use the DecimalFormat API.

DecimalFormat df = new DecimalFormat("0.0000"); // java.text.DecimalFormat; "0.0000" keeps the leading zero
df.format(num); // returns num as a String with 4 decimal places

2a. Find the three most commonly repeated sets of rounded doubles.

Use a map data structure to map rounded decimal values to their frequencies. A HashMap would be a good fit.

2b. Iterate over the HashMap to find the most frequent decimal values and write them to a file using PrintWriter.

Let me know if something is not clear and I will try to help you out
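
As a rough sketch of step 2b, assuming the top values have already been selected: the class `TopValuesWriter` and its methods are hypothetical, and `String.format` with `%.4f` is used here instead of `DecimalFormat` for the four-decimal output.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, for illustration only.
final class TopValuesWriter {

    // Write one value per line, formatted to four decimal places.
    // Locale.ROOT keeps '.' as the decimal separator on every machine.
    static void writeTo(Writer sink, List<Double> values) {
        PrintWriter out = new PrintWriter(sink);
        for (double v : values) {
            out.println(String.format(Locale.ROOT, "%.4f", v));
        }
        out.flush();
    }

    // Convenience wrapper that writes the values to a file.
    static void writeFile(Path target, List<Double> values) throws IOException {
        try (Writer writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            writeTo(writer, values);
        }
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for the real output file.
        Path target = Files.createTempFile("top-values", ".txt");
        writeFile(target, List.of(0.07000000000000739, 0.09500000000001307, -0.020000000000010232));
        System.out.println(Files.readAllLines(target));
    }
}
```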

Community
Debosmit Ray
  • How does this retrieve frequencies of sets of double numbers? It appears that HashMap would order them and find their independent frequencies. Example: (0.566 appeared 9 times) vs (0.342, 0.222, 0.455, 0.333 appeared 5 times) – Sid G. Mar 24 '16 at 18:51
  • That part wasn't very clear from your question. You could use some sort of O(n^2) algorithm to check whether the doubles at index `i`, `i+1`, ... appear somewhere else in the string. You would use a HashMap to store this data. Runtime-wise, I can't think of anything better than O(n^2) right now. – Debosmit Ray Mar 24 '16 at 19:24
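
One possible way to count repeated runs of values rather than single values, sketched under the assumption that every pattern has the same fixed length (the `window` parameter, which is not part of the original question). The class `WindowFrequency` is hypothetical. Each value is stored as a whole number of 0.0001 units so that the keys compare exactly. For a fixed window length this makes one pass over the data; trying every possible length would bring back the quadratic cost mentioned above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, for illustration only.
final class WindowFrequency {

    // Count every run of `window` consecutive values after rounding
    // each value to four decimal places.
    static Map<List<Long>, Integer> countWindows(double[] values, int window) {
        Map<List<Long>, Integer> counts = new HashMap<>();
        for (int start = 0; start + window <= values.length; start++) {
            List<Long> key = new ArrayList<>(window);
            for (int i = start; i < start + window; i++) {
                key.add(Math.round(values[i] * 1e4)); // value in units of 0.0001
            }
            // Lists compare by content, so identical runs share one map entry.
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        double[] values = {0.1, 0.2, 0.3, 0.1000000001, 0.2, 0.3};
        System.out.println(countWindows(values, 3));
    }
}
```

The resulting map can then be sorted by count in the same way as a single-value frequency map.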
0

You can do a frequency count in O(n) like this.

Map<Double, Long> freqMap = DoubleStream.of(doubles).boxed()
                    .collect(Collectors.groupingBy(d -> round4(d), Collectors.counting()));
// You can then sort by the count to get just the top 3.
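
The sorting step might look something like the sketch below. It assumes a frequency map of the same `Map<Double, Long>` shape as above; the class `TopByCount` and its `top` method are hypothetical names.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper class, for illustration only.
final class TopByCount {

    // Return the n entries with the highest counts, most frequent first.
    static List<Map.Entry<Double, Long>> top(Map<Double, Long> freqMap, int n) {
        return freqMap.entrySet().stream()
                .sorted(Map.Entry.<Double, Long>comparingByValue().reversed())
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<Double, Long> freqMap = Map.of(0.1, 5L, 0.2, 9L, 0.3, 1L, 0.4, 7L);
        System.out.println(top(freqMap, 3));
    }
}
```

Sorting the whole entry set is O(m log m) in the number of distinct rounded values, which is usually far smaller than the number of input lines.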

While DecimalFormat will do the job, it is really slow. You can just use some Maths.

private static final double WHOLE_NUMBER = 1L << 53;
/**
 * Performs a round which is accurate to within 1 ulp. i.e. for values very close to 0.5 it
 * might be rounded up or down. This is a pragmatic choice for performance reasons as it is
 * assumed you are not working on the edge of the precision of double.
 *
 * @param d value to round
 * @return rounded value
 */
public static double round4(double d) {
    final double factor = 1e4;
    return d > WHOLE_NUMBER / factor || d < -WHOLE_NUMBER / factor ? d :
            (long) (d < 0 ? d * factor - 0.5 : d * factor + 0.5) / factor;
}

For example,

double[] doubles = { 0.123400000000000, 0.123399999999997};
for (double d : doubles) {
    System.out.println(round4(d));
}

prints

0.1234
0.1234

As an exercise, I thought I would test this by brute force for every possible value between 0.0001 and 9.99995, using multiple threads.

double[] ds = new double[17];
ds[0] = 1e-4;
for (int i = 1; i < ds.length; i++)
    ds[i] = 2 * ds[i - 1];

DoubleStream.of(ds).parallel()
        .forEach(x -> {
            int counter = 0;
            for (double d = x; d <= 2 * x && d < 9.99995; d += Math.ulp(d)) {
                if (Double.toString(Maths.round4(d)).length() > 6)
                    fail("d: " + d);
                if ((counter & 127) == 0 && counter % 10_000_000 == 0)
                    System.out.println(d);
                counter++;
            }
        });
Peter Lawrey
  • Depending on what the OP is trying to do with the rounded number, this solution may not be correct. It works for things like "comparing 2 numbers, based on their value rounded to 4 decimal places", but not for "writing a number to another text file, rounded to 4 decimal places" – Adrian Shum Mar 24 '16 at 07:44
  • @AdrianShum can you give an example of what you mean? – Peter Lawrey Mar 24 '16 at 08:35
  • What I mean is, in order to output a "rounded-to-4-decimal-places" number to a text file, he still needs to perform formatting (by using `DecimalFormat` or some other way), else he will still see '0.123400000000000' or even things like '0.123399999999997' if he writes the returned double value directly to the file. – Adrian Shum Mar 24 '16 at 10:05
  • @AdrianShum Instead of saying, like X but not actually X. Can you give a specific example where if he rounds the result, he will see a problem? – Peter Lawrey Mar 24 '16 at 18:48
  • @AdrianShum I have added your examples to my answer, happy to add more, because I am not clear on what you mean. – Peter Lawrey Mar 24 '16 at 18:51
  • Thanks, the examples helped me understand the scope of the frequency count you were referring to earlier. – Sid G. Mar 24 '16 at 21:24