I have a list of data in a txt file, like this:

Date,Lat,Lon,Depth,Mag

20000101,34.6920,-116.3550,12.30,1.21
20000101,34.4420,-116.2280,7.32,1.01
20000101,37.4172,-121.7667,5.88,1.14
20000101,-41.1300,174.7600,27.00,1.90
20000101,37.6392,-119.0482,2.40,1.03
20000101,32.1790,-115.0730,6.00,2.44
20000101,59.7753,-152.2192,86.34,1.48
20000101,34.5230,-116.2410,11.63,1.61
20000101,59.5369,-153.1360,100.15,1.62
20000101,44.7357,-110.7932,4.96,2.20
20000101,34.6320,-116.2950,9.00,1.73
20000101,44.7370,-110.7938,5.32,1.75
20000101,35.7040,-117.6320,4.15,1.45
20000101,41.9270,20.5430,10.00,4.80

My assignment is to sort this data by each criterion, e.g. sort by date, by latitude, or by longitude.

I tried bubble sort, comparing like this:

if ( Double.parseDouble(a[0].split(",")[1]) <  Double.parseDouble(a[1].split(",")[1]))

This works, but it takes too much time.

There are 40,000 records in the txt file.

Is there an alternative way to sort this data?

Unihedron
  • Store each line in a class object and then define different comparators for the list of that class object. Use this as a reference: http://stackoverflow.com/questions/5245093/using-comparator-to-make-custom-sort – StackFlowed Oct 23 '14 at 17:18
  • How about merge sort? O(n log n) – Chris Bolton Oct 23 '14 at 17:19

2 Answers


Try a merge sort. Merge sort has a worst-case performance of O(n log n), while bubble sort's worst case is O(n²). For n = 40,000 records that is the difference between roughly 600,000 comparisons and 1.6 billion.
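If the assignment requires writing the sort yourself, a minimal top-down merge sort sketch in Java could look like the following. It sorts any List given a Comparator, so it works with whatever record class you parse your lines into; the MergeSorter class name is just illustrative.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class MergeSorter
{
    // Top-down merge sort: split the list, sort each half recursively, then merge.
    static <T> List<T> sort( List<T> list , Comparator<? super T> comp )
    {
        if ( list.size() <= 1 ) {
            return list; // A list of 0 or 1 elements is already sorted.
        }
        int mid = list.size() / 2;
        List<T> left = sort( new ArrayList<>( list.subList( 0 , mid ) ) , comp );
        List<T> right = sort( new ArrayList<>( list.subList( mid , list.size() ) ) , comp );
        return merge( left , right , comp );
    }

    // Merge two sorted lists by repeatedly taking the smaller head element.
    private static <T> List<T> merge( List<T> left , List<T> right , Comparator<? super T> comp )
    {
        List<T> result = new ArrayList<>( left.size() + right.size() );
        int i = 0;
        int j = 0;
        while ( i < left.size() && j < right.size() ) {
            if ( comp.compare( left.get( i ) , right.get( j ) ) <= 0 ) {
                result.add( left.get( i++ ) );
            } else {
                result.add( right.get( j++ ) );
            }
        }
        while ( i < left.size() ) { result.add( left.get( i++ ) ); }
        while ( j < right.size() ) { result.add( right.get( j++ ) ); }
        return result;
    }
}

That said, Java's built-in Collections.sort is already an O(n log n) merge-based sort (TimSort), so simply calling it with a suitable Comparator, as in the answer below, gets you the same complexity without hand-rolling anything.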

Dan Chrostowski
  • And if you don't know what this notation means, it's Big O notation. O(n log n) is better than O(n²), meaning that, in the worst case, Merge Sort will perform far better than Bubble Sort. See here for more: http://en.wikipedia.org/wiki/Big_O_notation#Orders_of_common_functions (this will be important for you going forward). – muttley91 Oct 23 '14 at 17:25

I'm probably ruining some students’ homework assignment, but here goes…

As suggested in the comments on the Question, the natural way in Java is to create a class to represent your data, then implement a Comparator to be passed to the utility method Collections.sort.

On my MacBook Pro 2.3 GHz Intel Core i7 running a Parallels virtual machine with Java 8, a data set of 42,000 elements takes between 45 and 90 milliseconds to sort.

I changed your example data to be more interesting, introducing some varied dates and duplicate latitudes.

20000101,34.6920,-116.3550,12.30,1.21
20000101,34.4420,-116.2280,7.32,1.01
20000101,34.6920,-121.7667,5.88,1.14
20000101,-41.1300,174.7600,27.00,1.90
20000101,37.6392,-119.0482,2.40,1.03
20000101,32.1790,-115.0730,6.00,2.44
20000101,34.6920,-152.2192,86.34,1.48
20000102,34.6320,-116.2410,11.63,1.61
20000102,59.5369,-153.1360,100.15,1.62
20000102,44.7357,-110.7932,4.96,2.20
20000102,34.6320,-116.2950,9.00,1.73
20000102,34.6320,-110.7938,5.32,1.75
20000102,34.6320,-117.6320,4.15,1.45
20000102,41.9270,20.5430,10.00,4.80

My GeoReading class to represent the data.

import java.math.BigDecimal;
import java.util.Arrays;
import java.util.List;
import org.joda.time.LocalDate;                 // Joda-Time, not java.time.
import org.joda.time.format.ISODateTimeFormat;  // The basicDate() formatter used below is from Joda-Time.

class GeoReading
{

    LocalDate localDate = null;
    BigDecimal latitude = null;
    BigDecimal longitude = null;
    BigDecimal depth = null;
    BigDecimal magnitude = null;

    public GeoReading( String arg )
    {
        // String is comma-separated values of: Date,Lat,Lon,Depth,Mag
        List<String> items = Arrays.asList( arg.split( "\\s*,\\s*" ) ); // Regex explained here: http://stackoverflow.com/a/7488676/642706
        this.localDate = ISODateTimeFormat.basicDate().parseLocalDate( items.get( 0 ) );
        this.latitude = new BigDecimal( items.get( 1 ) );
        this.longitude = new BigDecimal( items.get( 2 ) );
        this.depth = new BigDecimal( items.get( 3 ) );
        this.magnitude = new BigDecimal( items.get( 4 ) );
    }

    @Override
    public String toString()
    {
        return "GeoReading{" + "localDate=" + localDate + ", latitude=" + latitude + ", longitude=" + longitude + ", depth=" + depth + ", magnitude=" + magnitude + '}';
    }

}

Here is the Comparator implementation.

import java.util.Comparator;

class GeoReadingAscendingComparator implements Comparator<GeoReading>
{

    @Override
    public int compare( GeoReading o1 , GeoReading o2 )
    {
        int localDateCompare = o1.localDate.compareTo( o2.localDate );
        if ( localDateCompare != 0 ) { // Not equal on this component, so order by it.
            return localDateCompare;
        }

        int latitudeCompare = o1.latitude.compareTo( o2.latitude );
        if ( latitudeCompare != 0 ) { // Not equal on this component, so order by it.
            return latitudeCompare;
        }

        return o1.longitude.compareTo( o2.longitude );

    }
}
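On Java 8 (which the timing above uses), the same ordering can also be composed from key-extractor chains, and the assignment's "sort by each criterion" falls out as one comparator per field. A sketch, assuming the GeoReading fields defined above; the variable names are merely illustrative.

Comparator<GeoReading> byDateLatLon =
        Comparator.comparing( ( GeoReading g ) -> g.localDate )
                  .thenComparing( g -> g.latitude )
                  .thenComparing( g -> g.longitude );      // Same ordering as the class above.

Comparator<GeoReading> byDepth = Comparator.comparing( g -> g.depth );         // Sort by depth only.
Comparator<GeoReading> byMagnitude = Comparator.comparing( g -> g.magnitude ); // Sort by magnitude only.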

Main code.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.stream.Stream;

Path path = Paths.get( "/Users/basil/lat-lon.txt" );  // Path for Mac OS X.
try ( Stream<String> lines = Files.lines( path ) ) {  // Try-with-resources, so the file is closed when done.
    List<GeoReading> list = new ArrayList<>();
    lines.forEach( line -> list.add( new GeoReading( line ) ) );
    // Take those 14 lines and multiply to simulate a large text file. 14 * 3,000 = 42,000.
    int count = 3000;
    List<GeoReading> bigList = new ArrayList<>( list.size() * count ); // Initialize capacity to the expected number of elements.
    for ( int i = 0 ; i < count ; i++ ) {
        bigList.addAll( list );
    }
    long start = System.nanoTime();
    Collections.sort( bigList , new GeoReadingAscendingComparator() );
    long elapsed = ( System.nanoTime() - start );
    System.out.println( "Done sorting the GeoReading list. Sorting " + bigList.size() + " took: " + TimeUnit.MILLISECONDS.convert( elapsed , TimeUnit.NANOSECONDS ) + " ms ( " + elapsed + " nanos )." );

    System.out.println( "Dump…" );
    for ( GeoReading g : bigList ) {
        System.out.println( g );
    }
} catch ( IOException ex ) {
    System.out.println( "ERROR - ex: " + ex );
}

In the real world I would add some defensive-programming code to verify the incoming data. Data from external sources is always defective and/or changing.
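A minimal sketch of what that might look like, assuming a hypothetical parseOrNull factory method added to GeoReading (not part of the code above): it returns null for blank lines, the header row, rows with the wrong field count, or unparseable values, so the loading loop can skip them.

// Hypothetical factory method for GeoReading; returns null for rows that fail validation.
static GeoReading parseOrNull( String line )
{
    if ( line == null || line.trim().isEmpty() || line.startsWith( "Date," ) ) {
        return null; // Skip blank lines and the header row.
    }
    if ( line.split( "\\s*,\\s*" ).length != 5 ) {
        return null; // Wrong number of fields.
    }
    try {
        return new GeoReading( line );
    } catch ( IllegalArgumentException e ) {
        return null; // Covers bad numbers (NumberFormatException) and Joda-Time date parse failures.
    }
}

The loading loop would then add only the non-null results to the list.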

Basil Bourque
  • Why the BigXXX fields? Why not doubles and ints? – user949300 Oct 24 '14 at 04:38
  • @user949300 **Accuracy > Performance.** I have a habit of using [BigDecimal](http://docs.oracle.com/javase/8/docs/api/java/math/BigDecimal.html) in my usual projects working with money and such where accuracy is required. If this scientific data could tolerate the inaccuracy of floating-point drift, then go for it. Might be faster, though for a total execution time of less than a second I would consider that [premature optimization](http://en.wikipedia.org/wiki/Program_optimization). – Basil Bourque Oct 24 '14 at 04:47
  • OP has 40K records, 5 fields each, so using BigXXX is 200K Objects. An Object is, very roughly, 10 extra bytes over a primitive, so that's 2MB. Which, I guess, isn't all that much nowadays, but it is something. – user949300 Oct 24 '14 at 05:36
  • @user949300 BigDecimal is [at least 2-3 times bigger](http://stackoverflow.com/q/2501176/642706) than just the minimal Object overhead you mentioned. But still not a large amount of memory on multi-gig machines. If memory were an issue, I would move the data to a database like [Postgres](http://en.wikipedia.org/wiki/PostgreSQL). So, again, it is not a matter of preference but need. If you are certain that floating-point inaccuracy is tolerable then use a float (32-bit) or double (64-bit), better for both memory and speed. If accuracy is needed, *always* use BigDecimal. – Basil Bourque Oct 24 '14 at 06:21
  • For those who are unaware, read about [how floating-point number technology trades away accuracy](http://en.wikipedia.org/wiki/Floating_point#Accuracy_problems) for performance (speed of execution). The floating-point primitive data types in Java are [`float` (32-bit) and `double` (64-bit)](http://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html). – Basil Bourque Oct 26 '14 at 20:10