Dynamically reading a CSV file

Question

Introduction

I am working on an automation project in order to learn new tricks with Java and data science (at a very basic level); everything is self-taught.

Problem

Here is an example .csv file of how I store this data.

Date when obtained
Format for identifying the numbers below
data
...
data

CSV I am currently using.

2018/12/29
name,quantity,quality,realmQ,cost
Tejido,321 908,13.55,43.18,$15.98,
Ropa,195 045,20.55,45.93,$123.01,
Gorra de visera,126 561,17.43,42.32,$79.54,
Cerveza,80 109,3.37,17.93,$12.38,
Mercancías de playa,75 065,11.48,39.73,$105.93,
Bebidas alcohólicas,31 215,4.84,27.90,$32.29,
Artículos de cuero,19 098,23.13,44.09,$198.74,
Bolsas y carteras,7 754,23.09,41.34,$1 176.54,
2018/12/30
name,quantity,quality,realmQ,cost
Tejido,252 229,12.86,43.14,$18.87,
Ropa,132 392,18.09,46.02,$177.58,
Gorra de visera,87 676,14.42,42.46,$122.48,
Cerveza,44 593,2.72,17.79,$18.71,
Mercancías de playa,44 593,8.26,39.56,$200.78,
Bebidas alcohólicas,27 306,4.30,23.88,$31.95,
Artículos de cuero,16 147,21.08,43.91,$207.49,
Bolsas y carteras,6 552,21.11,40.59,$1 195.41,
2019/01/02
name,quantity,quality,realmQ,cost
Tejido,321 908,13.55,43.18,$15.98,
Ropa,195 045,20.55,45.93,$123.01,
Gorra de visera,126 561,17.43,42.32,$79.54,
Cerveza,80 109,3.37,17.93,$12.38,
Mercancías de playa,75 065,11.48,39.73,$105.93,
Bebidas alcohólicas,31 215,4.84,27.90,$32.29,
Artículos de cuero,19 098,23.13,44.09,$198.74,
Bolsas y carteras,7 754,23.09,41.34,$1 176.54,
2019/01/03
name,quantity,quality,realmQ,cost
Tejido,321 908,13.55,43.18,$15.98,
Ropa,195 045,20.55,45.93,$123.01,
Gorra de visera,126 561,17.43,42.32,$79.54,
Cerveza,80 109,3.37,17.93,$12.38,
Mercancías de playa,75 065,11.48,39.73,$105.93,
Bebidas alcohólicas,31 215,4.84,27.90,$32.29,
Artículos de cuero,19 098,23.13,44.09,$198.74,
Bolsas y carteras,7 754,23.09,41.34,$1 176.54,

I want to make it dynamic and also bigger. Instead of multiple .csv files classified by date, I decided to have one big .csv file to store everything, and the file above is the result.

The code I have so far can read a single .csv, but if I add more data below, it doesn't work. I can see in the debugger that the problem is related to the loop, but I still can't find the right solution.

Code

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class CSVinput {

    static String[] nombre = new String[8];
    static int[] cantidad = new int[8];
    static double[] calidad = new double[8];
    static double[] realmQ = new double[8];
    static double[] coste = new double[8];

    public static void ImportData(String path) throws FileNotFoundException
    {
        /* Can only load one CSV block with 8 rows in it */
        System.out.println("Presenting data...");

        try (Scanner scan = new Scanner(new File(path))) {
            scan.useDelimiter(",");
            String date = scan.nextLine();   // first line: the date
            System.out.println("fecha: " + date);
            scan.nextLine();                 // second line: the header, skipped

            int index = 0;
            while (scan.hasNext()) {
                try {
                    String name = scan.next().replaceAll("\n", "");
                    nombre[index] = name;
                    System.out.println("nombre: " + name);
                    int quantity = Integer.parseInt(scan.next().replaceAll(" ", ""));
                    cantidad[index] = quantity;
                    System.out.println("cantidad: " + quantity);
                    double quality = Double.parseDouble(scan.next());
                    calidad[index] = quality;
                    System.out.println("calidad: " + quality);
                    double realmq = Double.parseDouble(scan.next());
                    realmQ[index] = realmq;
                    System.out.println("realmQ: " + realmq);
                    double cost = Double.parseDouble(scan.next().replace("$", "").replace(" ", ""));
                    coste[index] = cost;
                    System.out.println("coste: $" + cost);

                    index++;
                } catch (ArrayIndexOutOfBoundsException e) {}   // exception swallowed silently
            }
        }
    }

    public static void main(String[] args) throws FileNotFoundException
    {
        ImportData("caca.csv");
    }
}

Notes

The code posted here works with a single .csv block. That means the input must look like the following, and the code should "split" the data to make it easy to work with.

2018/12/29
name,quantity,quality,realmQ,cost
Tejido,321 908,13.55,43.18,$15.98,
Ropa,195 045,20.55,45.93,$123.01,
Gorra de visera,126 561,17.43,42.32,$79.54,
Cerveza,80 109,3.37,17.93,$12.38,
Mercancías de playa,75 065,11.48,39.73,$105.93,
Bebidas alcohólicas,31 215,4.84,27.90,$32.29,
Artículos de cuero,19 098,23.13,44.09,$198.74,
Bolsas y carteras,7 754,23.09,41.34,$1 176.54,

What I expected

I expected that if I append more .csv data below the previous block, the code would read it, no matter how big the .csv is.

Thanks for the interest in this question.

What do you mean by “I want to make it dynamic”? Likewise, what do you mean by “make it bigger”? — Basil Bourque, Jan 04 '19 at 21:46
By dynamism I mean that I would like the arrays to be built so they store no more and no less data than they need to. I would like some ideas; for now I have to change the size manually or set a huge array. — Jonalcaide, Jan 04 '19 at 21:49
What is “huge array”? Specify numbers, as “huge” means different things to different people in different scenarios with different hardware. Also, post your clarifications as edits to your Question, rather than as Comments. Your readers should not have to go spelunking through Comments to figure out your meaning. — Basil Bourque, Jan 04 '19 at 21:58
I have to parse data from a csvFile, right now I am working with small numbers, 10 products but if I work with, let's say, a 100 or even a thousand or any other number I want the program to make a dynamic array, so I do not have to manually change the array dimension manually every time I want to input data. — Jonalcaide, Jan 04 '19 at 22:03
I am still learning Java :) I will check it out once I'm done with this. Thanks! — Jonalcaide, Jan 04 '19 at 22:09
Why is your `quantity` a pair of numbers separated by a SPACE character? — Basil Bourque, Jan 04 '19 at 22:17
It is like that when I scrape it from a website. I remove the SPACE in order to make it an integer. — Jonalcaide, Jan 04 '19 at 22:53
You should do all that input-processing and cleanup *before* working with and storing your data. — Basil Bourque, Jan 04 '19 at 22:55

score 3 · Accepted Answer · edited Jun 20 '20 at 09:12

CSV ➙ flat table

The CSV format was invented to represent a single simple flat table of data. Ditto for Tab-delimited files.

You have a hierarchy of a date mapping to a collection of name-quantity-quality-realmQ-cost tuples. That is not simple flat tabular data.

Flatten your data

If you want to store that in CSV, you must flatten by adding a column for the date and repeating the date value across the collection of tuples, to become date-name-quantity-quality-realmQ-cost tuples.

date,name,quantity,quality,realmQ,cost
2018-12-29,Tejido,321 908,13.55,43.18,$15.98
2018-12-29,Ropa,195 045,20.55,45.93,$123.01
2018-12-29,Gorra de visera,126 561,17.43,42.32,$79.54
2018-12-29,Cerveza,80 109,3.37,17.93,$12.38
2018-12-29,Mercancías de playa,75 065,11.48,39.73,$105.93
2018-12-29,Bebidas alcohólicas,31 215,4.84,27.90,$32.29
2018-12-29,Artículos de cuero,19 098,23.13,44.09,$198.74
2018-12-29,Bolsas y carteras,7 754,23.09,41.34,$1 176.54
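A minimal sketch of that flattening step, assuming the input is the block-style text from the Question (a date line, a header line, then data rows ending in a stray comma). The class name `FlattenBlocks` and method `flatten` are hypothetical. It only rearranges the text: values such as `321 908` and `$15.98` are left untouched here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns "date / header / rows" blocks into one flat table.
class FlattenBlocks {

    static List<String> flatten(List<String> lines) {
        List<String> out = new ArrayList<>();
        out.add("date,name,quantity,quality,realmQ,cost"); // one header for the whole file
        String currentDate = null;
        for (String raw : lines) {
            String line = raw.strip();
            if (line.isEmpty() || line.startsWith("name,")) {
                continue; // skip blank lines and the repeated per-block headers
            }
            if (line.matches("\\d{4}/\\d{2}/\\d{2}")) {
                currentDate = line.replace('/', '-'); // 2018/12/29 -> 2018-12-29
                continue;
            }
            if (currentDate == null) {
                throw new IllegalStateException("Data row before any date line: " + line);
            }
            // Drop the stray trailing comma, then prefix the date as the first column.
            String row = line.endsWith(",") ? line.substring(0, line.length() - 1) : line;
            out.add(currentDate + "," + row);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = List.of(
                "2018/12/29",
                "name,quantity,quality,realmQ,cost",
                "Tejido,321 908,13.55,43.18,$15.98,",
                "2018/12/30",
                "name,quantity,quality,realmQ,cost",
                "Tejido,252 229,12.86,43.14,$18.87,");
        flatten(input).forEach(System.out::println);
    }
}
```

The lines could come from `Files.readAllLines(path)` and the result could be saved with `Files.write(path, lines)`; a new block appended to the old file then needs no code changes.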

That data can now be read from and written to CSV files.

And watch your delimiters. Notice there should be no comma after the last field of each row.
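To see what a trailing comma changes in practice, here is a small sketch using plain `String.split` rather than a CSV library (the class name `TrailingComma` is hypothetical). With a negative limit, `split` keeps trailing empty strings, so a row that ends in a comma yields one more field than the header declares.

```java
// Hypothetical demo class: counts fields the way a strict CSV reader would see them.
class TrailingComma {

    static int fieldCount(String row) {
        // A negative limit keeps trailing empty strings instead of dropping them.
        return row.split(",", -1).length;
    }

    public static void main(String[] args) {
        String header = "name,quantity,quality,realmQ,cost";
        String row = "Tejido,321908,13.55,43.18,15.98,";
        System.out.println(fieldCount(header)); // 5
        System.out.println(fieldCount(row));    // 6: the last field is an empty string
    }
}
```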

Apache Commons CSV

The Apache Commons CSV library will perform the CSV parsing, reading, and writing for you. It has worked well for me a few times.

Cleaned data

Let’s parse a data.csv file containing a flattened version of your example data. The data has been cleaned up:

Switched dates to standard ISO 8601 format
Eliminated SPACE character in integer numbers
Removed $ character
Deleted the extra comma at end of each row
Translated the product names to English (for this English edition of Stack Overflow).

date,name,quantity,quality,realmQ,cost
2018-12-29,Fabric,321908,13.55,43.18,15.98
2018-12-29,Clothing,195045,20.55,45.93,123.01
2018-12-29,Visor Cap,126561,17.43,42.32,79.54
2018-12-29,Beer,80109,3.37,17.93,12.38
2018-12-29,Beach goods,75065,11.48,39.73,105.93
2018-12-29,Alcoholic beverages,31215,4.84,27.90,32.29
2018-12-29,Leather goods,19098,23.13,44.09,198.74
2018-12-29,Bags and wallets,7754,23.09,41.34,1176.54
2018-12-30,Fabric,252229,12.86,43.14,18.87
2018-12-30,Clothing,132392,18.09,46.02,177.58
2018-12-30,Visor Cap,87676,14.42,42.46,122.48
2018-12-30,Beer,44593,2.72,17.79,18.71
2018-12-30,Beach goods,44593,8.26,39.56,200.78
2018-12-30,Alcoholic beverages,27306,4.30,23.88,31.95
2018-12-30,Leather goods,16147,21.08,43.91,207.49
2018-12-30,Bags and wallets,6552,21.11,40.59,1195.41
2019-01-02,Fabric,321908,13.55,43.18,15.98
2019-01-02,Clothing,195045,20.55,45.93,123.01
2019-01-02,Visor Cap,126561,17.43,42.32,79.54
2019-01-02,Beer,80109,3.37,17.93,12.38
2019-01-02,Beach goods,75065,11.48,39.73,105.93
2019-01-02,Alcoholic beverages,31215,4.84,27.90,32.29
2019-01-02,Leather goods,19098,23.13,44.09,198.74
2019-01-02,Bags and wallets,7754,23.09,41.34,1176.54
2019-01-03,Fabric,321908,13.55,43.18,15.98
2019-01-03,Clothing,195045,20.55,45.93,123.01
2019-01-03,Visor Cap,126561,17.43,42.32,79.54
2019-01-03,Beer,80109,3.37,17.93,12.38
2019-01-03,Beach goods,75065,11.48,39.73,105.93
2019-01-03,Alcoholic beverages,31215,4.84,27.90,32.29
2019-01-03,Leather goods,19098,23.13,44.09,198.74
2019-01-03,Bags and wallets,7754,23.09,41.34,1176.54
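The cleanup of the numbers listed above could also be done in code, before anything is stored. A minimal sketch, assuming the raw values look like those in the Question (a SPACE as thousands separator, a leading `$`); the `RawCleaner` class and its methods are hypothetical names. It keeps cost as a `BigDecimal` to avoid binary floating-point rounding.

```java
import java.math.BigDecimal;

// Hypothetical helpers for cleaning scraped values before they are stored.
class RawCleaner {

    // Assumption: the thousands separator is a SPACE, or possibly a NO-BREAK SPACE
    // (U+00A0) depending on how the page was scraped; both are removed.
    private static final String SEPARATORS = "[\\s\\u00A0]";

    // "321 908" -> 321908
    static int cleanQuantity(String raw) {
        return Integer.parseInt(raw.replaceAll(SEPARATORS, ""));
    }

    // "$1 176.54" -> 1176.54
    static BigDecimal cleanCost(String raw) {
        return new BigDecimal(raw.replace("$", "").replaceAll(SEPARATORS, ""));
    }

    public static void main(String[] args) {
        System.out.println(cleanQuantity("321 908"));  // 321908
        System.out.println(cleanCost("$1 176.54"));    // 1176.54
    }
}
```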

We define a class to hold each tuple.

package com.basilbourque.example;

import java.math.BigDecimal;
import java.time.LocalDate;
import java.util.Objects;

public class DailyProduct {
    // date,name,quantity,quality,realmQ,cost
    // 2018-12-29,Fabric,321908,13.55,43.18,15.98
    // 2018-12-29,Clothing,195045,20.55,45.93,123.01
    // 2018-12-29,Visor Cap,126561,17.43,42.32,79.54
    // 2018-12-29,Beer,80109,3.37,17.93,12.38
    // 2018-12-29,Beach goods,75065,11.48,39.73,105.93
    // 2018-12-29,Alcoholic beverages,31215,4.84,27.90,32.29
    // 2018-12-29,Leather goods,19098,23.13,44.09,198.74
    // 2018-12-29,Bags and wallets,7754,23.09,41.34,1176.54

    public enum Header {
        DATE, NAME, QUANTITY, QUALITY, REALMQ, COST;
    }

    // ----------|  Member vars  |-----------------------------------
    public LocalDate localDate;
    public String name;
    public Integer quantity;
    public BigDecimal quality, realmQ, cost;

    // ----------|  Constructor  |-----------------------------------
    public DailyProduct ( LocalDate localDate , String name , Integer quantity , BigDecimal quality , BigDecimal realmq , BigDecimal cost ) {
        this.localDate = Objects.requireNonNull( localDate );
        this.name = Objects.requireNonNull( name );
        this.quantity = Objects.requireNonNull( quantity );
        this.quality = Objects.requireNonNull( quality );
        this.realmQ = Objects.requireNonNull( realmq );
        this.cost = Objects.requireNonNull( cost );
    }

    // ----------|  `Object` overrides  |-----------------------------------
    @Override
    public String toString ( ) {
        return "com.basilbourque.example.DailyProduct{ " +
                "localDate=" + localDate +
                " | name='" + name + '\'' +
                " | quantity=" + quantity +
                " | quality=" + quality +
                " | realmq=" + realmQ +
                " | cost=" + cost +
                " }";
    }

    @Override
    public boolean equals ( Object o ) {
        if ( this == o ) return true;
        if ( o == null || getClass() != o.getClass() ) return false;
        DailyProduct that = ( DailyProduct ) o;
        return localDate.equals( that.localDate ) &&
                name.equals( that.name );
    }

    @Override
    public int hashCode ( ) {
        return Objects.hash( localDate , name );
    }

}

Write a class to read and write files containing the data of the DailyProduct objects.

package com.basilbourque.example;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVPrinter;
import org.apache.commons.csv.CSVRecord;

import java.io.BufferedReader;
import java.io.IOException;
import java.math.BigDecimal;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Instant;
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class DailyProductFileHandler {
    public List < DailyProduct > read ( Path path ) {
        // TODO: Add a check for valid file existing.

        List < DailyProduct > list = List.of();  // Default to empty list.
        try {
            // Prepare list. Close the `Files.lines` stream, as it keeps the file open until closed.
            int initialCapacity;
            try ( Stream < String > lines = Files.lines( path ) ) {
                initialCapacity = ( int ) lines.count();
            }
            list = new ArrayList <>( initialCapacity );

            // Read CSV file. For each row, instantiate and collect `DailyProduct`.
            // The try-with-resources statement closes the reader when done.
            try ( BufferedReader reader = Files.newBufferedReader( path ) ) {
                Iterable < CSVRecord > records = CSVFormat.RFC4180.withFirstRecordAsHeader().parse( reader );
                for ( CSVRecord record : records ) {
                    // date,name,quantity,quality,realmQ,cost
                    LocalDate localDate = LocalDate.parse( record.get( "date" ) );
                    String name = record.get( "name" );
                    Integer quantity = Integer.valueOf( record.get( "quantity" ) );
                    BigDecimal quality = new BigDecimal( record.get( "quality" ) );
                    BigDecimal realmQ = new BigDecimal( record.get( "realmQ" ) );  // Note: case-sensitive.
                    BigDecimal cost = new BigDecimal( record.get( "cost" ) );
                    // Instantiate `DailyProduct` object, and collect it.
                    DailyProduct dailyProduct = new DailyProduct( localDate , name , quantity , quality , realmQ , cost );
                    list.add( dailyProduct );
                }
            }
        } catch ( IOException e ) {
            e.printStackTrace();
        }
        return list;
    }

    public void write ( final List < DailyProduct > dailyProducts , final Path path ) {
        try ( final CSVPrinter printer = CSVFormat.RFC4180.withHeader( "date" , "name" , "quantity" , "quality" , "realmQ" , "cost" ).print( path , StandardCharsets.UTF_8 ) ; ) {
            for ( DailyProduct dp : dailyProducts ) {
                printer.printRecord( dp.localDate , dp.name , dp.quantity , dp.quality , dp.realmQ , dp.cost );
            }
        } catch ( IOException e ) {
            e.printStackTrace();
        }
    }

    public static void main ( final String[] args ) {
        DailyProductFileHandler fileHandler = new DailyProductFileHandler();

        Path pathInput = Paths.get( "/Users/basilbourque/data.csv" );
        List < DailyProduct > list = fileHandler.read( pathInput );
        System.out.println( list );

        String when = Instant.now().truncatedTo( ChronoUnit.SECONDS ).toString().replace( ":" , "•" );
        Path pathOutput = Paths.get( "/Users/basilbourque/data_" + when + ".csv" );
        fileHandler.write( list , pathOutput );
        System.out.println( "Writing file: " + pathOutput );
    }
}

When run:

[com.basilbourque.example.DailyProduct{ localDate=2018-12-29 | name='Fabric' | quantity=321908 | quality=13.55 | realmq=43.18 | cost=15.98 }, com.basilbourque.example.DailyProduct{ localDate=2018-12-29 | name='Clothing' | quantity=195045 | quality=20.55 | realmq=45.93 | cost=123.01 }, com.basilbourque.example.DailyProduct{ localDate=2018-12-29 | name='Visor Cap' | quantity=126561 | quality=17.43 | realmq=42.32 | cost=79.54 }, com.basilbourque.example.DailyProduct{ localDate=2018-12-29 | name='Beer' | quantity=80109 | quality=3.37 | realmq=17.93 | cost=12.38 }, com.basilbourque.example.DailyProduct{ localDate=2018-12-29 | name='Beach goods' | quantity=75065 | quality=11.48 | realmq=39.73 | cost=105.93 }, com.basilbourque.example.DailyProduct{ localDate=2018-12-29 | name='Alcoholic beverages' | quantity=31215 | quality=4.84 | realmq=27.90 | cost=32.29 }, com.basilbourque.example.DailyProduct{ localDate=2018-12-29 | name='Leather goods' | quantity=19098 | quality=23.13 | realmq=44.09 | cost=198.74 }, com.basilbourque.example.DailyProduct{ localDate=2018-12-29 | name='Bags and wallets' | quantity=7754 | quality=23.09 | realmq=41.34 | cost=1176.54 }, com.basilbourque.example.DailyProduct{ localDate=2018-12-30 | name='Fabric' | quantity=252229 | quality=12.86 | realmq=43.14 | cost=18.87 }, com.basilbourque.example.DailyProduct{ localDate=2018-12-30 | name='Clothing' | quantity=132392 | quality=18.09 | realmq=46.02 | cost=177.58 }, com.basilbourque.example.DailyProduct{ localDate=2018-12-30 | name='Visor Cap' | quantity=87676 | quality=14.42 | realmq=42.46 | cost=122.48 }, com.basilbourque.example.DailyProduct{ localDate=2018-12-30 | name='Beer' | quantity=44593 | quality=2.72 | realmq=17.79 | cost=18.71 }, com.basilbourque.example.DailyProduct{ localDate=2018-12-30 | name='Beach goods' | quantity=44593 | quality=8.26 | realmq=39.56 | cost=200.78 }, com.basilbourque.example.DailyProduct{ localDate=2018-12-30 | name='Alcoholic beverages' | quantity=27306 | quality=4.30 | realmq=23.88 | cost=31.95 }, com.basilbourque.example.DailyProduct{ localDate=2018-12-30 | name='Leather goods' | quantity=16147 | quality=21.08 | realmq=43.91 | cost=207.49 }, com.basilbourque.example.DailyProduct{ localDate=2018-12-30 | name='Bags and wallets' | quantity=6552 | quality=21.11 | realmq=40.59 | cost=1195.41 }, com.basilbourque.example.DailyProduct{ localDate=2019-01-02 | name='Fabric' | quantity=321908 | quality=13.55 | realmq=43.18 | cost=15.98 }, com.basilbourque.example.DailyProduct{ localDate=2019-01-02 | name='Clothing' | quantity=195045 | quality=20.55 | realmq=45.93 | cost=123.01 }, com.basilbourque.example.DailyProduct{ localDate=2019-01-02 | name='Visor Cap' | quantity=126561 | quality=17.43 | realmq=42.32 | cost=79.54 }, com.basilbourque.example.DailyProduct{ localDate=2019-01-02 | name='Beer' | quantity=80109 | quality=3.37 | realmq=17.93 | cost=12.38 }, com.basilbourque.example.DailyProduct{ localDate=2019-01-02 | name='Beach goods' | quantity=75065 | quality=11.48 | realmq=39.73 | cost=105.93 }, com.basilbourque.example.DailyProduct{ localDate=2019-01-02 | name='Alcoholic beverages' | quantity=31215 | quality=4.84 | realmq=27.90 | cost=32.29 }, com.basilbourque.example.DailyProduct{ localDate=2019-01-02 | name='Leather goods' | quantity=19098 | quality=23.13 | realmq=44.09 | cost=198.74 }, com.basilbourque.example.DailyProduct{ localDate=2019-01-02 | name='Bags and wallets' | quantity=7754 | quality=23.09 | realmq=41.34 | cost=1176.54 }, com.basilbourque.example.DailyProduct{ localDate=2019-01-03 | name='Fabric' | quantity=321908 | quality=13.55 | realmq=43.18 | cost=15.98 }, com.basilbourque.example.DailyProduct{ localDate=2019-01-03 | name='Clothing' | quantity=195045 | quality=20.55 | realmq=45.93 | cost=123.01 }, com.basilbourque.example.DailyProduct{ localDate=2019-01-03 | name='Visor Cap' | quantity=126561 | quality=17.43 | realmq=42.32 | cost=79.54 }, com.basilbourque.example.DailyProduct{ localDate=2019-01-03 | name='Beer' | quantity=80109 | quality=3.37 | realmq=17.93 | cost=12.38 }, com.basilbourque.example.DailyProduct{ localDate=2019-01-03 | name='Beach goods' | quantity=75065 | quality=11.48 | realmq=39.73 | cost=105.93 }, com.basilbourque.example.DailyProduct{ localDate=2019-01-03 | name='Alcoholic beverages' | quantity=31215 | quality=4.84 | realmq=27.90 | cost=32.29 }, com.basilbourque.example.DailyProduct{ localDate=2019-01-03 | name='Leather goods' | quantity=19098 | quality=23.13 | realmq=44.09 | cost=198.74 }, com.basilbourque.example.DailyProduct{ localDate=2019-01-03 | name='Bags and wallets' | quantity=7754 | quality=23.09 | realmq=41.34 | cost=1176.54 }]

Writing file: /Users/basilbourque/data_2019-01-05T03•48•37Z.csv

ISO 8601

By the way, when serializing date-time values to text, always use the standard ISO 8601 formats. For a date-only value without time-of-day and without time zone, that would be YYYY-MM-DD.
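For example, the Question's `2018/12/29` dates can be parsed with a `DateTimeFormatter` pattern and written back out in ISO 8601; `LocalDate::toString` already produces `YYYY-MM-DD`, and `LocalDate.parse` reads it back without any formatter. A minimal sketch; the `IsoDates` class name is hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: converts the scraped "yyyy/MM/dd" text to ISO 8601.
class IsoDates {

    // "uuuu" is the year; "MM" and "dd" are the zero-padded month and day.
    private static final DateTimeFormatter SLASHED = DateTimeFormatter.ofPattern("uuuu/MM/dd");

    static LocalDate parseSlashed(String text) {
        return LocalDate.parse(text.strip(), SLASHED);
    }

    public static void main(String[] args) {
        LocalDate date = parseSlashed("2018/12/29");
        System.out.println(date);                                          // 2018-12-29
        System.out.println(date.format(DateTimeFormatter.ISO_LOCAL_DATE)); // 2018-12-29
        System.out.println(LocalDate.parse("2018-12-29").equals(date));    // true
    }
}
```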

XML & JSON

If you want to preserve the hierarchy, use some file format other than CSV. Commonly XML or JSON is used for such data.

Database

Your Question does not provide enough detail to know for certain, but I get the feeling you should be using a database rather than text files. If you are reading, editing, and appending large amounts of data (large meaning enough to be concerned about impacting memory limits), or if you are using multiple processes/threads/users, then a database is called for. A database is designed to efficiently handle data too large to fit entirely into memory, and it is designed to handle concurrent access.

Data in memory

I have to parse data from a csvFile, right now I am working with small numbers, 10 products but if I work with, let's say, a 100 or even a thousand

That is not “large” as you put it. Even a Raspberry Pi or Beaglebone Black has enough RAM to load several thousand of such tuples into memory.

Collections

or any other number I want the program to make a dynamic array, so I do not have to manually change the array dimension manually every time I want to input data.

You need to learn about the Java Collections Framework, rather than using simple arrays.
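As a minimal sketch of the difference: an `ArrayList` grows as elements are added, so the code never needs to know the row count up front, unlike the fixed `new String[8]` arrays in the Question. The class name `GrowingList` and the sample values are only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration: a List grows on demand, unlike a fixed-size array.
class GrowingList {

    static List<String> collect(String... names) {
        List<String> list = new ArrayList<>();  // no size needed in advance
        for (String name : names) {
            list.add(name);                     // capacity expands automatically
        }
        return list;
    }

    public static void main(String[] args) {
        String[] fixed = new String[8];         // like the Question: exactly 8 slots
        // fixed[8] = "Ninth";                  // would throw ArrayIndexOutOfBoundsException
        List<String> names = collect("Tejido", "Ropa", "Cerveza");
        names.add("Another product");           // still fine: no manual resizing
        System.out.println(fixed.length + " vs " + names.size()); // 8 vs 4
    }
}
```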

In particular, your date-to-tuple hierarchy would commonly be represented by using a Map (also called a dictionary by some folks). This data structure is a collection of key-value pairs, where the date would be your key and a Set or List of your tuples would be your value.

Define a class for your tuple data, named something like Product. Add member variables: name, quantity, quality, realmq, and cost. Instantiate an object for each tuple.

Create a Map such as a TreeMap. Being a SortedMap it keeps your dates in chronological order.

SortedMap< LocalDate , List< Product > > map = new TreeMap<>() ;

Use LocalDate for your date values, the key in your map.

LocalDate ld = LocalDate.of( 2018 , 1 , 23 ) ;
map.put( ld , new ArrayList< Product >() ) ; // Pass an initial capacity in those parens if you know a likely size of the list.

For each Product object, retrieve the list from the map for the relevant date, and add the product to that list.
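A minimal sketch of those steps, assuming a hypothetical `Product` class with just the member variables named above and a hypothetical `ProductsByDate` wrapper. `Map.computeIfAbsent` creates the list the first time a date is seen, so a separate `put` of an empty list is not needed; the `TreeMap` keeps the dates in chronological order.

```java
import java.math.BigDecimal;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical tuple class holding one product row for one date.
class Product {
    final String name;
    final int quantity;
    final BigDecimal quality, realmQ, cost;

    Product(String name, int quantity, BigDecimal quality, BigDecimal realmQ, BigDecimal cost) {
        this.name = name;
        this.quantity = quantity;
        this.quality = quality;
        this.realmQ = realmQ;
        this.cost = cost;
    }
}

class ProductsByDate {

    // Date -> products recorded on that date, sorted chronologically by key.
    private final SortedMap<LocalDate, List<Product>> map = new TreeMap<>();

    void add(LocalDate date, Product product) {
        // Create the list on first use for this date, then append to it.
        map.computeIfAbsent(date, d -> new ArrayList<>()).add(product);
    }

    SortedMap<LocalDate, List<Product>> asMap() {
        return map;
    }

    public static void main(String[] args) {
        ProductsByDate byDate = new ProductsByDate();
        byDate.add(LocalDate.of(2018, 12, 30), new Product("Fabric", 252229,
                new BigDecimal("12.86"), new BigDecimal("43.14"), new BigDecimal("18.87")));
        byDate.add(LocalDate.of(2018, 12, 29), new Product("Fabric", 321908,
                new BigDecimal("13.55"), new BigDecimal("43.18"), new BigDecimal("15.98")));
        byDate.add(LocalDate.of(2018, 12, 29), new Product("Clothing", 195045,
                new BigDecimal("20.55"), new BigDecimal("45.93"), new BigDecimal("123.01")));
        // Keys print in date order even though they were added out of order.
        byDate.asMap().forEach((date, list) -> System.out.println(date + " -> " + list.size()));
    }
}
```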

When serializing, use an XML or JSON framework to write the map to storage.

Or do so yourself, writing your own data format. Get all the keys from the map and loop over them, writing each date to file. For each date, extract its list from the map (the value for that key), loop over the Product objects in the list, and write out each product’s member variables. Use any field and row delimiters. Though they are not often used, for reasons I have never understood, ASCII (a subset of Unicode) has specific delimiter characters. I suggest you use these separators. The code points:

31 for field (INFORMATION SEPARATOR ONE)
30 for row (INFORMATION SEPARATOR TWO)
29 for group (INFORMATION SEPARATOR THREE)
28 for file (INFORMATION SEPARATOR FOUR)
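A minimal sketch of writing and reading such a home-made format with those separators, assuming each product has already been reduced to a list of field strings and that no value ever contains code points 29 to 31 (both simplifications). The class `AsciiSeparated` and its methods are hypothetical. Fields are joined with 31, rows with 30, and date groups with 29; the file separator (28) is not needed for a single file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical encoder/decoder using the ASCII information separators as delimiters.
// Assumes no field value ever contains code points 29, 30, or 31.
class AsciiSeparated {
    static final String UNIT = String.valueOf((char) 31);   // INFORMATION SEPARATOR ONE: fields
    static final String RECORD = String.valueOf((char) 30); // INFORMATION SEPARATOR TWO: rows
    static final String GROUP = String.valueOf((char) 29);  // INFORMATION SEPARATOR THREE: dates

    // Each group is: date, then RECORD + row for each product; fields joined by UNIT.
    static String encode(Map<String, List<List<String>>> byDate) {
        List<String> groups = new ArrayList<>();
        for (Map.Entry<String, List<List<String>>> entry : byDate.entrySet()) {
            StringBuilder group = new StringBuilder(entry.getKey());
            for (List<String> fields : entry.getValue()) {
                group.append(RECORD).append(String.join(UNIT, fields));
            }
            groups.add(group.toString());
        }
        return String.join(GROUP, groups);
    }

    static Map<String, List<List<String>>> decode(String text) {
        Map<String, List<List<String>>> byDate = new TreeMap<>();
        if (text.isEmpty()) {
            return byDate;
        }
        // The separators are not regex metacharacters, so split() treats them literally.
        for (String group : text.split(GROUP, -1)) {
            String[] records = group.split(RECORD, -1);
            List<List<String>> rows = new ArrayList<>();
            for (int i = 1; i < records.length; i++) {
                rows.add(List.of(records[i].split(UNIT, -1)));
            }
            byDate.put(records[0], rows);
        }
        return byDate;
    }

    public static void main(String[] args) {
        Map<String, List<List<String>>> data = new TreeMap<>();
        data.put("2018-12-29", List.of(
                List.of("Fabric", "321908", "13.55", "43.18", "15.98"),
                List.of("Clothing", "195045", "20.55", "45.93", "123.01")));
        data.put("2018-12-30", List.of(List.of("Fabric", "252229", "12.86", "43.14", "18.87")));
        String encoded = encode(data);
        System.out.println(decode(encoded).equals(data)); // true
    }
}
```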

All of these issues have been addressed many times on Stack Overflow. Search to learn more.

Extraneous text

When serializing data, do not include extraneous text.

The $ in your cost column is just noise. If you meant to indicate a particular currency, a simple $ fails to do the job, as it could mean Canadian dollars, United States dollars, Mexican pesos, or perhaps other currencies. So use a standard ISO 4217 currency code such as CAD, USD, or MXN. If all the values are in a single known currency such as CAD, then omit the ‘$’ entirely.
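When the currency does need to be recorded, `java.util.Currency` already knows the ISO 4217 codes, so a sketch could keep the amount as a `BigDecimal` next to an explicit currency. The `Money` class is hypothetical, and treating the Question's costs as CAD is only an assumption for illustration.

```java
import java.math.BigDecimal;
import java.util.Currency;
import java.util.Objects;

// Hypothetical value class: an amount plus an explicit ISO 4217 currency.
final class Money {
    final BigDecimal amount;
    final Currency currency;

    Money(BigDecimal amount, Currency currency) {
        this.amount = Objects.requireNonNull(amount);
        this.currency = Objects.requireNonNull(currency);
    }

    // Serialize as "<code> <amount>", e.g. "CAD 15.98", with no ambiguous "$".
    @Override
    public String toString() {
        return currency.getCurrencyCode() + " " + amount.toPlainString();
    }

    public static void main(String[] args) {
        // Assumption: the scraped costs are Canadian dollars; adjust to the real currency.
        Money cost = new Money(new BigDecimal("15.98"), Currency.getInstance("CAD"));
        System.out.println(cost);                                      // CAD 15.98
        System.out.println(cost.currency.getDefaultFractionDigits());  // 2
    }
}
```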

Performance

Preface: If you are frequently moving data in and out of these files for updating, you should be using a database rather than text files.

No need to worry about performance of CSV versus XML versus JSON.

Firstly, you are falling into the evil trap of premature optimization (google/duckduckgo that phrase).

Secondly, you would need an enormous amount of data, frequently processed, before any performance difference became significant, far beyond that of common business apps. Accessing files of any format from storage, even from SSD drives, is so slow that it dwarfs the time taken for the CPU-driven processing of the data.

Choose a format based on fitting the needs of your data and app.

For simple flat data, use CSV or Tab-delimited or the ASCII/Unicode codes for delimiting (codepoints 28-31).

For hierarchical data, use XML. XML has the advantage of being very precisely defined by specification. So much tooling has been built for XML. And XML Schema is also well-defined. This provides a powerful way to validate incoming data files before attempting to process.

As for JSON, use only if you must, and only for small amounts of relatively simple data. It lacks the well-defined specs and schema of XML. It is not intended to work well with deep hierarchies or vast collections. JSON only exists because it is convenient for JavaScript programmers, and because of the IT industry’s masochistic penchant for reinventing the wheel over and over again.

XML and JSON share one major advantage: binding. In the Java world, there are both standard and handy-but-non-standard frameworks for automatically serializing your Java objects as XML or JSON text. Going the other direction, the frameworks can instantiate Java objects directly from your incoming XML/JSON. So you needn’t write code yourself to handle each field of data.

This binding feature is not worth the bother for the simple data shown in the Question. For that, CSV or Tab-delimited is appropriate, with Apache Commons CSV as shown in this Answer.

Hash

Tip: You should send a hash (MD5, SHA, etc.) of each data file. Upon receiving the file and the hash, the receiving computer recalculates the hash of the incoming file. Then compare the hash results to verify that the data file arrived without corruption.
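A minimal sketch of computing such a hash with the JDK's `MessageDigest`, using SHA-256 from the SHA family mentioned in the tip. The class name `FileHash` is hypothetical; sender and receiver would each run it on their own copy of the file and compare the hex strings.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: hex-encoded SHA-256 digest of bytes or of a file.
class FileHash {

    private static MessageDigest newSha256() {
        try {
            return MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    static String sha256Hex(byte[] data) {
        return toHex(newSha256().digest(data));
    }

    static String sha256Hex(Path file) throws IOException {
        MessageDigest md = newSha256();
        // Stream the file through the digest so a large file need not fit in memory.
        try (InputStream in = new DigestInputStream(Files.newInputStream(file), md)) {
            byte[] buffer = new byte[8192];
            while (in.read(buffer) != -1) {
                // Nothing else to do: DigestInputStream updates md on every read.
            }
        }
        return toHex(md.digest());
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b)); // prints each byte as unsigned 00..ff
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("data", ".csv");
        Files.writeString(file, "date,name\n2018-12-29,Fabric\n");
        String sent = sha256Hex(file);      // computed by the sender
        String received = sha256Hex(file);  // recomputed by the receiver on its copy
        System.out.println(sent.equals(received) ? "intact" : "corrupted");
        Files.delete(file);
    }
}
```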

Thanks a lot for this huge amount of info. I have already made some of the changes you mentioned. I do have a quick question, though: what if there is a comma after the last number? What does that change? — Jonalcaide, Jan 10 '19 at 12:13
Also, if I had to go into the file where I store data frequently, which format would be "faster": CSV, XML, or JSON? Currently I know CSV and XML pretty well. — Jonalcaide, Jan 10 '19 at 12:16
@WhiteGlove A comma on the end violates the CSV format rules. A comma is a delimiter for fields, not rows. — Basil Bourque, Jan 10 '19 at 17:35

Florian Stendel · Answer 2 · 2019-01-03T21:54:53.670

Have you tried catching not only an ArrayIndexOutOfBoundsException, but all occurring exceptions (via an additional `} catch (Exception e)`, for instance) and printing them out?

As far as I can see, your loop will break as soon as it hits the date of the second block (since your loop expects values and not the single date field).

An addition:

A CSV file is supposed to have (or may have) one single header line, not several.

I would consider removing the dates and the intermediate header declarations and using the date as part of the data (a 6th column), as this will most likely adhere to the CSV format.

E.g. instead of this:

2018/12/29
name,quantity,quality,realmQ,cost
Tejido,321 908,13.55,43.18,$15.98,
Ropa,195 045,20.55,45.93,$123.01,
2018/12/30
name,quantity,quality,realmQ,cost
Tejido,324 708,13.55,43.18,$17.98,
Ropa,111 045,20.55,45.93,$14.01,

you could/should do:

name,quantity,quality,realmQ,cost,date
Tejido,321 908,13.55,43.18,$15.98,2018/12/29
Ropa,195 045,20.55,45.93,$123.01,2018/12/29
Tejido,324 708,13.55,43.18,$17.98,2018/12/30
Ropa,111 045,20.55,45.93,$14.01,2018/12/30

And, of course, move the date fetching into the loop.
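A minimal sketch of that idea, assuming the suggested layout (date as the last column, still written as `yyyy/MM/dd`, no trailing comma). Because every row carries its own date, a single loop handles any number of rows and the date is read inside it. `FlatRowReader` and `readRows` are hypothetical names; the numeric fields are left as text here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader for the "name,...,cost,date" layout suggested above.
class FlatRowReader {

    private static final DateTimeFormatter SLASHED = DateTimeFormatter.ofPattern("uuuu/MM/dd");

    // Returns one String[] per data row; the date is parsed inside the loop, per row.
    static List<String[]> readRows(BufferedReader reader) throws IOException {
        List<String[]> rows = new ArrayList<>();
        String header = reader.readLine();               // the single header line
        if (header == null) {
            return rows;                                 // empty input
        }
        String line;
        while ((line = reader.readLine()) != null) {     // any number of rows
            if (line.isBlank()) {
                continue;
            }
            String[] fields = line.split(",", -1);
            LocalDate date = LocalDate.parse(fields[fields.length - 1].strip(), SLASHED);
            fields[fields.length - 1] = date.toString(); // normalized to ISO 8601
            rows.add(fields);
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        String csv = "name,quantity,quality,realmQ,cost,date\n"
                + "Tejido,321 908,13.55,43.18,$15.98,2018/12/29\n"
                + "Tejido,324 708,13.55,43.18,$17.98,2018/12/30\n";
        for (String[] row : readRows(new BufferedReader(new StringReader(csv)))) {
            System.out.println(row[0] + " on " + row[5]);
        }
    }
}
```

With a real file, `Files.newBufferedReader(path)` could supply the reader instead of the `StringReader`.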

HTH

So change the format to something like `date: prodName and the rest`? — Jonalcaide, Jan 03 '19 at 18:52
That works. I've certainly learnt something new from all of you. Many thanks! — Jonalcaide, Jan 10 '19 at 22:34

score 0 · Answer 3 · answered Jan 05 '19 at 12:43


Make decisions on the basis of pattern matching when inserting data into the array.

For example: when a line matches the date pattern, insert the date into your array without incrementing the index, move on to the next line, and add its fields at the same index; continue like this.

For the repeated header lines, match the header pattern as well and process them accordingly.

Date pattern : \d{4}/\d{2}/\d{2}
Header pattern : [\p{L},]+
Value pattern : [\p{L}\p{N} ,.$]+
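A minimal sketch of that pattern-matching approach in Java, using `java.util.regex`. The value pattern in this sketch allows SPACE and `$`, since the Question's data rows contain both (e.g. `321 908` and `$15.98`). The class, enum, and method names are hypothetical. The order of the checks matters: a header line would also match the value pattern, so the header is tested first.

```java
import java.util.regex.Pattern;

// Hypothetical line classifier based on the date/header/value patterns above.
class LineClassifier {

    enum Kind { DATE, HEADER, VALUE, UNKNOWN }

    private static final Pattern DATE_PATTERN = Pattern.compile("\\d{4}/\\d{2}/\\d{2}");
    private static final Pattern HEADER_PATTERN = Pattern.compile("[\\p{L},]+");
    // Allows letters, digits, SPACE, comma, period, and "$".
    private static final Pattern VALUE_PATTERN = Pattern.compile("[\\p{L}\\p{N} ,.$]+");

    static Kind classify(String line) {
        String s = line.strip();
        if (DATE_PATTERN.matcher(s).matches()) return Kind.DATE;
        if (HEADER_PATTERN.matcher(s).matches()) return Kind.HEADER; // check before VALUE
        if (VALUE_PATTERN.matcher(s).matches()) return Kind.VALUE;
        return Kind.UNKNOWN;
    }

    public static void main(String[] args) {
        String[] lines = {
                "2018/12/29",
                "name,quantity,quality,realmQ,cost",
                "Tejido,321 908,13.55,43.18,$15.98,",
                "Mercancías de playa,75 065,11.48,39.73,$105.93,"
        };
        for (String line : lines) {
            System.out.println(classify(line) + ": " + line);
        }
    }
}
```

In the reading loop, a DATE result would update the current date, a HEADER result would be skipped, and a VALUE result would be split into fields and stored under the current date.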

HTH


A.J.


I am not really into regex, but I'm sure I will have to learn it for Linux and stuff. Thanks for the answer! – Jonalcaide Jan 10 '19 at 22:34