CSV ➙ flat table
The CSV format was invented to represent a single simple flat table of data. Ditto for Tab-delimited files.
You have a hierarchy of a date mapping to a collection of name-quantity-quality-realmQ-cost tuples. That is not simple flat tabular data.
Flatten your data
If you want to store that in CSV, you must flatten by adding a column for the date and repeating the date value across the collection of tuples, to become date-name-quantity-quality-realmQ-cost tuples.
date,name,quantity,quality,realmQ,cost
2018-12-29,Tejido,321 908,13.55,43.18,$15.98
2018-12-29,Ropa,195 045,20.55,45.93,$123.01
2018-12-29,Gorra de visera,126 561,17.43,42.32,$79.54
2018-12-29,Cerveza,80 109,3.37,17.93,$12.38
2018-12-29,Mercancías de playa,75 065,11.48,39.73,$105.93
2018-12-29,Bebidas alcohólicas,31 215,4.84,27.90,$32.29
2018-12-29,Artículos de cuero,19 098,23.13,44.09,$198.74
2018-12-29,Bolsas y carteras,7 754,23.09,41.34,$1 176.54
That data could now be read and written to CSV files.
And watch your delimiters. Notice there should be no comma after the last field of each row.
Apache Commons CSV
The Apache Commons CSV library will perform the CSV parsing, reading, and writing for you. It has worked well for me a few times.
Cleaned data
Let’s parse a data.csv
file with this content, with a flattened version of your example data. The data has been cleaned up:
- Switched dates to standard ISO 8601 format
- Eliminated SPACE character in integer numbers
- Removed
$
character
- Deleted the extra comma at end of each row
- Translated the product names to English (for this English edition of Stack Overflow).
date,name,quantity,quality,realmQ,cost
2018-12-29,Fabric,321908,13.55,43.18,15.98
2018-12-29,Clothing,195045,20.55,45.93,123.01
2018-12-29,Visor Cap,126561,17.43,42.32,79.54
2018-12-29,Beer,80109,3.37,17.93,12.38
2018-12-29,Beach goods,75065,11.48,39.73,105.93
2018-12-29,Alcoholic beverages,31215,4.84,27.90,32.29
2018-12-29,Leather goods,19098,23.13,44.09,198.74
2018-12-29,Bags and wallets,7754,23.09,41.34,1176.54
2018-12-30,Fabric,252229,12.86,43.14,18.87
2018-12-30,Clothing,132392,18.09,46.02,177.58
2018-12-30,Visor Cap,87676,14.42,42.46,122.48
2018-12-30,Beer,44593,2.72,17.79,18.71
2018-12-30,Beach goods,44593,8.26,39.56,200.78
2018-12-30,Alcoholic beverages,27306,4.30,23.88,31.95
2018-12-30,Leather goods,16147,21.08,43.91,207.49
2018-12-30,Bags and wallets,6552,21.11,40.59,1195.41
2019-01-02,Fabric,321908,13.55,43.18,15.98
2019-01-02,Clothing,195045,20.55,45.93,123.01
2019-01-02,Visor Cap,126561,17.43,42.32,79.54
2019-01-02,Beer,80109,3.37,17.93,12.38
2019-01-02,Beach goods,75065,11.48,39.73,105.93
2019-01-02,Alcoholic beverages,31215,4.84,27.90,32.29
2019-01-02,Leather goods,19098,23.13,44.09,198.74
2019-01-02,Bags and wallets,7754,23.09,41.34,1176.54
2019-01-03,Fabric,321908,13.55,43.18,15.98
2019-01-03,Clothing,195045,20.55,45.93,123.01
2019-01-03,Visor Cap,126561,17.43,42.32,79.54
2019-01-03,Beer,80109,3.37,17.93,12.38
2019-01-03,Beach goods,75065,11.48,39.73,105.93
2019-01-03,Alcoholic beverages,31215,4.84,27.90,32.29
2019-01-03,Leather goods,19098,23.13,44.09,198.74
2019-01-03,Bags and wallets,7754,23.09,41.34,1176.54
We define a class to hold each tuple.
package com.basilbourque.example;
import java.math.BigDecimal;
import java.time.LocalDate;
import java.util.Objects;
public class DailyProduct {
// date,name,quantity,quality,realmQ,cost
// 2018-12-29,Fabric,321908,13.55,43.18,15.98
// 2018-12-29,Clothing,195045,20.55,45.93,123.01
// 2018-12-29,Visor Cap,126561,17.43,42.32,79.54
// 2018-12-29,Beer,80109,3.37,17.93,12.38
// 2018-12-29,Beach goods,75065,11.48,39.73,105.93
// 2018-12-29,Alcoholic beverages,31215,4.84,27.90,32.29
// 2018-12-29,Leather goods,19098,23.13,44.09,198.74
// 2018-12-29,Bags and wallets,7754,23.09,41.34,1176.54
public enum Header {
DATE, NAME, QUANTITY, QUALITY, REALMQ, COST;
}
// ----------| Member vars |-----------------------------------
public LocalDate localDate;
public String name;
public Integer quantity;
public BigDecimal quality, realmQ, cost;
// ----------| Constructor |-----------------------------------
public DailyProduct ( LocalDate localDate , String name , Integer quantity , BigDecimal quality , BigDecimal realmq , BigDecimal cost ) {
this.localDate = Objects.requireNonNull( localDate );
this.name = Objects.requireNonNull( name );
this.quantity = Objects.requireNonNull( quantity );
this.quality = Objects.requireNonNull( quality );
this.realmQ = Objects.requireNonNull( realmq );
this.cost = Objects.requireNonNull( cost );
}
// ----------| `Object` overrides |-----------------------------------
@Override
public String toString ( ) {
return "com.basilbourque.example.DailyProduct{ " +
"localDate=" + localDate +
" | name='" + name + '\'' +
" | quantity=" + quantity +
" | quality=" + quality +
" | realmq=" + realmQ +
" | cost=" + cost +
" }";
}
@Override
public boolean equals ( Object o ) {
if ( this == o ) return true;
if ( o == null || getClass() != o.getClass() ) return false;
DailyProduct that = ( DailyProduct ) o;
return localDate.equals( that.localDate ) &&
name.equals( that.name );
}
@Override
public int hashCode ( ) {
return Objects.hash( localDate , name );
}
}
Write a class to read and write files containing the data of the DailyProduct
objects.
package com.basilbourque.example;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVPrinter;
import org.apache.commons.csv.CSVRecord;
import java.io.BufferedReader;
import java.io.IOException;
import java.math.BigDecimal;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Instant;
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;
public class DailyProductFileHandler {
public List < DailyProduct > read ( Path path ) {
// TODO: Add a check for valid file existing.
List < DailyProduct > list = List.of(); // Default to empty list.
try {
// Prepare list.
int initialCapacity = ( int ) Files.lines( path ).count();
list = new ArrayList <>( initialCapacity );
// Read CSV file. For each row, instantiate and collect `DailyProduct`.
BufferedReader reader = Files.newBufferedReader( path );
Iterable < CSVRecord > records = CSVFormat.RFC4180.withFirstRecordAsHeader().parse( reader );
for ( CSVRecord record : records ) {
// date,name,quantity,quality,realmQ,cost
LocalDate localDate = LocalDate.parse( record.get( "date" ) );
String name = record.get( "name" );
Integer quantity = Integer.valueOf( record.get( "quantity" ) );
BigDecimal quality = new BigDecimal( record.get( "quality" ) );
BigDecimal realmQ = new BigDecimal( record.get( "realmQ" ) ); // Note: case-sensitive.
BigDecimal cost = new BigDecimal( record.get( "cost" ) );
// Instantiate `DailyProduct` object, and collect it.
DailyProduct dailyProduct = new DailyProduct( localDate , name , quantity , quality , realmQ , cost );
list.add( dailyProduct );
}
} catch ( IOException e ) {
e.printStackTrace();
}
return list;
}
public void write ( final List < DailyProduct > dailyProducts , final Path path ) {
try ( final CSVPrinter printer = CSVFormat.RFC4180.withHeader( "date" , "name" , "quantity" , "quality" , "realmQ" , "cost" ).print( path , StandardCharsets.UTF_8 ) ; ) {
for ( DailyProduct dp : dailyProducts ) {
printer.printRecord( dp.localDate , dp.name , dp.quantity , dp.quality , dp.realmQ , dp.cost );
}
} catch ( IOException e ) {
e.printStackTrace();
}
}
public static void main ( final String[] args ) {
DailyProductFileHandler fileHandler = new DailyProductFileHandler();
Path pathInput = Paths.get( "/Users/basilbourque/data.csv" );
List < DailyProduct > list = fileHandler.read( pathInput );
System.out.println( list );
String when = Instant.now().truncatedTo( ChronoUnit.SECONDS ).toString().replace( ":" , "•" );
Path pathOutput = Paths.get( "/Users/basilbourque/data_" + when + ".csv" );
fileHandler.write( list , pathOutput );
System.out.println( "Writing file: " + pathOutput );
}
}
When run:
[com.basilbourque.example.DailyProduct{ localDate=2018-12-29 | name='Fabric' | quantity=321908 | quality=13.55 | realmq=43.18 | cost=15.98 }, com.basilbourque.example.DailyProduct{ localDate=2018-12-29 | name='Clothing' | quantity=195045 | quality=20.55 | realmq=45.93 | cost=123.01 }, com.basilbourque.example.DailyProduct{ localDate=2018-12-29 | name='Visor Cap' | quantity=126561 | quality=17.43 | realmq=42.32 | cost=79.54 }, com.basilbourque.example.DailyProduct{ localDate=2018-12-29 | name='Beer' | quantity=80109 | quality=3.37 | realmq=17.93 | cost=12.38 }, com.basilbourque.example.DailyProduct{ localDate=2018-12-29 | name='Beach goods' | quantity=75065 | quality=11.48 | realmq=39.73 | cost=105.93 }, com.basilbourque.example.DailyProduct{ localDate=2018-12-29 | name='Alcoholic beverages' | quantity=31215 | quality=4.84 | realmq=27.90 | cost=32.29 }, com.basilbourque.example.DailyProduct{ localDate=2018-12-29 | name='Leather goods' | quantity=19098 | quality=23.13 | realmq=44.09 | cost=198.74 }, com.basilbourque.example.DailyProduct{ localDate=2018-12-29 | name='Bags and wallets' | quantity=7754 | quality=23.09 | realmq=41.34 | cost=1176.54 }, com.basilbourque.example.DailyProduct{ localDate=2018-12-30 | name='Fabric' | quantity=252229 | quality=12.86 | realmq=43.14 | cost=18.87 }, com.basilbourque.example.DailyProduct{ localDate=2018-12-30 | name='Clothing' | quantity=132392 | quality=18.09 | realmq=46.02 | cost=177.58 }, com.basilbourque.example.DailyProduct{ localDate=2018-12-30 | name='Visor Cap' | quantity=87676 | quality=14.42 | realmq=42.46 | cost=122.48 }, com.basilbourque.example.DailyProduct{ localDate=2018-12-30 | name='Beer' | quantity=44593 | quality=2.72 | realmq=17.79 | cost=18.71 }, com.basilbourque.example.DailyProduct{ localDate=2018-12-30 | name='Beach goods' | quantity=44593 | quality=8.26 | realmq=39.56 | cost=200.78 }, com.basilbourque.example.DailyProduct{ localDate=2018-12-30 | name='Alcoholic beverages' | quantity=27306 | quality=4.30 | realmq=23.88 | cost=31.95 }, com.basilbourque.example.DailyProduct{ localDate=2018-12-30 | name='Leather goods' | quantity=16147 | quality=21.08 | realmq=43.91 | cost=207.49 }, com.basilbourque.example.DailyProduct{ localDate=2018-12-30 | name='Bags and wallets' | quantity=6552 | quality=21.11 | realmq=40.59 | cost=1195.41 }, com.basilbourque.example.DailyProduct{ localDate=2019-01-02 | name='Fabric' | quantity=321908 | quality=13.55 | realmq=43.18 | cost=15.98 }, com.basilbourque.example.DailyProduct{ localDate=2019-01-02 | name='Clothing' | quantity=195045 | quality=20.55 | realmq=45.93 | cost=123.01 }, com.basilbourque.example.DailyProduct{ localDate=2019-01-02 | name='Visor Cap' | quantity=126561 | quality=17.43 | realmq=42.32 | cost=79.54 }, com.basilbourque.example.DailyProduct{ localDate=2019-01-02 | name='Beer' | quantity=80109 | quality=3.37 | realmq=17.93 | cost=12.38 }, com.basilbourque.example.DailyProduct{ localDate=2019-01-02 | name='Beach goods' | quantity=75065 | quality=11.48 | realmq=39.73 | cost=105.93 }, com.basilbourque.example.DailyProduct{ localDate=2019-01-02 | name='Alcoholic beverages' | quantity=31215 | quality=4.84 | realmq=27.90 | cost=32.29 }, com.basilbourque.example.DailyProduct{ localDate=2019-01-02 | name='Leather goods' | quantity=19098 | quality=23.13 | realmq=44.09 | cost=198.74 }, com.basilbourque.example.DailyProduct{ localDate=2019-01-02 | name='Bags and wallets' | quantity=7754 | quality=23.09 | realmq=41.34 | cost=1176.54 }, com.basilbourque.example.DailyProduct{ localDate=2019-01-03 | name='Fabric' | quantity=321908 | quality=13.55 | realmq=43.18 | cost=15.98 }, com.basilbourque.example.DailyProduct{ localDate=2019-01-03 | name='Clothing' | quantity=195045 | quality=20.55 | realmq=45.93 | cost=123.01 }, com.basilbourque.example.DailyProduct{ localDate=2019-01-03 | name='Visor Cap' | quantity=126561 | quality=17.43 | realmq=42.32 | cost=79.54 }, com.basilbourque.example.DailyProduct{ localDate=2019-01-03 | name='Beer' | quantity=80109 | quality=3.37 | realmq=17.93 | cost=12.38 }, com.basilbourque.example.DailyProduct{ localDate=2019-01-03 | name='Beach goods' | quantity=75065 | quality=11.48 | realmq=39.73 | cost=105.93 }, com.basilbourque.example.DailyProduct{ localDate=2019-01-03 | name='Alcoholic beverages' | quantity=31215 | quality=4.84 | realmq=27.90 | cost=32.29 }, com.basilbourque.example.DailyProduct{ localDate=2019-01-03 | name='Leather goods' | quantity=19098 | quality=23.13 | realmq=44.09 | cost=198.74 }, com.basilbourque.example.DailyProduct{ localDate=2019-01-03 | name='Bags and wallets' | quantity=7754 | quality=23.09 | realmq=41.34 | cost=1176.54 }]
Writing file: /Users/basilbourque/data_2019-01-05T03•48•37Z.csv
ISO 8601
By the way, when serializing date-time values to text, always use the standard ISO 8601 formats. For a date-only value without time-of-day and without time zone, that would be YYYY-MM-DD.
XML & JSON
If you want to preserve the hierarchy, use some file format other than CSV. Commonly XML or JSON is used for such data.
Database
Your Question does not provide enough detail to know for certain, but I get the feeling you should be using a database rather than text files. If you are reading, editing, and appending new data, for large amounts of data (large meaning enough to be concerned about impacting memory limits) or you are using multiple processes/threads/users, then a database is called for. A database is designed to efficiently handle data too large to fit entirely into memory. And a database is designed to handle concurrent access.
Data in memory
I have to parse data from a csvFile, right now I am working with small numbers, 10 products but if I work with, let's say, a 100 or even a thousand
That is not “large” as you put it. Even a Raspberry Pi or Beaglebone Black has enough RAM to load several thousand of such tuples into memory.
Collections
or any other number I want the program to make a dynamic array, so I do not have to manually change the array dimension manually every time I want to input data.
You need to learn about Java Collections Framework, rather than using simple arrays.
In particular, your date-to-tuple hierarchy would commonly be represented by using a Map
(also called a dictionary by some folks). This data structure is a collection of key-value pairs, where the date would be your key and a Set
or List
of your tuples would be your value.
Define a class for your tuple data, named something like Product
. Add member variables: name
, quantity
, quality
, realmq
, and cost
. Instantiate an object for each tuple.
Create a Map
such as a TreeMap
. Being a SortedMap
it keeps your dates in chronological order.
SortedMap< Product > map = new TreeMap<>() ;
Use LocalDate
for your date values, the key in your map.
LocalDate ld = LocalDate.of( 2018 , 1 , 23 ) ;
map.put( ld , new ArrayList< Product >() ) ; // Pass an initial capacity in those parens if you know a likely size of the list.
For each Product
object, retrieve the list from the map for the relevant date, add the product to the list.
When serializing, use an XML or JSON framework to write the map to storage.
Or do so yourself, writing your own data format. Get all the keys from the map, loop them, writing each date to file. And for each date, extract its list from the map (each value for each key). Loop the Product
objects in the list. Write out each product’s member variables. Use any field and row delimiters. Though not often used for reasons I have never understood, ASCII (a subset of Unicode) has specific delimiter characters. I suggest you use these separators. The code points:
- 31 for field (INFORMATION SEPARATOR ONE)
- 30 for row (INFORMATION SEPARATOR TWO)
- 29 for group (INFORMATION SEPARATOR THREE)
- 28 for file (INFORMATION SEPARATOR FOUR)
All of these issues have been addressed many times on Stack Overflow. Search to learn more.
Extraneous text
When serializing data, do not include extraneous text.
The $
in your cost
column is just noise. If you meant to indicated a particular currency, a simple $
fails to do the job as it could be Canadian dollars, United States dollars, Mexican pesos, or perhaps other currencies. So use a standard currency symbol such as CAD
& USD
& MXN
. If all the values are in a single known currency such as CAD, then omit the ‘$’ entirely.
Performance
Preface: If you are frequently moving data in and out of these files for updating, you should be using a database rather than text files.
No need to worry about performance of CSV versus XML versus JSON.
Firstly, you are falling into the evil trap of premature optimization (google/duckduckgo that phrase).
Secondly, you would have to have enormous amount of data frequently processed to have any performance difference be significant, far beyond that of common business apps. Accessing files of any format from storage, even from SSD drives, is so slow that it dwarfs time taken for the CPU-driven processing of the data.
Choose a format based on fitting the needs of your data and app.
For simple flat data, use CSV or Tab-delimited or the ASCII/Unicode codes for delimiting (codepoints 28-31).
For hierarchical data, use XML. XML has the advantage of being very precisely defined by specification. So much tooling has been built for XML. And XML Schema is also well-defined. This provides a powerful way to validate incoming data files before attempting to process.
As for JSON, use only if you must, and only for small amounts of relatively simple data. It lacks the well-defined specs and schema of XML. It is not intended to work well with deep hierarchies or vast collections. JSON only exists because it is convenient for JavaScript programmers, and because of the IT industry’s masochistic penchant for reinventing the wheel over and over again.
XML and JSON share one major advantage: binding. In the Java world, there are both standard and handy-but-non-standard frameworks for automatically serializing your Java object’s as XML or JSON text. Going the other direction, the frameworks can instantiate Java objects directly from your incoming XML/JSON. So you needn’t write code yourself to handle each field of data.
This binding feature is not worth the bother for the simple data shown in the Question. For that, CSV or Tab-delimited is appropriate, with Apache Commons CSV as shown in this Answer.
Hash
Tip: You should send a hash (MD5, SHA, etc) of each data file. Upon receiving the file and the hash, the receiving computer recalculates the hash of the incoming file. Then compare hash results to verify that the data file arrived without corruption in its data.