
I'm a Java beginner and want to learn how to read files and store the data in a way that makes it easy to manipulate.

I have a fairly large CSV file (18,000 rows). The data represents the assortment of all the different beverages sold by a liquor store. It has around 16 columns with headers like "article number", "name", "producer", "amount of alcohol", etc. The columns are separated by "\t" (tabs).

I now want to do some searching in this file, for example to find how many products are produced in Sweden and which liquor is the most expensive per liter.

Since I really want to learn how to program and not just find the answer, I'm not looking for exact code here. Instead, I'm looking for the pseudocode behind this, a good way of thinking when dealing with large sets of data, and which data structures are best suited for a task like this.

Let's take the "How many products are from Sweden?" example. Since the data consists of strings, ints and floats, I can't put everything in one list. What is the best way of storing it so it can be manipulated later? Or can I find the answer as soon as the file is parsed, so that I don't have to store it at all?

parvelmarv
    When you know your questions beforehand, you don't have to store all the data. Just read the file line by line, split every line, check the column you are interested in and count the matches. But if you want to manipulate the data, then you need to store all the lines. – IQV Feb 22 '18 at 14:08
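The streaming approach from this comment can be sketched in Java as follows. It is a minimal sketch, assuming the first line is a header row and that the country sits in a known column; the class name StreamingCount, the column index and the sample rows are hypothetical, not taken from the real file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class StreamingCount {
    // Counts data rows whose column `columnIndex` equals `wanted`.
    // Nothing is stored: each line is split, checked and then discarded.
    static int countMatching(Reader source, int columnIndex, String wanted) {
        int count = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            reader.readLine(); // skip the header row (assumed to be present)
            String line;
            while ((line = reader.readLine()) != null) {
                // limit -1 keeps trailing empty columns instead of dropping them
                String[] fields = line.split("\t", -1);
                if (columnIndex < fields.length && fields[columnIndex].equals(wanted)) {
                    count++;
                }
            }
        } catch (IOException e) {
            // Rethrow unchecked so callers are not forced to handle IOException
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: column 2 is assumed to hold the country
        String sample = "id\tname\tcountry\n"
                + "1\tPunsch\tSweden\n"
                + "2\tGin\tEngland\n"
                + "3\tAkvavit\tSweden\n";
        System.out.println(countMatching(new StringReader(sample), 2, "Sweden")); // prints 2
    }
}
```

For the real file you would pass something like `new FileReader("path/to/file.csv")` instead of the StringReader. Note that a plain split on tabs does not handle quoted fields that themselves contain tabs, which is one reason a CSV library can be worth using.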

3 Answers


If you're new to Java and programming in general, I'd recommend a library that helps you read and use your data without getting into databases and learning SQL. One that I've used in the past is Apache Commons CSV.

https://commons.apache.org/proper/commons-csv/user-guide.html#Parsing_files

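It lets you easily parse a whole CSV file into CSVRecord objects (for a tab-separated file like yours, the predefined CSVFormat.TDF format is a closer match than CSVFormat.EXCEL). For example: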

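import java.io.FileReader;
import java.io.Reader;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVRecord;
try (Reader in = new FileReader("path/to/file.csv")) {
    // withFirstRecordAsHeader() is needed so that columns can be looked up by name
    Iterable<CSVRecord> records = CSVFormat.EXCEL.withFirstRecordAsHeader().parse(in);
    for (CSVRecord record : records) {
        String lastName = record.get("Last Name");
        String firstName = record.get("First Name");
    }
}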
Sam
  • This is very nice! Seems like a good way to start since I'm still new to this stuff. – parvelmarv Feb 23 '18 at 09:55
  • It's a great library, nice and straightforward. I'd recommend combining it with Majid's approach: create a POJO to store the fields and collect the records into a List. That way you can access the data through well-named methods like .getProducer() or .getAlcoholPercentage() – Sam Feb 23 '18 at 10:26
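A minimal sketch of the POJO-plus-List idea, using only the JDK: one Beverage object per row, each field stored in its natural type, collected into a List and then queried for the two questions from the post. The class names, the four-column layout (name, country, price, volume in liters) and the sample values are assumptions for illustration, not the real file's layout.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical POJO: one object per CSV row, each field with its natural type.
class Beverage {
    private final String name;
    private final String country;
    private final double price;        // price of one item
    private final double volumeLiters; // volume of one item, in liters

    Beverage(String name, String country, double price, double volumeLiters) {
        this.name = name;
        this.country = country;
        this.price = price;
        this.volumeLiters = volumeLiters;
    }

    // Builds a Beverage from one tab-separated line (hypothetical column order)
    static Beverage fromLine(String line) {
        String[] f = line.split("\t", -1);
        return new Beverage(f[0], f[1], Double.parseDouble(f[2]), Double.parseDouble(f[3]));
    }

    String getName() { return name; }
    String getCountry() { return country; }
    double getPricePerLiter() { return price / volumeLiters; }
}

class BeverageQueries {
    // "How many products are from Sweden?" becomes a simple loop over the list
    static int countFrom(List<Beverage> all, String country) {
        int count = 0;
        for (Beverage b : all) {
            if (b.getCountry().equals(country)) {
                count++;
            }
        }
        return count;
    }

    // Keeps the best candidate seen so far; returns null for an empty list
    static Beverage mostExpensivePerLiter(List<Beverage> all) {
        Beverage best = null;
        for (Beverage b : all) {
            if (best == null || b.getPricePerLiter() > best.getPricePerLiter()) {
                best = b;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Beverage> all = new ArrayList<>();
        all.add(Beverage.fromLine("Punsch\tSweden\t200.0\t0.7"));
        all.add(Beverage.fromLine("Gin\tEngland\t300.0\t0.7"));
        all.add(Beverage.fromLine("Akvavit\tSweden\t100.0\t0.5"));
        System.out.println(countFrom(all, "Sweden"));                // prints 2
        System.out.println(mostExpensivePerLiter(all).getName());    // prints Gin
    }
}
```

Because each row becomes one object, the mixed strings/ints/floats problem from the question goes away: the List holds Beverage objects, and each getter returns the right type. Double.parseDouble expects a dot as the decimal separator, so values written with a decimal comma would need converting first.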

It seems you are looking for an in-memory SQL engine over your CSV file. I would suggest using CQEngine, which provides indexed views on top of the Java Collections Framework with SQL-like queries.

You are basically treating a Java collection as a database table. Assuming that each CSV line maps to a POJO class such as Beverage:

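import com.googlecode.cqengine.ConcurrentIndexedCollection;
import com.googlecode.cqengine.IndexedCollection;
import com.googlecode.cqengine.index.navigable.NavigableIndex;

// Beverage.BEVERAGE_ID is assumed to be a CQEngine attribute defined on the Beverage class
IndexedCollection<Beverage> table = new ConcurrentIndexedCollection<Beverage>();
table.addIndex(NavigableIndex.onAttribute(Beverage.BEVERAGE_ID));
table.add(new Beverage(...));
table.add(new Beverage(...));
table.add(new Beverage(...));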

What you need to do now is read the CSV file, load it into the IndexedCollection, and build suitable indexes on the fields you want to query. After that, you can query the collection much like a regular SQL table. Finally, if you made any modifications, serialize the collection back to a new CSV file.
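CQEngine does the index bookkeeping for you. To show what an index buys you, here is a hand-rolled equivalent using only the JDK's HashMap; this is a swapped-in technique for illustration, not CQEngine's API. The class name CountryIndex, the column position and the sample rows are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hand-rolled "index" using only the JDK (not CQEngine): maps a column value
// (here, a hypothetical country column) to the rows that contain it.
class CountryIndex {
    private final Map<String, List<String[]>> rowsByCountry = new HashMap<>();
    private final int countryColumn;

    CountryIndex(int countryColumn) {
        this.countryColumn = countryColumn;
    }

    // Building the index costs one pass over the data...
    void add(String[] row) {
        rowsByCountry.computeIfAbsent(row[countryColumn], k -> new ArrayList<>()).add(row);
    }

    // ...after which each lookup is one hash-map access instead of a full scan.
    List<String[]> rowsFrom(String country) {
        return rowsByCountry.getOrDefault(country, Collections.emptyList());
    }

    public static void main(String[] args) {
        CountryIndex index = new CountryIndex(1);
        index.add(new String[] {"Punsch", "Sweden"});
        index.add(new String[] {"Gin", "England"});
        index.add(new String[] {"Akvavit", "Sweden"});
        System.out.println(index.rowsFrom("Sweden").size()); // prints 2
        System.out.println(index.rowsFrom("France").size()); // prints 0
    }
}
```

With 18,000 rows a full scan is already fast, so an index mainly pays off when you run many different lookups against the same loaded data.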

Majid Azimi

Since your data is in a CSV file, you could also store it in a database. You can go through this link to learn how to read a CSV file in Java.

Use an ORM framework such as Hibernate together with a Spring application. Use this link to create the application.

With this setup you can write queries that answer questions like "How many products are from Sweden?" and work with the results using the Collections Framework. This link shows how to use HQL queries in the same application.

Create JSP pages to show the results in the UI.

Sorry for my English.