
I'm starting to design a program that will automate finding and identifying strings, based on similar strings and their identities that have already been found and saved in a master CSV/Excel file.

Right now I want to design it properly so I don't run into issues later when implementing the CSV/Excel reading and writing part.

I will probably use OpenCSV to write and read the files, so my question is more about how I can edit the file.

Last time I dealt with editing CSV files I had to rewrite each line to a new or existing file rather than just editing a specific line. Is this the only way to do this?

Ex - if my CSV is something like

1,2,3
4,5,6
7,8,9

and I wanted to change 4,5,6 to a,b,c, giving

1,2,3
a,b,c
7,8,9

The only way would be to read each line, change it if needed, then write it out again? There's no way to just edit the middle line?

The reason I ask is that I plan on making a lot of custom user changes via a GUI, and writing the changes to a file every time would probably be very bad?

I think saving each line or cell in an array and editing the array would be a more efficient solution.

Any tricks or advice you could offer when editing CSV files?

Side note: I will probably be doing this in Java, as I am most familiar with building GUIs with Swing, but I am open to trying it out in another language.

Eric G

1 Answer

First off, break the problem up into its components, as you are overcomplicating it.

The root of the problem is that you have a file of records and you are writing a GUI that lets the user edit them.

In an effort to increase performance, you want to read and write the same file, reading or writing only a single record at a time.

The file in question is in a csv format.

So the first one you have down cold so there is no need to go over that.

The second part I would say do not do, with many exclamation points. The reason is the worst-case scenario: your program crashes mid-write, at which point you have corrupted your original. If you know the number of records is small, read the whole file into memory (say, as a list of strings) and parse the individual strings into their records. When the user is done and goes to save, write everything to a different file; once that write has finished, delete the original and rename the second file to the original's name. This way, if you hit the worst-case scenario, you either have the original file intact or the changes are there, just under a different name.
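The load, edit, save cycle described above can be sketched roughly as follows. This is only a minimal illustration using java.nio.file; the class name SafeCsvSave and the ".tmp" suffix are made up for the example, and Files.move with REPLACE_EXISTING stands in for the "delete the original, then rename" step.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: the name and layout are assumptions for illustration.
final class SafeCsvSave {

    /** Reads every line of the CSV into memory; only suitable for small files. */
    static List<String> load(Path csv) throws IOException {
        return new ArrayList<>(Files.readAllLines(csv, StandardCharsets.UTF_8));
    }

    /** Writes the lines to a temp file next to the original, then renames it over the original. */
    static void save(Path csv, List<String> lines) throws IOException {
        Path dir = csv.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, csv.getFileName().toString(), ".tmp");
        try {
            // If the program dies during this write, the original file is untouched.
            Files.write(tmp, lines, StandardCharsets.UTF_8);
            // Only after the write has finished does the new file replace the old one.
            Files.move(tmp, csv, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            // No-op after a successful move; cleans up the temp file after a failure.
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        Path csv = Files.createTempFile("master", ".csv");
        Files.write(csv, Arrays.asList("1,2,3", "4,5,6", "7,8,9"), StandardCharsets.UTF_8);

        List<String> lines = load(csv);   // all edits happen on this in-memory list
        lines.set(1, "a,b,c");            // change the middle record
        save(csv, lines);                 // one write, when the user saves

        System.out.println(Files.readAllLines(csv, StandardCharsets.UTF_8));
        Files.delete(csv);
    }
}
```

The GUI would then edit only the in-memory list and call save when the user asks for it, so the file is written once per save rather than once per change.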

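If there is too much to fit in memory at one time, there is RandomAccessFile, which allows reads and writes on the same file. Keep in mind that it overwrites bytes in place, so replacing a record with one of a different length still means rewriting everything after it. I would also recommend making a copy of the file at the start (using a .tmp or .swp extension, as some editors do) and working with that, as it still protects you from the dreaded crash.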
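As a rough sketch of what editing "just the middle line" with RandomAccessFile looks like, the example below overwrites one line of a working copy in place. The class and method names are hypothetical, and it deliberately refuses replacements whose byte length differs from the existing line, because nothing after that line gets shifted.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: names are assumptions for illustration.
final class InPlaceLineEdit {

    /** Overwrites line `index` (0-based) in place; the replacement must have the same byte length. */
    static void overwriteLine(Path file, int index, String replacement) throws IOException {
        byte[] bytes = replacement.getBytes(StandardCharsets.UTF_8);
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            // Skip the lines before the target (readLine is unbuffered, so this is slow on big files).
            for (int i = 0; i < index; i++) {
                if (raf.readLine() == null) {
                    throw new IOException("File has no line " + index);
                }
            }
            long start = raf.getFilePointer();
            String old = raf.readLine();
            if (old == null) {
                throw new IOException("File has no line " + index);
            }
            // readLine maps each byte to one char, so old.length() is the line's byte length.
            if (old.length() != bytes.length) {
                throw new IllegalArgumentException("Replacement must be exactly " + old.length() + " bytes");
            }
            raf.seek(start);
            raf.write(bytes);
        }
    }

    public static void main(String[] args) throws IOException {
        Path original = Files.createTempFile("master", ".csv");
        Files.write(original, "1,2,3\n4,5,6\n7,8,9\n".getBytes(StandardCharsets.UTF_8));

        // Work on a copy so a crash cannot corrupt the original.
        Path working = original.resolveSibling(original.getFileName() + ".tmp");
        Files.copy(original, working, StandardCopyOption.REPLACE_EXISTING);

        overwriteLine(working, 1, "a,b,c");
        System.out.println(Files.readAllLines(working, StandardCharsets.UTF_8));

        Files.delete(working);
        Files.delete(original);
    }
}
```

The same-length restriction is the practical reason most CSV editing ends up as read everything, change it, write everything.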

After that, it comes down to how you deal with the CSV data. If it is simple text, you can use Java's String.split method. If it is more complex (quoted fields, embedded commas), OpenCSV has the CSVParser class, whose parseLine method parses a String into an array of strings for you. There is also a CSVParserBuilder that simplifies constructing the parser.
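To show where the line between "simple" and "more complex" falls, here is a small sketch that uses only String.split from the standard library (it does not call OpenCSV). The class name SplitDemo and the sample lines are made up for the example.

```java
import java.util.Arrays;

// Hypothetical demo class: shows what a plain split does and does not handle.
final class SplitDemo {

    /** Splits on every comma; the -1 limit keeps trailing empty fields. */
    static String[] naiveSplit(String line) {
        return line.split(",", -1);
    }

    public static void main(String[] args) {
        // Simple data: three fields, as expected.
        System.out.println(Arrays.toString(naiveSplit("4,5,6")));

        // Trailing empty cells survive only because of the -1 limit.
        System.out.println(naiveSplit("a,,").length);

        // A quoted field containing a comma is cut in two,
        // so this two-column record comes back as three pieces.
        System.out.println(naiveSplit("\"Smith, John\",42").length);
    }
}
```

If the master file can ever contain commas or quotes inside a cell, that last case is the reason to hand parsing to a CSV library rather than to split.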

Hope that helps.

:)

Scott Conway
  • Thanks. As for size, I wouldn't think it would be more than 1,000 strings, so I guess as the master grows to have multiple columns for each original column, it could grow to 10,000 to 15,000 strings. The string length would be around 20-40 characters. Would this take a lot of memory? – Eric G Oct 26 '15 at 13:07
  • Just doing the math, you will end up with 600K characters (15,000 * 40) and, given that a character in Java is two bytes, you are looking at 1.2M of memory just for the CSV data. Is that a lot? It depends. If you are going to run this on older hardware or some heavily loaded system then it might be, but on most systems today a couple of megabytes of memory is a pittance. You know more than I do about what your system constraints will be. – Scott Conway Oct 26 '15 at 16:11
  • I like the fact you are willing to concede a 15x growth. If you do put everything in memory, you should consider checking the file length at the start and, if it is more than your 1.2M, giving the user an error. That way, if/when you are faced with fixing that issue, you can decide if the size can be increased or if you need to architect a different solution. You could also make the file limit configurable in a properties file so it can be changed quickly if the hardware does have plenty of memory. – Scott Conway Oct 26 '15 at 16:15
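A minimal sketch of the size check suggested in the last comment might look like this. The class name SizeGuard, the property key csv.maxBytes, and the default of 1,200,000 bytes are all assumptions made for the example. Note that Files.size reports bytes on disk, while the 1.2M figure above estimates in-memory characters, so the two are only roughly comparable.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical guard: names, key, and default limit are assumptions for illustration.
final class SizeGuard {

    static final long DEFAULT_MAX_BYTES = 1_200_000L;

    /** Reads the limit from the properties, falling back to the default when the key is absent. */
    static long maxBytes(Properties props) {
        return Long.parseLong(props.getProperty("csv.maxBytes", Long.toString(DEFAULT_MAX_BYTES)));
    }

    /** Throws if the file is larger than the configured limit, before anything is loaded. */
    static void requireLoadable(Path csv, Properties props) throws IOException {
        long size = Files.size(csv);
        long limit = maxBytes(props);
        if (size > limit) {
            throw new IOException("File is " + size + " bytes, limit is " + limit + " bytes");
        }
    }

    public static void main(String[] args) throws IOException {
        Path csv = Files.createTempFile("master", ".csv");
        Files.write(csv, "1,2,3\n4,5,6\n7,8,9\n".getBytes(StandardCharsets.UTF_8));

        Properties props = new Properties();   // would normally be loaded from a .properties file
        requireLoadable(csv, props);           // passes: the file is far below the default limit

        props.setProperty("csv.maxBytes", "5");
        try {
            requireLoadable(csv, props);
        } catch (IOException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
        Files.delete(csv);
    }
}
```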