
Recently I have been using a lot of text (CSV) files with 10-60k lines, something like this:

id1,id2  
id3,id1  
id81,id13  
...

Most of the time, I need to extract this information in the form of an array:

id1,id2,id3,id1,id81,id13
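A minimal Java sketch of this flattening step, assuming the lines have already been read into a list and that fields are separated by commas with optional surrounding whitespace (like the trailing spaces in the sample above). The class and method names (FlattenIds, flatten) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class FlattenIds {
    // Hypothetical helper: splits each "a,b" line on commas and collects
    // every trimmed, non-empty field in file order, keeping duplicates.
    public static List<String> flatten(List<String> lines) {
        List<String> ids = new ArrayList<>();
        for (String line : lines) {
            for (String field : line.split(",")) {
                String id = field.trim();
                if (!id.isEmpty()) {
                    ids.add(id);
                }
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("id1,id2  ", "id3,id1  ", "id81,id13  ");
        // Prints id1,id2,id3,id1,id81,id13
        System.out.println(String.join(",", flatten(lines)));
    }
}
```

Trimming each field rather than the whole line also copes with spaces after the comma, not just at the end of the line.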

Or, at times, an array of unique elements:

id1,id2,id3,id81,id13
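For the unique variant, one option (a sketch, not the asker's actual code) is to collect into a LinkedHashSet, which drops repeats while keeping the order in which each id first appears. As before, comma separation with optional whitespace is assumed, and the names UniqueIds and unique are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class UniqueIds {
    // Hypothetical helper: a LinkedHashSet ignores ids it has already seen
    // and iterates in first-insertion order, so the result is de-duplicated
    // but still follows the file order.
    public static List<String> unique(List<String> lines) {
        Set<String> seen = new LinkedHashSet<>();
        for (String line : lines) {
            for (String field : line.split(",")) {
                String id = field.trim();
                if (!id.isEmpty()) {
                    seen.add(id);
                }
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<String> lines = List.of("id1,id2  ", "id3,id1  ", "id81,id13  ");
        // Prints id1,id2,id3,id81,id13
        System.out.println(String.join(",", unique(lines)));
    }
}
```

A plain HashSet would also remove duplicates but gives no ordering guarantee; a TreeSet would sort the ids lexicographically instead.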

The result is then used by my (Java) code to do something.

Currently, I usually write a Java function that does the whole task for me: reading the file, applying the logic, and returning the list of ids.
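One way such a function could look end to end, as a sketch: it assumes UTF-8 input (the default for Files.lines), comma-separated fields, and a hypothetical "--unique" command-line argument; the names ReadIds and readIds are illustrative, not from the question. The file is streamed line by line rather than read into one big string.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ReadIds {
    // Hypothetical helper: reads the file lazily, splits each line on commas,
    // trims the fields, drops empty ones, and optionally de-duplicates.
    public static List<String> readIds(Path csv, boolean unique) throws IOException {
        try (Stream<String> lines = Files.lines(csv)) {
            Stream<String> ids = lines
                    .flatMap(line -> Arrays.stream(line.split(",")))
                    .map(String::trim)
                    .filter(id -> !id.isEmpty());
            if (unique) {
                // On an ordered stream, distinct() keeps the first occurrence of each id.
                ids = ids.distinct();
            }
            return ids.collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java ReadIds <file.csv> [--unique]");
            System.exit(1);
        }
        boolean unique = args.length > 1 && args[1].equals("--unique");
        System.out.println(String.join(",", readIds(Paths.get(args[0]), unique)));
    }
}
```

Compiled once, a helper like this can be reused from the command line (for example, java ReadIds test.csv --unique) instead of being rewritten for each experiment.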

Is there a better and quicker way to achieve this, maybe via the command line?

Update:

If I were asked to build an app that was supposed to read a file and do something with it, I would certainly write that logic in Java. In my case, however, I have to go through a lot of text files that I get from the data warehouse, extract the relevant information from them, and then run it through my Java-based app.

This is only for experimenting with and evaluating my app.

zengr
  • What do you need this array to be consumed by? Why would a command-line utility be better? It would still need to read the file off the disk, parse it, and store the array in memory, just like a non-command-line block of code. – Frazell Thomas Oct 03 '11 at 21:57
  • Because I do these small tests very frequently; they are not really part of an application, just experiments over the data. – zengr Oct 03 '11 at 21:58

2 Answers


I copied your input into a file, test.csv:

$ cat test.csv 
id1,id2  
id3,id1  
id81,id13  

Now, with the 'tr' utility, you can do:

$ cat test.csv | tr '\n' ',' | tr -d ' ' | sed 's/,$//'

and you have:

id1,id2,id3,id1,id81,id13
Savino Sguera

Unless your Java code is doing something silly, it will be in the same speed ballpark as anything else.

There's nothing magic about command-line tools that will make them faster than your code.

RichieHindle