Team, I have to parse a file line by line and split each line on ",". The first string is the name and the second is the count. Finally, I have to display each key with its total count. For example:

Peter,2 
Smith,3
Peter,3
Smith,5

I should display as Peter 5 and Smith 8.

So I was unsure whether to choose BufferedReader or Scanner. I went through link . I came up with these two approaches and would like to hear your concerns.

Approach 1: use BufferedReader.

private HashMap<String, MutableLong> readFile(File file) throws IOException {
        final HashMap<String, MutableLong> keyHolder = new HashMap<>();
        try (BufferedReader br = new BufferedReader(new InputStreamReader(
                new FileInputStream(file), "UTF-8"))) {
            for (String line; (line = br.readLine()) != null;) {
                // processing the line.
                final String[] keyContents = line
                        .split(KeyCountExam.COMMA_DELIMETER);
                if (keyContents.length == 2) {
                    final String keyName = keyContents[0];
                    final long count = Long.parseLong(keyContents[1]);
                    final MutableLong keyCount = keyHolder.get(keyName);
                    if (keyCount != null) {
                        keyCount.add(count);
                        keyHolder.put(keyName, keyCount);
                    } else {
                        keyHolder.put(keyName, new MutableLong(count));
                    }
                }

            }
        }
        return keyHolder;
    }

private static final String COMMA_DELIMETER = ",";
    private static volatile Pattern commaPattern = Pattern
            .compile(COMMA_DELIMETER);

I have used MutableLong, since I don't want to create a new BigInteger each time. Also, the file may be very big, and I have no control over how many times a key can occur.
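For reference, a minimal sketch of what such a hand-rolled MutableLong could look like, together with the update step from Approach 1. The outer class name `MutableLongSketch`, the `accumulate` helper, and the `get` accessor are assumptions made for illustration; only the constructor and `add` appear in the code above.

```java
import java.util.HashMap;
import java.util.Map;

final class MutableLongSketch {

    /** Hypothetical stand-in for the hand-rolled wrapper; not thread-safe. */
    static final class MutableLong {
        private long value;

        MutableLong(long initial) {
            this.value = initial;
        }

        /** Adds delta in place, so no new object is allocated per update. */
        void add(long delta) {
            value += delta; // note: a plain long wraps around silently on overflow
        }

        long get() {
            return value;
        }

        @Override
        public String toString() {
            return Long.toString(value);
        }
    }

    /** Adds count to the running total for key, creating the entry on first sight. */
    static void accumulate(Map<String, MutableLong> totals, String key, long count) {
        MutableLong current = totals.get(key);
        if (current == null) {
            totals.put(key, new MutableLong(count));
        } else {
            current.add(count); // the map already holds this instance; no second put() needed
        }
    }

    public static void main(String[] args) {
        Map<String, MutableLong> totals = new HashMap<>();
        accumulate(totals, "Peter", 2);
        accumulate(totals, "Smith", 3);
        accumulate(totals, "Peter", 3);
        accumulate(totals, "Smith", 5);
        System.out.println("Peter " + totals.get("Peter"));
        System.out.println("Smith " + totals.get("Smith"));
    }
}
```

Because the wrapper is mutated in place, the map lookup returns the same instance that is already stored, so putting it back after `add` is redundant. If totals could exceed the range of `long`, this sketch would overflow without warning.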

Approach 2:

Use Scanner and switch between two delimiters.

private static final String LINE_SEPARATOR_PATTERN = "\r\n|[\n\r\u2028\u2029\u0085]";
    private static final String LINE_PATTERN = ".*(" + LINE_SEPARATOR_PATTERN
            + ")|.+$";
    private static volatile Pattern linePattern = Pattern.compile(LINE_PATTERN);

My question is: I have gone through hasNext in Scanner, and to me there is no harm in switching the Pattern. And I believe that from Java 7, Scanner has a limited buffer, which should be enough for this kind of file.

Does anyone prefer Approach 2 over Approach 1, or is there any other option? I just used System.out.println for testing purposes; obviously the same processing code from Approach 1 would replace it here. Using split in Approach 1 would create multiple String instances, which can be avoided here (am I right?) by scanning the char sequence.

private HashMap<String, BigInteger> readFileScanner(File file)
            throws IOException {
        final HashMap<String, BigInteger> keyHolder = new HashMap<>();
        try (Scanner br = new Scanner(file, "UTF-8")) {
            while (br.hasNext()) {
                br.useDelimiter(commaPattern);
                System.out.println(br.next());
                System.out.println(br.next());
                br.useDelimiter(linePattern);
            }
        }
        return keyHolder;
    }
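On the concern about split creating extra String instances, one possible middle ground is to stay with line-by-line reading and cut each line with indexOf and substring instead. The sketch below is only an illustration under that assumption: the class name `LineParseSketch` and the `addLine` helper are hypothetical, and it still allocates the two substrings (it only skips the regex machinery and the intermediate array). It also uses a boxed `Long` as the map value for brevity rather than a mutable counter.

```java
import java.util.HashMap;
import java.util.Map;

final class LineParseSketch {

    /**
     * Parses one "name,count" line and adds the count to totals.
     * Returns false (and changes nothing) unless the line has exactly one comma.
     * Throws NumberFormatException if the count part is not a valid long.
     */
    static boolean addLine(Map<String, Long> totals, String line) {
        int comma = line.indexOf(',');
        if (comma < 0 || comma != line.lastIndexOf(',')) {
            return false; // malformed: zero commas or more than one
        }
        String name = line.substring(0, comma);
        // trim() tolerates stray whitespace around the number, e.g. "Peter,2 "
        long count = Long.parseLong(line.substring(comma + 1).trim());
        Long previous = totals.get(name);
        totals.put(name, previous == null ? count : previous + count);
        return true;
    }

    public static void main(String[] args) {
        Map<String, Long> totals = new HashMap<>();
        String[] lines = {"Peter,2 ", "Smith,3", "Peter,3", "Smith,5"};
        for (String line : lines) {
            addLine(totals, line);
        }
        System.out.println("Peter " + totals.get("Peter"));
        System.out.println("Smith " + totals.get("Smith"));
    }
}
```

Whether this makes a measurable difference compared with split on a single-character delimiter would need profiling on a realistic file.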
Mani
  • Approach 3: use OpenCSV? – fge Apr 06 '14 at 16:00
  • I can't use another API. Actually, I just recreated MutableLong in Java – Mani Apr 06 '14 at 16:01
  • And why can't you use another API? – fge Apr 06 '14 at 16:03
  • Note that since you use Java 7, you should create your `BufferedReader` using `Files.newBufferedReader()` – fge Apr 06 '14 at 16:04
  • Thank you, I will use Files.newBufferedReader. Are there any other mistakes, or anything that could be done another way? – Mani Apr 06 '14 at 16:07
  • Well, instead of `MutableLong` you could reuse `AtomicLong` since you can increment its value ;) – fge Apr 06 '14 at 16:09
  • Yes, I did use it. Then I thought it wasn't necessary, since it doesn't just provide mutability; it also has additional functionality, which is overhead in this case. To me, the MutableLong version in Apache is more apt, since it just provides mutability – Mani Apr 06 '14 at 16:12
  • The overhead of `AtomicLong` is negligible, no reason to go through such complicated ways ;) And didn't you say that you didn't want to use external libraries? So why Apache's MutableLong? – fge Apr 06 '14 at 16:15
  • As I said, I created a class that wraps a long and named it MutableLong so that people in this forum would easily understand – Mani Apr 06 '14 at 16:17
  • OK but again, why? `AtomicLong` fits the bill -- and no, its overhead is not a cause for concern – fge Apr 06 '14 at 16:20
  • @fge Apart from MutableLong vs AtomicLong, do you prefer Approach 1 or Approach 2? – Mani Apr 06 '14 at 16:20
  • If this is line oriented, don't bother with a scanner, go with the `BufferedReader`. Less complicated! – fge Apr 06 '14 at 16:22
  • OK, thanks. My preferred path is #1; my only concern is that readLine would return a String and split would then create 2 more Strings, so in total it would create 3 Strings per line. That's why I tried #2, but I wasn't convinced, so I came here to get opinions – Mani Apr 06 '14 at 16:25
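
The suggestions in the comments above (Files.newBufferedReader and AtomicLong with a plain line-oriented loop) could be combined roughly as follows. This is a minimal sketch, not code posted by either commenter: the class name `AtomicCountSketch`, the `readFile` signature taking a `Path`, and the `trim()` on the count are assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

final class AtomicCountSketch {

    /** Sums the counts per name for a UTF-8 file of "name,count" lines. */
    static Map<String, AtomicLong> readFile(Path path) throws IOException {
        Map<String, AtomicLong> totals = new HashMap<>();
        // Files.newBufferedReader (Java 7+) replaces the manual stream/reader chain
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split(",");
                if (parts.length != 2) {
                    continue; // skip lines that are not exactly "name,count"
                }
                long count = Long.parseLong(parts[1].trim());
                AtomicLong total = totals.get(parts[0]);
                if (total == null) {
                    totals.put(parts[0], new AtomicLong(count));
                } else {
                    total.addAndGet(count); // AtomicLong used here only as a mutable holder
                }
            }
        }
        return totals;
    }

    public static void main(String[] args) throws IOException {
        // Write the sample data from the question to a temporary file
        Path tmp = Files.createTempFile("counts", ".csv");
        try {
            Files.write(tmp, Arrays.asList("Peter,2 ", "Smith,3", "Peter,3", "Smith,5"),
                    StandardCharsets.UTF_8);
            Map<String, AtomicLong> totals = readFile(tmp);
            System.out.println("Peter " + totals.get("Peter"));
            System.out.println("Smith " + totals.get("Smith"));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```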

0 Answers