
I have 2 text files in two different languages and they are aligned line by line, i.e. the first line in textfile1 should be equal to the first line in textfile2, and so on and so forth.

Is there a way to read both files line by line simultaneously?

Below is a sample of what the files look like; imagine that each file has around 1,000,000 lines.

textfile1:

This is a the first line in English
This is a the 2nd line in English
This is a the third line in English

textfile2:

C'est la première ligne en Français
C'est la deuxième ligne en Français
C'est la troisième ligne en Français

desired output:

This is a the first line in English\tC'est la première ligne en Français
This is a the 2nd line in English\tC'est la deuxième ligne en Français
This is a the third line in English\tC'est la troisième ligne en Français

Currently I can use the code below, but holding a few million lines in RAM will kill my machine.

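import java.io.BufferedReader;
import java.io.FileReader;
import java.util.ArrayList;

String english = "/home/path-to-file/english";
String french = "/home/path-to-file/french";
BufferedReader enBr = new BufferedReader(new FileReader(english));
BufferedReader frBr = new BufferedReader(new FileReader(french));

// Loads the entire English file into memory first (this is the part that does not scale).
ArrayList<String> enFile = new ArrayList<String>();
String line;
while ((line = enBr.readLine()) != null) {
    enFile.add(line);
}

// Pairs each French line with the English line at the same index.
int index = 0;
while ((line = frBr.readLine()) != null) {
    String enSentence = enFile.get(index);
    System.out.println(enSentence + "\t" + line);
    index++;
}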
alvas
  • Why not combine the two reads into a single while loop? – Ewald May 31 '12 at 09:39
  • I'd say that, given two 1,000,000-line files, the chance that they're both EXACTLY aligned for all 1,000,000 lines is pretty slim. Your code is going to be brittle unless you can work around that fact. – Jeff Watkins May 31 '12 at 09:41
  • Do you have to print the lines only or also have to store them? – Logan May 31 '12 at 09:42
  • It may be useful to you one day: if you are working on a Unix system, consider using this command: `paste -d '\t' english french > englishandfrench` – Zakaria May 31 '12 at 09:43
  • I have to store them, and will most probably index them into a text file immediately after reading each sentence from the two files. – alvas May 31 '12 at 09:45
  • How many different ways are there to read in a text file? I have used Scanner and `new BufferedReader(new FileReader(myFile))`, but I am sure there are other ways. I found that even with the Scanner object there are at least three different ways. I would post them but I am at work right now and do not have access to my computer. – Doug Hauf Jan 24 '14 at 17:51

2 Answers


Put the calls to readLine on both readers in the same loop:

String english = "/home/path-to-file/english";
String french = "/home/path-to-file/french";
BufferedReader enBr = new BufferedReader(new FileReader(english));
BufferedReader frBr = new BufferedReader(new FileReader(french));

while (true) {
    String partOne = enBr.readLine();
    String partTwo = frBr.readLine();

    if (partOne == null || partTwo == null)
        break;

    System.out.println(partOne + "\t" + partTwo);
}
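
A minimal sketch of the same loop with the readers closed automatically and the pairs written to a file instead of stdout. It assumes both inputs are UTF-8 encoded (an assumption; `FileReader` without an explicit charset falls back to the default charset, which can mangle the accented French characters on some systems). The class name `ParallelLines`, the method `pasteLines` and the `out` parameter are hypothetical names used only for illustration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ParallelLines {

    // Hypothetical helper: joins line i of both files with a tab and writes it to out.
    // Only one line per file is held in memory at a time.
    // Returns the number of line pairs written.
    static long pasteLines(Path english, Path french, Path out) throws IOException {
        long pairs = 0;
        // try-with-resources closes all three streams, even if an exception is thrown.
        try (BufferedReader enBr = Files.newBufferedReader(english, StandardCharsets.UTF_8);
             BufferedReader frBr = Files.newBufferedReader(french, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            while (true) {
                String partOne = enBr.readLine();
                String partTwo = frBr.readLine();

                // Stop as soon as either file runs out of lines.
                if (partOne == null || partTwo == null) {
                    break;
                }

                writer.write(partOne);
                writer.write('\t');
                writer.write(partTwo);
                writer.newLine();
                pairs++;
            }
        }
        return pairs;
    }
}
```

It could be called as `ParallelLines.pasteLines(Paths.get("/home/path-to-file/english"), Paths.get("/home/path-to-file/french"), outputPath)`, where `outputPath` is whatever destination you choose.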
aioobe
  • Thanks, now it works. I think the `index` will be useful for counting the number of sentences. But I've used `if(...)continue;` instead of `break;` – alvas May 31 '12 at 09:53

This is how I would do it:

List<String> strings = new ArrayList<String>();
BufferedReader enBr = ...
BufferedReader frBr = ...

String english = "";
String french = "";
// Stops as soon as either file runs out of lines.
while (((english = enBr.readLine()) != null) && ((french = frBr.readLine()) != null))
{
    strings.add(english + "\t" + french);
}
npinti
  • But if the french file contains more lines, those lines won't be part of the result. – Zakaria May 31 '12 at 09:45
  • @Zakaria: If that is true then I think that this statement does not hold: `I have 2 textfiles in two different languages and they are aligned line by line. I.e. the first line in the textfile1 should be equals to the first line in textfile2, and so on and so forth.` – npinti May 31 '12 at 09:53
  • =) This method works too, but the other method is more intuitive without the global `String english`, `String french`. This solution will be more apt if I have to compare with the previous sentence to see whether it is the same. – alvas May 31 '12 at 09:54
  • @npinti: IMHO, the "should be" part has to be implemented by handling non-conforming files (e.g. a different number of lines) :) – Zakaria May 31 '12 at 10:01
  • What if there are multiple files? – plzdontkillme Jul 27 '14 at 00:24
  • @plzdontkillme: There should be 2 files. – npinti Jul 28 '14 at 04:54
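
The concerns raised in these comments (files with different numbers of lines, and more than two files) could be handled along the following lines. This is only a sketch under stated assumptions: the inputs are assumed to be UTF-8, and a length mismatch is treated as an error rather than silently truncated. The class `LockstepReader`, the method `forEachRow` and the `sink` callback are hypothetical names, not part of either answer.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class LockstepReader {

    // Hypothetical helper: reads any number of files in lockstep, hands each
    // tab-joined row to sink, and returns the number of rows.
    // Throws IllegalStateException if the files do not all end on the same line.
    static long forEachRow(List<Path> files, Consumer<String> sink) throws IOException {
        List<BufferedReader> readers = new ArrayList<>();
        try {
            for (Path file : files) {
                readers.add(Files.newBufferedReader(file, StandardCharsets.UTF_8));
            }
            long row = 0;
            while (true) {
                List<String> parts = new ArrayList<>();
                int ended = 0;
                for (BufferedReader reader : readers) {
                    String line = reader.readLine();
                    if (line == null) {
                        ended++;
                    } else {
                        parts.add(line);
                    }
                }
                if (ended == readers.size()) {
                    return row; // every file ended together: aligned
                }
                if (ended > 0) {
                    throw new IllegalStateException(
                            "Files differ in length after " + row + " aligned lines");
                }
                sink.accept(String.join("\t", parts));
                row++;
            }
        } finally {
            // Close every reader that was opened.
            for (BufferedReader reader : readers) {
                reader.close();
            }
        }
    }
}
```

Failing loudly on a mismatch is a design choice; for corpora that are known to be ragged, padding the shorter file with empty strings would be an alternative.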