
I have a file of frequent Norwegian words. The words are stored in the following format:

    1.  jeg 782578
    2.  det 742951
    3.  er 718645
    4.  du 623395
    5.  ikke 436196

From left to right, each line holds the list number, the word I want to extract and save, and the word's count. I want to write just the words to a new document, without the list numbers or the counts. I could do this by hand, but the list is long (5,000 words), so I am looking for an efficient way to do it in Java.

I only know how to read from and write to a file in Java, so if you have any idea how to accomplish this, I would be grateful if you could share it.

Zip

4 Answers


The trick to doing something like this efficiently is to realize that you don't need to read the entire file into memory in order to manipulate it. You can create a loop which reads one line of input at a time and does whatever work is required to create one line of output:

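    import java.io.File;
    import java.io.IOException;
    import java.io.PrintWriter;
    import java.util.Scanner;

    public class ExtractWords {
        public static void main(String[] args) throws IOException {
            // try-with-resources closes both files, even if an exception is thrown
            try (Scanner scanner = new Scanner(new File("input.txt"));
                 PrintWriter writer = new PrintWriter("output.txt")) {
                while (scanner.hasNextLine()) {
                    String line = scanner.nextLine().trim();  // read a line from the input file
                    if (line.isEmpty()) continue;             // skip blank lines
                    // split on runs of whitespace: "1.  jeg 782578" -> ["1.", "jeg", "782578"]
                    writer.println(line.split("\\s+")[1]);    // write only the word to the output file
                }
            }
        }
    }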

This allows you to make line-by-line modifications to a file of any size.
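The same line-by-line idea can also be written with the Java 8 stream API. This is a sketch, not part of the original answer: it assumes Java 8 or later, the file names `input.txt`/`output.txt`, and a UTF-8 input file (`Files.lines(path)` decodes as UTF-8 and fails on other encodings; for a Latin-1 file with æ, ø, å you would pass a charset to `Files.lines(path, charset)` instead). The helper `extractWord` is a hypothetical name.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Objects;
import java.util.stream.Stream;

public class ExtractWordsStream {

    // Returns the second whitespace-separated token of a line such as
    // "1.  jeg 782578", or null when the line has fewer than two tokens.
    static String extractWord(String line) {
        String[] parts = line.trim().split("\\s+");
        return parts.length >= 2 ? parts[1] : null;
    }

    public static void main(String[] args) throws IOException {
        Path in = Paths.get("input.txt");   // assumed input file name
        Path out = Paths.get("output.txt"); // assumed output file name
        // Files.lines reads lazily (decoding as UTF-8), so the whole file is
        // never loaded at once; try-with-resources closes both files.
        try (Stream<String> lines = Files.lines(in);
             PrintWriter writer = new PrintWriter(Files.newBufferedWriter(out))) {
            lines.map(ExtractWordsStream::extractWord)
                 .filter(Objects::nonNull)
                 .forEach(writer::println);
        }
    }
}
```

Lines with fewer than two tokens (for example blank lines) are dropped by the `filter` step rather than causing an `ArrayIndexOutOfBoundsException`.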

azurefrog

Try something like this:

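    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.io.PrintStream;

    public class StripNumbers {
        public static void main(String[] args) throws IOException {
            try (BufferedReader br = new BufferedReader(
                     new FileReader("pathOfYourTextFile/textFile.txt"));
                 // open the output once, outside the loop: reopening it on every
                 // iteration truncates the file, leaving only the last line
                 PrintStream out = new PrintStream("outputFile.txt")) {
                String line;
                while ((line = br.readLine()) != null) {
                    // drop the leading list number ("1.") and the trailing count
                    String newline = line.replaceAll("^\\s*\\d+\\.|\\d+\\s*$", "").trim();
                    if (!newline.isEmpty()) {
                        out.println(newline);
                    }
                }
            }
        }
    }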

At least try something before you ask.
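A variation on the regex idea, sketched under the assumption that every data line looks exactly like `1.  jeg 782578`: instead of deleting the numbers, match the whole line and capture only the word. Lines that do not fit the format are reported as non-matches instead of being silently altered. The class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexWordExtractor {

    // Assumed line format: list number with a dot, whitespace, the word,
    // whitespace, the count. Group 1 captures the word.
    private static final Pattern LINE =
            Pattern.compile("^\\s*\\d+\\.\\s+(\\S+)\\s+\\d+\\s*$");

    // Returns the word, or null if the line does not match the expected format.
    static String extractWord(String line) {
        Matcher m = LINE.matcher(line);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] sample = {
            "1.  jeg 782578",
            "2.  det 742951",
            "3.  er 718645",
            "not a data line"
        };
        List<String> words = new ArrayList<>();
        for (String line : sample) {
            String word = extractWord(line);
            if (word != null) {
                words.add(word);
            }
        }
        System.out.println(words); // [jeg, det, er]
    }
}
```

Because `\S+` matches any non-whitespace characters, words containing æ, ø or å are captured as well.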

SparkOn

Even though you know programming, you don't have to apply it in every context. You can simply replace all spaces with commas, save the result as a .csv file, and open it in Excel or another spreadsheet application. Delete the columns you don't want, then save it back.


(Sorry, I can't comment yet.)

5,000 lines is not that many. I assume you are on Windows.

You can use an editor like Notepad++ to search and replace with a regular expression (you would use a regular expression in Java too). Here is one tutorial for Notepad++: http://markantoniou.blogspot.ca/2008/06/notepad-how-to-use-regular-expressions.html
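For comparison, here is a sketch of the same regex "Replace All" done in Java over the whole file at once. It assumes UTF-8 input, the file names `input.txt`/`output.txt`, and that reading all 5,000 lines into memory is acceptable; the class and method names are hypothetical. A similar pattern should work in an editor's regex mode, but the replacement syntax (`$1` here) may differ from tool to tool.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Pattern;

public class RegexReplaceAll {

    // MULTILINE makes ^ and $ match at every line boundary. [ \t] is used
    // instead of \s so that a single match can never span two lines.
    private static final Pattern LINE = Pattern.compile(
            "^[ \\t]*\\d+\\.[ \\t]+(\\S+)[ \\t]+\\d+[ \\t]*$", Pattern.MULTILINE);

    // Replaces every "N.  word count" line with just "word";
    // lines that do not match are left unchanged.
    static String keepWordsOnly(String text) {
        return LINE.matcher(text).replaceAll("$1");
    }

    public static void main(String[] args) throws IOException {
        // Assumed file names and UTF-8 encoding.
        String text = new String(Files.readAllBytes(Paths.get("input.txt")),
                StandardCharsets.UTF_8);
        Files.write(Paths.get("output.txt"),
                keepWordsOnly(text).getBytes(StandardCharsets.UTF_8));
    }
}
```

The line terminators are not part of any match, so the output keeps one word per line.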

Alternatively, as "Thusitha Thilina Dayaratn" suggested, import the file into an Excel-type program. During the import, just specify that the data is space- or tab-separated.

Jama Djafarov
  • That works for a one-time solution, but becomes more difficult if this is part of some automated process. –  Aug 07 '14 at 16:28
  • It's the most efficient way, unless you do it every day. And it's fast: a much better approach than learning Java. And how difficult is it to open a file in Excel and save it? You can probably use macros to automate the process (example here: http://stackoverflow.com/questions/2050505/way-to-run-excel-macros-from-command-line-or-batch-file) – Jama Djafarov Aug 07 '14 at 16:39
  • `I am looking for an efficient way to do this using Java.` would lead me to assume their task requires them to use Java. – MxLDevs Aug 07 '14 at 19:59