7

I have a text file with 42 lines. Each line has more than 22,000 numbers separated by commas.

I want to extract certain numbers from each line. I have an int array of length 1000 holding the positions of the 1,000 numbers that I need from each of those 42 lines.

For example, if the array contains 43, 1, 3244, it means that I want the 43rd number, the 1st number and the 3244th number from each line, starting with the first line and ending with the 42nd line.

My for loop does not seem to work: it reads only the first line of the text file that has the 42 lines of 22,000 numbers, and I don't know where it goes wrong.

for(int i=0;i<42;i++){ // iterates through the 42 lines of the file
    counter=1; // to keep track about on which line the code is working
    System.out.println("Starting line"+i);

    st2=new StringTokenizer(raf1.readLine(),",");
    //raf3 is a RandomAccessFile object containing the 42 lines

    a:while(st2.hasMoreTokens()){
        b=is_there(array2,counter);
        // is_there is a method that compares the rank of the token being read with 
       //the whole array that has the 1000 numbers that i want. 
        if(b==false){ 
            // if the rank or order of token [e.g. 1st, 432th] we are stopping at 
           //is not among the 1000 numbers in the array 
            counter++;                  
            continue a;
        }
        else{ //if true
            s2=st2.nextToken();
            raf2.writeBytes(s2); //write that token on a new empty text file 
            raf2.writeBytes(","); // follow the number with a comma
            counter++;
        }
    } // end of for loop



public static boolean is_there(int[] x,int y){
    boolean b=false;
    for(int i=0;i<x.length;i++){
        if(x[i]==y) {b=true; break;}
    }
    return b;
Gaurav Dave
  • Can you please post the Output of the program? – Distjubo Apr 25 '15 at 12:42
  • Oh and btw you don't need counter, thats what i is for – Distjubo Apr 25 '15 at 12:43
  • Why do you use RandomAccessFile, when you're processing everything sequentially? My guess is, that you may read the whole thing as one line, due to different line ending conventions, and then you would get a NullPointerException when you're constructing the StringTokenizer. – ivant Apr 25 '15 at 12:58
  • You should certainly not build the number of lines into the code. You should just process the file until its end, which will be signalled by `readLine()` returning null. – user207421 Apr 25 '15 at 13:01
  • A pretty please: post well formatted code that is compilable, and examples of inputs and outputs. I find your problem very hard to understand. – Fox Apr 25 '15 at 13:10
  • @Distjubo: The output is in a text file; it's too huge, I can't copy and paste it all. I could only upload the text file, if that were possible. And yeah, you are totally right, I don't need the counter! Not such clean code! – Tagwa Warrag Apr 25 '15 at 21:15
  • @ivant I am just more familiar with RandomAccessFile... And the thing is, the input file was actually a .csv file that I can open with Excel. When I converted it into a text file and opened it with TextPad, each row of the old Excel sheet was on one line and the numbers were separated by commas. – Tagwa Warrag Apr 25 '15 at 21:20
  • @Fox The code is compiled... The input and output are text files that contain data too huge to be posted! The input text file has 43 rows. Each row has more than 22,000 int numbers separated by commas. – Tagwa Warrag Apr 25 '15 at 21:24
  • @TagwaWarrag Have you tried to compile the code you pasted here? I bet you did not. counter is not defined, neither are raf2 and raf3, the code is missing a closing brace, and so on and so on. And yes, you can simplify your problem to maybe 3 rows by 10 integers and 5 integers in an array, to be able to demonstrate what you expect the inputs and outputs to be. And no, that is not beyond posting. You can fit that into the code easily. – Fox Apr 25 '15 at 22:44
  • @amyassin What are you expecting from this bounty? – Radiodef Apr 26 '15 at 01:43
  • @Fox I DID COMPILE AND RUN THE CODE AND I HAVE ALREADY STATED THAT IT PRODUCES AN INCORRECT RESULT. Here I only pasted the for loop and the method I used to compare, not the whole code. – Tagwa Warrag Apr 26 '15 at 06:40
  • @Radiodef an answer! – amyassin Apr 26 '15 at 07:48
  • @amyassin I've undeleted my answer then. It received no comment and I was not sure what was going on when I noticed the bounty was by somebody besides the OP. – Radiodef Apr 26 '15 at 07:50
  • The bounty goes to the accepted answer, the OP decides.. – amyassin Apr 26 '15 at 08:05
  • @TagwaWarrag So, as you state yourself, it is only a part of compilable code, and you did not try to compile this part alone. I have made an example from your code, so I know it's possible. You would have had an answer sooner, and a better one, if you had taken my advice. But you seem not to care. Good luck. Oh, and btw, yelling at me for trying to help you ... :-D – Fox Apr 26 '15 at 08:51
  • @Fox The OP posted a code well formatted and with as much explanation and comments as seemed appropriate.. You can go smoother with a single digit rep new arrival.. Not all people earned a Mortarboard badge before ;-) – amyassin Apr 26 '15 at 09:03
  • @amyassin Is politely asking for working code and a clear question too hard on a newbie? Anyway, we're off-topic here. Good luck guys. Posting something like [this](http://pastebin.com/dMJzh8GC) would've probably landed the answer in a matter of minutes. – Fox Apr 26 '15 at 10:32
  • @Fox Here we go.. A nice tip.. Thank you :) – amyassin Apr 26 '15 at 10:35
  • @Tagwa Warrag Why is the bounty still on, if Radiodef's solution is working? – m.cekiera Apr 28 '15 at 16:30

2 Answers

7

One problem you have is that when you find an index that's not in your array, you don't actually skip the token:

if ( b == false ) {
    // don't actually skip the token !!
    counter++;                  
    continue a;
} else {
    s2 = st2.nextToken();
    raf2.writeBytes(s2);
    raf2.writeBytes(",");
    counter++;
}

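This means your StringTokenizer falls one token behind every time you try to skip: counter advances, but the tokenizer does not.

Because tokens are consumed only when counter matches an entry in the array, the loop can effectively become endless: once counter has passed the largest index in the array, is_there never returns true, no further token is read, and hasMoreTokens() stays true. That would explain why only the first line seems to be processed. The fix is to consume the token in the skip branch as well: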

if ( b == false ) {
    // so skip the token !!
    st2.nextToken();

    counter++;                  
    continue a;
} else {
    s2 = st2.nextToken();
    raf2.writeBytes(s2);
    raf2.writeBytes(",");
    counter++;
}

As a side note, the loop can be rewritten more elegantly as follows:

while (st2.hasMoreTokens()) {
    s2 = st2.nextToken();

    if (is_there(array2, counter)) {
        raf2.writeBytes(s2);
        raf2.writeBytes(",");
    }

    ++counter;
}
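
To try the rewritten loop on something small, here is a minimal sketch that runs it over an in-memory line instead of a file. The class name `TokenSkipDemo`, the method `pick`, and the sample data are made up for illustration; `is_there` is the lookup method from the question.

```java
import java.util.StringTokenizer;

class TokenSkipDemo {
    // Linear search from the question: is position y listed in x?
    static boolean is_there(int[] x, int y) {
        for (int v : x) {
            if (v == y) return true;
        }
        return false;
    }

    // Returns the tokens of `line` whose 1-based position appears in `wanted`,
    // each followed by a comma, in the order they occur in the line.
    static String pick(String line, int[] wanted) {
        StringBuilder out = new StringBuilder();
        StringTokenizer st = new StringTokenizer(line, ",");
        int counter = 1; // 1-based position of the token about to be read
        while (st.hasMoreTokens()) {
            String token = st.nextToken(); // always consume, wanted or not
            if (is_there(wanted, counter)) {
                out.append(token).append(',');
            }
            ++counter;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        int[] wanted = {4, 1, 9};
        // Positions 1, 4 and 9 of the line below are 10, 13 and 18.
        System.out.println(pick("10,11,12,13,14,15,16,17,18,19", wanted));
    }
}
```

Note that the tokens come out in line order (1st, 4th, 9th), not in the order the positions appear in the array.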

You should also:

Radiodef
  • Thank you, I didn't realize that I don't skip the unneeded tokens... I added your line, but I still don't get the output I want. I think my problem is with the logic I used for searching through the input txt file and writing to the output file. – Tagwa Warrag Apr 30 '15 at 05:58
  • *"i stil dont get the output I want"* This isn't a description of a problem. – Radiodef May 03 '15 at 12:13
  • The for loop selects the incorrect tokens from each line. – Tagwa Warrag May 05 '15 at 07:23
5

Radiodef's answer is correct, but I think there is still one piece missing. The code finds the proper numbers but prints them all on one line, because nothing writes a line break after the inner loop that goes through a particular line (at least not in the code above). For example:

for(int i=0;i<42;i++){
    counter=1; // to keep track about on which TOKEN the code is working
    System.out.println("Starting line"+i);
    st2=new StringTokenizer(raf1.readLine(),",");
    while(st2.hasMoreTokens()){
        boolean b = is_there(array2,counter);
        if(!b){
            st2.nextToken();
        }else{
            String s2=st2.nextToken();
            raf2.writeBytes(s2 + ",");
        }
        counter++;
    }
    raf2.writeBytes("\r\n");         //next line!
}

This way, it should read, search and print numbers correctly.
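
Combining this with the comment about `readLine()` returning null at the end of the file, a sketch along the following lines avoids hard-coding 42 and still ends each output line with a line break. It uses `BufferedReader` and `Writer` rather than `RandomAccessFile`, which is a swap made here and not part of the question's code. The names `ColumnExtractor` and `extract` are invented, and the in-memory reader and writer only stand in for the real files.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.StringTokenizer;

class ColumnExtractor {
    // Same linear lookup as in the question.
    static boolean is_there(int[] x, int y) {
        for (int v : x) {
            if (v == y) return true;
        }
        return false;
    }

    // Copies the tokens at the wanted 1-based positions of every input line
    // to `out`, writing one output line per input line.
    static void extract(Reader in, Writer out, int[] wanted) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) { // null means end of input
            StringTokenizer st = new StringTokenizer(line, ",");
            int counter = 1; // position of the token about to be read
            while (st.hasMoreTokens()) {
                String token = st.nextToken();
                if (is_there(wanted, counter)) {
                    out.write(token);
                    out.write(',');
                }
                counter++;
            }
            out.write("\r\n"); // same line break as in the loop above
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        StringWriter result = new StringWriter();
        extract(new StringReader("1,2,3,4\n5,6,7,8\n"), result, new int[]{3, 1});
        System.out.print(result); // two lines: "1,3," and "5,7,"
    }
}
```

For real files, the `StringReader` and `StringWriter` could be replaced with a `FileReader` and a `FileWriter` (or `Files.newBufferedReader` and `Files.newBufferedWriter`), ideally inside try-with-resources so both are closed.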

What's more, there is a mistake in a comment: counter=1; // to keep track about on which line the code is working. The counter keeps track of which token the loop is working on, not which line.

BTW, the is_there method could also take a shorter form:

public static boolean is_there(int[] x,int y){
    for(int i : x){
        if (i == y) return true;
    }
    return false;
}

However, I am not sure whether it is more readable.
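
If speed matters more than brevity, one option (not something either answer proposes) is to load the wanted positions into a `HashSet` once and call `contains` instead of scanning the array for every token. With 1,000 positions checked against more than 22,000 tokens per line, the linear scan can cost roughly 22,000 × 1,000 comparisons per line in the worst case. The class and method names below are illustrative only.

```java
import java.util.HashSet;
import java.util.Set;

class PositionLookup {
    // Build the set once, before the loop over the lines.
    static Set<Integer> toSet(int[] wanted) {
        Set<Integer> set = new HashSet<>();
        for (int w : wanted) {
            set.add(w);
        }
        return set;
    }

    public static void main(String[] args) {
        Set<Integer> positions = toSet(new int[]{43, 1, 3244});
        System.out.println(positions.contains(43)); // position 43 is wanted
        System.out.println(positions.contains(2));  // position 2 is not
    }
}
```

Inside the token loop, `positions.contains(counter)` would then take the place of `is_there(array2, counter)`.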

m.cekiera
  • Thank you, exactly! My incorrect output was lined up on one single line instead of 43 lines. Thank you again. – Tagwa Warrag Apr 30 '15 at 07:33
  • @m.cekiera You meant `if (i == y) return true;` in is_there for sure. – Tobias Liefke Apr 30 '15 at 13:21
  • @Radiodef you mistook another user for an author of post. Still, I agree with you on that point – m.cekiera May 03 '15 at 12:35
  • @m.cekiera You are right, it was an autocomplete fail. Thanks. – Radiodef May 03 '15 at 12:43
  • @TagwaWarrag Note that while this answer is a great guess at your intention, your question says absolutely nothing about new lines in the output. Even my answer was really a guess when I posted it. This lack of detail is why it takes a week and a bounty to get an answer to your question. See http://stackoverflow.com/help/how-to-ask. – Radiodef May 03 '15 at 12:43