1

I am not familiar with the Java NIO APIs and need help answering a commonly asked interview question: if a file contains 50 GB of data, what is the most efficient way to read it and find the most frequent word?

BufferedReader.readLine() is a better API than Scanner for this. Apart from creating multiple threads that read the file in batches with BufferedReader.readLine(), is there any other way?

Rama Sharma

2 Answers

1

See java.nio.channels.FileChannel javadocs:

A region of a file may be mapped directly into memory; for large files this is often much more efficient than invoking the usual read or write methods.
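As a rough illustration of that idea applied to the word-count question, here is a minimal sketch that maps the file window by window and counts words. The class name MappedWordCount, the 256 MiB window and the "words are runs of ASCII letters, compared case-insensitively" rule are all assumptions made for the example, not something the javadoc prescribes. A single mapping cannot be larger than Integer.MAX_VALUE bytes, which is why a 50 GB file has to be mapped in pieces.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

class MappedWordCount {
    // Assumed window size; any value up to Integer.MAX_VALUE bytes works.
    static final long WINDOW = 1L << 28; // 256 MiB

    // Maps the file one window at a time and counts runs of ASCII letters.
    static Map<String, Long> countWords(Path file, long window) throws IOException {
        Map<String, Long> counts = new HashMap<>();
        StringBuilder word = new StringBuilder(); // kept across windows so split words are rejoined
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            for (long pos = 0; pos < size; pos += window) {
                long len = Math.min(window, size - pos);
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, pos, len);
                while (buf.hasRemaining()) {
                    int b = buf.get() & 0xFF;
                    boolean letter = (b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z');
                    if (letter) {
                        word.append((char) (b | 0x20)); // lower-case an ASCII letter
                    } else {
                        flush(word, counts);
                    }
                }
            }
        }
        flush(word, counts); // last word if the file does not end with a separator
        return counts;
    }

    // Adds the pending word (if any) to the counts and resets the builder.
    private static void flush(StringBuilder word, Map<String, Long> counts) {
        if (word.length() > 0) {
            counts.merge(word.toString(), 1L, Long::sum);
            word.setLength(0);
        }
    }

    // Returns the word with the highest count, or null for an empty map.
    static String mostFrequent(Map<String, Long> counts) {
        String best = null;
        long bestCount = 0;
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java MappedWordCount <file>");
            return;
        }
        System.out.println(mostFrequent(countWords(Paths.get(args[0]), WINDOW)));
    }
}
```

The map only holds one entry per distinct word, so memory use depends on the vocabulary rather than on the file size. If the file contains a huge number of distinct tokens, the map itself becomes the bottleneck and a different strategy would be needed.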

Evgeniy Dorofeev
0

Using the class below may be one of the fastest ways to read input:

import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

public class Main 
{ 
static class Reader 
{ 
    final private int BUFFER_SIZE = 1 << 16; 
    private DataInputStream din; 
    private byte[] buffer; 
    private int bufferPointer, bytesRead; 

    public Reader() 
    { 
        din = new DataInputStream(System.in); 
        buffer = new byte[BUFFER_SIZE]; 
        bufferPointer = bytesRead = 0; 
    } 

    public Reader(String file_name) throws IOException 
    { 
        din = new DataInputStream(new FileInputStream(file_name)); 
        buffer = new byte[BUFFER_SIZE]; 
        bufferPointer = bytesRead = 0; 
    } 

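    // Reads bytes up to the next '\n' (or end of input) and decodes them with
    // the platform default charset. Returns null once the input is exhausted.
    public String readLine() throws IOException
    {
        byte[] buf = new byte[64]; // initial capacity; doubled when a line is longer
        int cnt = 0, c;
        while ((c = read()) != -1)
        {
            if (c == '\n')
                break;
            if (cnt == buf.length)
                buf = java.util.Arrays.copyOf(buf, cnt * 2);
            buf[cnt++] = (byte) c;
        }
        if (c == -1 && cnt == 0)
            return null;
        return new String(buf, 0, cnt);
    }

    // Skips whitespace and control bytes and returns the first byte after them;
    // throws instead of looping forever when the input ends first.
    private byte skipWhitespace() throws IOException
    {
        byte c = read();
        while (c <= ' ')
        {
            if (c == -1)
                throw new java.io.EOFException("no more tokens");
            c = read();
        }
        return c;
    }

    public int nextInt() throws IOException
    {
        int ret = 0;
        byte c = skipWhitespace();
        boolean neg = (c == '-');
        if (neg)
            c = read();
        do
        {
            ret = ret * 10 + c - '0'; // ASCII digit to its numeric value
        } while ((c = read()) >= '0' && c <= '9');

        if (neg)
            return -ret;
        return ret;
    }

    public long nextLong() throws IOException
    {
        long ret = 0;
        byte c = skipWhitespace();
        boolean neg = (c == '-');
        if (neg)
            c = read();
        do
        {
            ret = ret * 10 + c - '0';
        } while ((c = read()) >= '0' && c <= '9');

        if (neg)
            return -ret;
        return ret;
    }

    public double nextDouble() throws IOException
    {
        double ret = 0, div = 1;
        byte c = skipWhitespace();
        boolean neg = (c == '-');
        if (neg)
            c = read();

        do
        {
            ret = ret * 10 + c - '0';
        } while ((c = read()) >= '0' && c <= '9');

        if (c == '.')
        {
            // fractional part: each digit is worth a tenth of the previous one
            while ((c = read()) >= '0' && c <= '9')
            {
                ret += (c - '0') / (div *= 10);
            }
        }

        if (neg)
            return -ret;
        return ret;
    }

    // Refills the buffer with up to BUFFER_SIZE bytes in a single call.
    private void fillBuffer() throws IOException
    {
        bytesRead = din.read(buffer, bufferPointer = 0, BUFFER_SIZE);
        if (bytesRead == -1)
            bytesRead = 0; // end of stream: leave the buffer empty
    }

    // Returns the next byte, or -1 at end of input (so a literal 0xFF byte in
    // the data is indistinguishable from end of input).
    private byte read() throws IOException
    {
        if (bufferPointer == bytesRead)
            fillBuffer();
        if (bytesRead == 0)
            return -1; // keep reporting end of input on every later call
        return buffer[bufferPointer++];
    }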

    public void close() throws IOException 
    { 
        if (din == null) 
            return; 
        din.close(); 
    } 
} 

public static void main(String[] args) throws IOException 
{ 
    Reader s = new Reader();
    int n = s.nextInt();
    int k = s.nextInt();
    int count = 0;
    while (n-- > 0)
    {
        int x = s.nextInt();
        if (x % k == 0)
            count++;
    }
    System.out.println(count); 
} 
} 
Abhinav
  • Thanks for sharing the code above; it explains the scenario clearly. I have a few doubts. As far as I know, din = new DataInputStream(new FileInputStream(file_name)); uses the DataInputStream API to read the file, but the data still has to be read into a String variable, which is of course a costly operation. Is there any other way to improve it? – Rama Sharma Oct 24 '18 at 09:44
  • Basically, we aren't reading any variable according to its type here. We are just converting the ASCII values of each character, whether the value is an int, a long, a String or anything else. Therefore, there is no need to store the value as a String explicitly! – Abhinav Oct 24 '18 at 11:15
  • I have two more concerns: a) each read operation returns a single value, and b) during my interview I was told that file read operations are costly. Could you please help me with these points? – Rama Sharma Oct 24 '18 at 12:32
  • We can have a look at https://stackoverflow.com/questions/44150483/java-copy-files-efficiently-with-channel?rq=1 to understand the FileChannel API. – Rama Sharma Oct 24 '18 at 12:33
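On the two concerns raised in the comments, a small sketch may help: a FileChannel read hands back a whole chunk of bytes per call rather than one value, so the costly part (the call into the operating system) happens once per chunk. The class name ChunkedLineCount and the choice to count '\n' bytes are assumptions made purely for illustration; the same loop shape would apply to splitting words.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

class ChunkedLineCount {
    // Counts '\n' bytes, pulling up to chunkSize bytes from the file per read call.
    static long countLines(Path file, int chunkSize) throws IOException {
        long lines = 0;
        ByteBuffer buf = ByteBuffer.allocateDirect(chunkSize);
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            while (ch.read(buf) != -1) {   // one call fills as much of the buffer as it can
                buf.flip();                // switch the buffer from filling to draining
                while (buf.hasRemaining()) {
                    if (buf.get() == '\n') {
                        lines++;
                    }
                }
                buf.clear();               // make the whole buffer writable again
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ChunkedLineCount <file>");
            return;
        }
        // 64 KiB matches the BUFFER_SIZE used in the answer above.
        System.out.println(countLines(Paths.get(args[0]), 1 << 16));
    }
}
```

The Reader class in the answer works the same way internally: its read() method returns one byte at a time, but only touches the file when the 64 KiB buffer runs empty.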