
I'm writing a Java program for a school project that reads COVID-19 data from a CSV text file in the following format: (date, state, FIPS code, # of cases, # of deaths). The problem is that the text file my professor gave us has over 60,000 lines, most of them redundant for my purposes, and the data I really need is on the final 55 lines. The file contains data for every single day from January 1st, 2021 to March 23rd, 2023, and I only need the total number of cases and total number of deaths per U.S. territory, which is what the last 55 lines contain. I already have most of the program working; I just need to read in those last 55 lines so I can use them in some calculations. Is there any way to have my BufferedReader read only the last 55 lines of the file?

5/1/2023: Thanks for all the answers, but I worked out a solution on my own yesterday. I wrapped the body of the while loop in an if statement so that only lines beginning with the string "2023-03-23" are processed by the program.

Here's my code if anyone wants to look at it:

import java.io.*;
import java.util.*;
public class StateCovidStats extends State 
{
    private int cases, deaths;
    
    //default
    public StateCovidStats() 
    {
        
    }
    
    //constructor
    public StateCovidStats(String theName, int caseCount, int deathCount) 
    {
        super(theName);
        cases=caseCount;
        deaths=deathCount;
    }
    
    public void displayStats() 
    {
        System.out.println("----------------------------------------");
        System.out.println(this.getName()+" Stats:");
        System.out.println("Cases: "+this.getCases());
        System.out.println("Deaths: "+this.getDeaths());
        System.out.println("----------------------------------------");
    }
    
    //getters
    public String getStateName() 
    {
        return this.getName();
    }
    
    public int getCases() 
    {
        return cases;
    }
    
    public int getDeaths() 
    {
        return deaths;
    }
    
    //setters
    public void setStateName(String newName) 
    {
        this.setName(newName);
    }
    
    public void setCases(int caseCount)
    {
        cases=caseCount;
    } 
    
    public void setDeaths(int deathCount) 
    {
        deaths=deathCount;
    }
    
    public String toString() 
    {
        String s="";
        s+=this.getStateName()+" Cases: "+this.getCases()+" Deaths: "+this.getDeaths();
        return s;
        
    }
    
    public static String findMaxCases(ArrayList<StateCovidStats> list) 
    {
        
        int greatest=Integer.MIN_VALUE;
        String nameOfMax="";
        for(int i=0;i<list.size();i++) 
        {
            if(list.get(i).getCases()>greatest) {
                greatest=list.get(i).getCases();
                nameOfMax=list.get(i).getName();
            }
                
        }
        
        return nameOfMax;
    }
    
    public static String findMaxDeaths(ArrayList<StateCovidStats> list) 
    {
        
        int greatest=Integer.MIN_VALUE;
        String nameOfMax="";
        for(int i=0;i<list.size();i++) 
        {
            if(list.get(i).getDeaths()>greatest) {
                greatest=list.get(i).getDeaths();
                nameOfMax=list.get(i).getName();
            }
                
        }
        
        return nameOfMax;
    }
    
    public static String findMinCases(ArrayList<StateCovidStats> list) 
    {
        
        int least=Integer.MAX_VALUE;
        String nameOfMin="";
        for(int i=0;i<list.size();i++) 
        {
            if(list.get(i).getCases()<least) {
                least=list.get(i).getCases();
                nameOfMin=list.get(i).getName();
            }
                
        }
        
        return nameOfMin;
    }
    
    public static String findMinDeaths(ArrayList<StateCovidStats> list) 
    {
        
        int least=Integer.MAX_VALUE;
        String nameOfMin="";
        for(int i=0;i<list.size();i++) 
        {
            if(list.get(i).getDeaths()<least) {
                least=list.get(i).getDeaths();
                nameOfMin=list.get(i).getName();
            }
                
        }
        
        return nameOfMin;
    }
    
    
    
    public static void main(String[] args) 
    {
        
            //us-states-Jan-2020-through-March-2023
        //sampleDataOld2021
            
    
        
        String line="";
        String start="";
        
        long totDeath=0;
        long totCase=0;
        
        
        
        ArrayList<StateCovidStats> states=new ArrayList<StateCovidStats>();
        
        // try-with-resources closes the reader even if an exception is thrown
        try (BufferedReader br = new BufferedReader(new FileReader("./src/us-states-Jan-2020-through-March-2023.txt")))
        {
            while ((line = br.readLine()) != null)
            {
                // Only the rows for the final date hold the cumulative totals;
                // this check also skips any header row and every earlier day.
                if (line.startsWith("2023-03-23"))
                {
                    String[] covid = line.split(",");    // use comma as separator

                    int caseCount = Integer.parseInt(covid[3]);
                    int deathCount = Integer.parseInt(covid[4]);

                    totCase += caseCount;
                    totDeath += deathCount;

                    StateCovidStats temp = new StateCovidStats(covid[1], caseCount, deathCount);

                    // add the state only if it is not already in the list
                    boolean alreadyListed = false;
                    for (int i = 0; i < states.size(); i++)
                    {
                        if (states.get(i).getName().equalsIgnoreCase(temp.getName()))
                        {
                            alreadyListed = true;
                            break;
                        }
                    } //end of for

                    if (!alreadyListed)
                    {
                        states.add(temp);
                    }
                }
            } //end of while
        } //end of try
        catch (IOException e)
        {
            e.printStackTrace();
        } //end of catch
        
    
            
    /*      
        for(int i=0; i<states.size();i++)
    {
            System.out.println(states.get(i).getName());
            System.out.println("Cases: "+states.get(i).getCases());
            System.out.println("Deaths: "+states.get(i).getDeaths());
            System.out.println("------------------------------------");
                    
    }
    */
        //output
        String maxCase=findMaxCases(states);
        String minCase=findMinCases(states);
        String maxDeath=findMaxDeaths(states);
        String minDeath=findMinDeaths(states);
        int maxCaseNum=0;
        int minCaseNum=0;
        int maxDeathNum=0;
        int minDeathNum=0;
        
        for(int i=0; i<states.size();i++) 
        {

            if(states.get(i).getName().equals("New Jersey")) 
            {
                states.get(i).displayStats();
            }
            if(states.get(i).getName().equals(maxCase)) 
            {
                maxCaseNum=states.get(i).getCases();
            }
            
            if(states.get(i).getName().equals(minCase)) 
            {
                 minCaseNum=states.get(i).getCases();
            }
            
            if(states.get(i).getName().equals(maxDeath)) 
            {
                 maxDeathNum=states.get(i).getDeaths();
            }
            
            if(states.get(i).getName().equals(minDeath)) 
            {
                 minDeathNum=states.get(i).getDeaths();
            }
            
            
            
        }
        System.out.println("State With Most Cases: "+maxCase+", "+maxCaseNum+" Cases");
        System.out.println("State With Least Cases: "+minCase+", "+minCaseNum+" Cases");
        System.out.println("State With Most Deaths: "+maxDeath+", "+maxDeathNum+" Deaths");
        System.out.println("State With Least Deaths: "+minDeath+", "+minDeathNum+" Deaths");            
        System.out.println("Total US Cases: "+totCase);
        System.out.println("Total US Deaths: "+totDeath);
        System.out.println("------------------------------------");
        System.out.println("AVG State Cases: "+totCase/55);
        System.out.println("AVG State Deaths: "+totDeath/55);
        
        for(int i=0; i<states.size();i++) 
        {

            if(states.get(i).getName().equals("New Jersey")) 
            {
                states.get(i).displayStats();
            }
            
            
        }
        
        
    }   //end of main
} //end of class


public class State 
{
    
        private String stateName,timeZone;
        private int population;
        private double density;
        
        
        //default constructor
        public State()
        {
            
        }
        
        //constructor
        public State(String theName) 
        {
            stateName=theName;
        }
        
        //name getter+setter
        public String getName() 
        {
            return stateName;
        }
        
        public void setName(String theName) 
        {
            stateName=theName;
        }
    
}

Dzubek
  • 2
    You can't. It is up to you to ignore lines you don't want to process. – user207421 Apr 30 '23 at 08:07
  • You could read the end of the file using a random access file and reading approximately the last 55 lines. Then you split it by carriage return and work backwards line by line reading the last 55 lines. No need for BufferedFileReader – dave110022 Apr 30 '23 at 09:21
  • 3
    Does this help you [How to read last 5 lines of a .txt file into java](https://stackoverflow.com/questions/9465269/how-to-read-last-5-lines-of-a-txt-file-into-java) – alea Apr 30 '23 at 12:17
  • Use a text editor to save the last 25 lines into a separate file for your Java program to read. Or following the suggestion by @user207421 read every line, parse the date into a `LocalDate`, and if it is not in the relevant range, ignore the line. – Ole V.V. Apr 30 '23 at 14:53
  • I don’t know what you may need time zone for. It’s more correct to use `ZoneId` to represent it (not `String`). Also be aware that some US states span more than one time zone. – Ole V.V. Apr 30 '23 at 17:01

4 Answers

2

Is there any way for me to have my Buffered Reader only read the last 55 lines of the file?

Yes, there are ways; see the other answers.

But I think this is premature optimization. Yes, reading and skipping 60,000 lines is going to be inefficient¹. But here's the thing: unless the project requirements say that the code must run in under N seconds or must be "as fast as possible", you can argue that the code only needs to be reasonably fast. It should only take a couple of seconds² to read and skip 60,000 lines. That is reasonable (IMO).

So ...

Assuming that there are no stated performance goals / requirements, my advice would be to write the code the simplest way to start with. Then measure how long it takes to run the code ... and decide whether it is worth the effort to optimize the code by (say) reading the file in reverse.


1 - Inefficient in terms of execution time. But probably more efficient in terms of programmer time. And even for student programmers, time spent optimizing code unnecessarily is time that could be spent on something more productive.
2 - Probably a lot less than that, though it will depend on a few factors that we don't need to go into here.
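
The "write it simply, then measure" advice can be sketched roughly as follows. This is only an illustration under assumptions: the class name `SkipTiming`, the helper names, and the synthetic rows are all made up, and the timing will vary from machine to machine. It writes a throwaway file of about 60,000 rows, reads every line, keeps the ones for the final date, and reports how long that took.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class SkipTiming {

    // Reads every line and keeps only those that start with the given prefix.
    static List<String> linesStartingWith(Path file, String prefix) throws IOException {
        List<String> kept = new ArrayList<>();
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = br.readLine()) != null) {
                if (line.startsWith(prefix)) {
                    kept.add(line);
                }
            }
        }
        return kept;
    }

    // Writes a synthetic file (made-up data): `filler` old rows, then `wanted` rows for the last date.
    static Path writeSample(int filler, int wanted) throws IOException {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < filler; i++) {
            rows.add("2022-01-01,State" + (i % 55) + "," + (i % 55) + ",100,1");
        }
        for (int i = 0; i < wanted; i++) {
            rows.add("2023-03-23,State" + i + "," + i + ",200,2");
        }
        Path file = Files.createTempFile("covid-sample", ".csv");
        Files.write(file, rows, StandardCharsets.UTF_8);
        return file;
    }

    public static void main(String[] args) throws IOException {
        Path file = writeSample(60_000, 55);
        long start = System.nanoTime();
        List<String> kept = linesStartingWith(file, "2023-03-23");
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("Kept " + kept.size() + " lines in " + elapsedMs + " ms");
        Files.delete(file);
    }
}
```

If the number it prints is acceptable for the project, there is nothing left to optimize.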

Stephen C
0

Method 1: Read all lines into an ArrayList and extract the last certain lines.

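import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;

public class ReadCertainLastLines {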
    ReadCertainLastLines() {
        ArrayList<String> readLines = new ArrayList<>();

        String path = "path to csv file";
        int noOfLinesToReadFromEnd = 10;

        readAll(path, readLines);

        String[] last_10 = extractLines(readLines, noOfLinesToReadFromEnd);
        System.out.println(Arrays.toString(last_10));
    }

    public static void main(String[] args) {
        new ReadCertainLastLines();
    }

    public void readAll(String path, ArrayList<String> toWhichArray) {
        try {
            String line;
            BufferedReader br = new BufferedReader(new FileReader(path));

            while ((line = br.readLine()) != null) {
                String[] values = line.split(",");
                toWhichArray.add(Arrays.toString(values));
            }

            br.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public String[] extractLines(ArrayList<String> originalArray, int noOfLines) {

        // never ask for more lines than were actually read
        int count = Math.min(noOfLines, originalArray.size());
        String[] cutOffList = new String[count];

        for (int loop = 0; loop < count; loop++) {
            cutOffList[loop] = originalArray.get(originalArray.size() - count + loop);
        }

        return cutOffList;
    }
}

Note that the code above splits every line on commas as it reads it.

Here is another version that stores each line as-is, without splitting it:

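import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;

public class ReadCertainLines2 {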
    ReadCertainLines2() {

        String csvFile = "your_file.csv";
        int linesToKeep = 10;

        String[] last_10 = ReadCertainLines(csvFile, linesToKeep);
        System.out.println(Arrays.toString(last_10));
    }

    public static void main(String[] args) {
        new ReadCertainLines2();
    }


    public String[] ReadCertainLines(String csvFile, int linesToKeep) {
        ArrayList<String> lines = new ArrayList<>();
        ArrayList<String> extractedLines = new ArrayList<>();

        try (BufferedReader br = new BufferedReader(new FileReader(csvFile))) {

            String line;

            // Read all lines from the CSV file
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }

            // Keep only the last 10 lines
            int startIndex = lines.size() > linesToKeep ? lines.size() - linesToKeep : 0;

            for (int i = startIndex; i < lines.size(); i++) {
                extractedLines.add(lines.get(i));
            }

        } catch (IOException e) {
            e.printStackTrace();
        }

        return extractedLines.toArray(new String[0]);
    }
}

In these samples, I read all the lines into an ArrayList of strings and then copy the last few lines into a String array (String[]). You can use this as a utility class in your program. Because it reads and stores every line, it uses more memory than necessary: all of the roughly 60,000 lines of your CSV are held at once.

noOfLinesToReadFromEnd (linesToKeep in the second example) is the number of lines you need to read from the end of the CSV. In your case it's 55.
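
If holding every line in memory is a concern, one variation on Method 1 is to keep only a sliding window of the most recent lines while reading. The sketch below is an assumption-laden illustration rather than part of the original method: the class name `LastLinesBuffer` and the method `lastLines` are invented here, and it uses an `ArrayDeque` as the window.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class LastLinesBuffer {

    // Streams through the reader, remembering at most n lines at any moment.
    static List<String> lastLines(BufferedReader reader, int n) throws IOException {
        if (n <= 0) {
            return new ArrayList<>();
        }
        Deque<String> window = new ArrayDeque<>();
        String line;
        while ((line = reader.readLine()) != null) {
            if (window.size() == n) {
                window.removeFirst(); // drop the oldest line to make room
            }
            window.addLast(line);
        }
        return new ArrayList<>(window); // oldest kept line first, last line of input last
    }

    public static void main(String[] args) throws IOException {
        // Tiny in-memory stand-in for the CSV file
        BufferedReader reader = new BufferedReader(new StringReader("a\nb\nc\nd\ne"));
        System.out.println(lastLines(reader, 2)); // prints [d, e]
    }
}
```

The whole file is still read once, but memory use is bounded by the number of lines you keep (55 in your case) instead of the size of the file.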

Method 2: Use Apache Commons IO ReversedLinesFileReader and read first 55 lines.

You can use the Apache Commons IO ReversedLinesFileReader for this. You read from it as if you were reading the first 55 lines of the CSV, except that you get the last line first, then the second-to-last, and so on. You can of course reverse the collected lines afterwards to restore the original order.

import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;

import org.apache.commons.io.input.ReversedLinesFileReader;

public class ReadCertainLinesCommons {

    ReadCertainLinesCommons() {

        String csvFile = "path to csv file";
        int noOfLinesToReadFromEnd = 10;

        String[] last_10 = ReadLastCertainLinesWithCommons(csvFile, noOfLinesToReadFromEnd);
        System.out.println(Arrays.toString(last_10));

    }

    public static void main(String[] args) {
        new ReadCertainLinesCommons();
    }

    public String[] ReadLastCertainLinesWithCommons(String csvFile, int lastLinesToKeep) {

        ArrayList<String> toWhichArray = new ArrayList<>(lastLinesToKeep);

        try (ReversedLinesFileReader reader = new ReversedLinesFileReader(new File(csvFile), Charset.defaultCharset())) {

            // Read the last lines, starting with the final line of the file
            String line;
            int lineCount = 0;

            while (lineCount < lastLinesToKeep && (line = reader.readLine()) != null) {

                String[] values = line.split(",");
                toWhichArray.add(Arrays.toString(values));

                lineCount++;
            }

        } catch (IOException e) {
            e.printStackTrace();
        }

        // The lines were collected last-to-first, so restore file order
        Collections.reverse(toWhichArray);

        return toWhichArray.toArray(new String[0]);
    }
}

This code uses the ReversedLinesFileReader class to read the CSV file in reverse order, starting from the end of the file. It reads the last 10 lines and adds them to an ArrayList, which it then converts to a String array. Finally, it prints out the last 10 lines. Note that the lines are read in reverse order, so the list is reversed again before it is returned.

I have another example that uses ReversedLinesFileReader. It doesn't split the lines at all, so it is quicker.

import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.commons.io.input.ReversedLinesFileReader;

public class ReadLast10LinesCSV {

    public static void main(String[] args) {

        String csvFile = "your_file.csv";
        List<String> lines = new ArrayList<>();
        int linesToKeep = 10;

        try (ReversedLinesFileReader reader = new ReversedLinesFileReader(new File(csvFile), Charset.defaultCharset())) {

            String line;

            // Read backwards from the end of the file, stopping after linesToKeep lines
            while (lines.size() < linesToKeep && (line = reader.readLine()) != null) {
                lines.add(line);
            }

            // The lines were collected last-to-first, so restore file order
            Collections.reverse(lines);
            for (String kept : lines) {
                System.out.println(kept);
            }

        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Note that this method is much quicker on large files because it doesn't read all the lines of the CSV. (Thanks to @StephenC)

Method 3: Using java.io.RandomAccessFile

Of the three methods, this is the most direct and should be the fastest.

Why: unlike the first method, it doesn't read every line. Unlike the second method, the lines don't need to be reversed after they are read. And unlike the first example of methods 1 and 2, it doesn't split the lines into fields.

So thanks to @StephenC for giving me the idea.

Please note that this isn't a great utility class yet. I'll improve it.

import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.HashMap;

public class ReadCertainLines_Best {
    ReadCertainLines_Best() {
        String csvFile = "your csv file here";
        int linesToKeep = 10;

        printThem(ReadCertainLines(csvFile, linesToKeep));
    }

    public static void main(String[] args) {
        new ReadCertainLines_Best();
    }

    public HashMap<String, Object> ReadCertainLines(String csvFile, int linesToKeep) {
        byte[] buffer = new byte[1024];
        StringBuilder sb = new StringBuilder();
        String[] lines = new String[0];

        try (RandomAccessFile file = new RandomAccessFile(csvFile, "r")) {

            // Scan backwards for the line break that precedes the wanted lines.
            // Start before the final byte so that a trailing '\n' is not counted,
            // and count only '\n' so that "\r\n" endings are not counted twice.
            long filePointer = file.length() - 2;
            long start = 0; // stays 0 if the file has fewer lines than requested
            int lineCount = 0;

            while (filePointer >= 0) {
                file.seek(filePointer);

                if (file.readByte() == '\n' && ++lineCount == linesToKeep) {
                    start = filePointer + 1;
                    break;
                }

                filePointer--;
            }

            // Read from that position to the end of the file
            file.seek(start);

            while (true) {
                int bytesRead = file.read(buffer);

                if (bytesRead == -1) {
                    break;
                }

                for (int i = 0; i < bytesRead; i++) {
                    // one byte per character: fine for ASCII data such as this CSV
                    sb.append((char) (buffer[i] & 0xFF));
                }
            }

            if (sb.length() > 0) {
                lines = sb.toString().split("\r?\n");
            }

        } catch (IOException e) {
            e.printStackTrace();
        }

        // Guards against having read more lines than requested
        int startIndex = lines.length > linesToKeep ? lines.length - linesToKeep : 0;

        HashMap<String, Object> map = new HashMap<>();
        map.put("startIndex", startIndex);
        map.put("length", lines.length);
        map.put("lines", lines);

        return map;
    }

    public void printThem(HashMap<String, Object> map) {
        for (int i = (int) map.get("startIndex"); i < (int) map.get("length"); i++) {
            System.out.println(((String[]) map.get("lines"))[i]);
        }
    }
}

This code uses the RandomAccessFile class to seek to the end of the file and then moves backwards through it, counting the line breaks it encounters until it has found the requested number of lines. It then reads those lines into a StringBuilder and splits them into an array of individual lines. It puts the lines, the startIndex and the length of the String array into a HashMap, and the lines are printed with a for loop. This is neither a great utility class yet nor well described, but it works, and I'll update the answer soon.
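
As a possible direction for that improvement, here is a rough sketch of the same idea that reads the file backwards in blocks instead of one byte at a time and returns a plain list. The names `TailReader`, `tail` and `findStart` are invented for this illustration, and it assumes a UTF-8 (or ASCII) file whose requested tail fits comfortably in memory.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TailReader {

    // Returns up to the last n lines of the file, in file order.
    static List<String> tail(Path path, int n) throws IOException {
        if (n <= 0) {
            return new ArrayList<>();
        }
        try (RandomAccessFile file = new RandomAccessFile(path.toFile(), "r")) {
            long length = file.length();
            if (length == 0) {
                return new ArrayList<>();
            }
            long start = findStart(file, length, n);
            byte[] bytes = new byte[(int) (length - start)];
            file.seek(start);
            file.readFully(bytes);
            String text = new String(bytes, StandardCharsets.UTF_8);
            // Drop one trailing line terminator so it does not produce an extra empty line
            if (text.endsWith("\n")) {
                text = text.substring(0, text.length() - 1);
                if (text.endsWith("\r")) {
                    text = text.substring(0, text.length() - 1);
                }
            }
            return new ArrayList<>(Arrays.asList(text.split("\r?\n", -1)));
        }
    }

    // Offset of the first byte of the n-th line from the end, or 0 if the file has fewer lines.
    private static long findStart(RandomAccessFile file, long length, int n) throws IOException {
        byte[] block = new byte[4096];
        long pos = length - 1; // the final byte is skipped so a trailing '\n' is not counted
        int found = 0;
        while (pos > 0) {
            int size = (int) Math.min(block.length, pos);
            pos -= size;
            file.seek(pos);
            file.readFully(block, 0, size);
            for (int i = size - 1; i >= 0; i--) {
                if (block[i] == '\n' && ++found == n) {
                    return pos + i + 1;
                }
            }
        }
        return 0;
    }

    public static void main(String[] args) throws IOException {
        Path sample = Files.createTempFile("tail-demo", ".csv");
        Files.write(sample, Arrays.asList("row1", "row2", "row3", "row4"), StandardCharsets.UTF_8);
        System.out.println(tail(sample, 2)); // prints [row3, row4]
        Files.delete(sample);
    }
}
```

Counting only the `\n` byte keeps `\r\n` files from being counted twice, and that byte never occurs inside a multi-byte UTF-8 character, so scanning raw bytes is safe for UTF-8 input.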

  • A question that this approach will naturally lead to is https://stackoverflow.com/q/39582014/217324 – Nathan Hughes May 01 '23 at 00:43
  • @NathanHughes - Most likely not. >That< problem only rears its head if you are reading a lot of data. 55 lines is not a lot. (And there's something suspicious about that question. It claims a ~15-fold slowdown but the accepted answer benchmarked it at only a 3-fold slowdown.) – Stephen C May 01 '23 at 01:16
  • @StephenC But it will slow. We are reading all of the lines and reversing order of them in `ReversedLinesFileReader`. So it will slow down for sure. The first method is with reading all of the lines but without reversing order. So it'll bit quicker than the second method. – Owner - DSF May 06 '23 at 00:13
  • 1
    @Owner-DSF - Well yes. But if you are obsessing about the time it takes to read the last 55 lines from a file **once** ... before you have even measured it ... then you are definitely doing (or thinking) "premature optimization". The point I was trying to make to Nathan was that it *should not be **inevitable** that this leads to that*. – Stephen C May 06 '23 at 03:27
  • And actually `ReversedLinesFileReader` doesn't do that. It doesn't read all of the lines and reverse them. What it actually does is to read the lines in (roughly) reverse order, by reading the file backwards using `RandomAccessFile`. (It reads blocks of the file in reverse order, and scans them in the reverse direction to find where the line breaks are and then assembles the lines ... as required. It is non-trivial ...) The second method will be faster if the file is large enough. – Stephen C May 06 '23 at 03:40
  • Here's the `ReversedLinesFileReader` source code: https://commons.apache.org/proper/commons-io/javadocs/api-2.5/src-html/org/apache/commons/io/input/ReversedLinesFileReader.html. Read it for yourself. (There could be a more recent version, but that doesn't matter for this purpose.) – Stephen C May 06 '23 at 03:47
  • @StephenC 'The second method will be faster if the file is large enough.' I got it. – Owner - DSF May 06 '23 at 07:20
0

If you are using a terminal, you can get the number of lines in a file with the following command. You can also find it by looking at the file manually.

wc -l file_interested_in

In your java code where you are reading the file line by line, ignore all the lines except the last 55.

    // totalLineCount: the number of lines reported by wc -l above
    String str = null;
    int currentLine = 0;
    while((str = br.readLine()) != null){
        currentLine++;
        if(totalLineCount - currentLine < 55){
            // Lines you are interested in
        }
    }

As an alternative extract the last 55 lines of your file in the terminal:

tail -n 55 file_interested_in > relevant_lines

Now feed the relevant_lines file into your Java program and have it simply process all lines.
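
If you would rather not depend on the terminal, the same count-then-skip idea can be done entirely in Java by reading the file twice. This is only a sketch: the class name `TwoPassTail` and the method `lastLines` are made up for illustration, and it trades a second pass over the file for not having to know the line count in advance.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class TwoPassTail {

    // First pass counts the lines; second pass skips everything before the last n.
    static List<String> lastLines(Path file, int n) throws IOException {
        long total;
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            total = lines.count();
        }
        long toSkip = Math.max(0, total - n);
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines.skip(toSkip).collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path sample = Files.createTempFile("two-pass-demo", ".csv");
        Files.write(sample, Arrays.asList("row1", "row2", "row3", "row4"), StandardCharsets.UTF_8);
        System.out.println(lastLines(sample, 2)); // prints [row3, row4]
        Files.delete(sample);
    }
}
```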

Ole V.V.
justnisar
0

If you want to use the date filtering method mentioned by @Ole V.V, you can do something like:

import java.time.LocalDate;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.List;
import java.nio.file.Path;
import java.nio.file.Files;

public class CStats {
    public static void main(String[] args) {
        // try-with-resources closes the file behind the stream
        try (Stream<String> lines = Files.lines(Path.of(args[1]))) {
            final LocalDate REF_DATE = LocalDate.parse(args[0]);
            List<StateCovidStats> statesStats = lines
                .skip(1) // skip the header row
                .map(line -> line.split(","))
                .filter(a -> LocalDate.parse(a[0]).isAfter(REF_DATE))
                // fromCsv is not shown in the question: it is assumed to be a static
                // factory on StateCovidStats that builds an instance from the split fields
                .map(StateCovidStats::fromCsv)
                .collect(Collectors.toList());
            statesStats.forEach(System.out::println);
        }
        catch (Exception e) {
            e.printStackTrace();
        }
    }
}

I ran `java CStats 2023-03-22 us-states.csv` on data I found with exactly the same headers, and it printed the last 56 lines.
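
Since `fromCsv` is not part of the class in the question, here is a self-contained sketch of what such a factory and the date filter could look like together. Everything in it is hypothetical: `DateFilterSketch`, the nested `Row` class (a stand-in for StateCovidStats) and the sample rows are invented for illustration, and the column order is the one described in the question.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class DateFilterSketch {

    // Hypothetical stand-in for StateCovidStats, holding only what the filter needs
    static final class Row {
        final String state;
        final int cases;
        final int deaths;

        Row(String state, int cases, int deaths) {
            this.state = state;
            this.cases = cases;
            this.deaths = deaths;
        }

        // Builds a Row from the split fields: date, state, fips, cases, deaths
        static Row fromCsv(String[] fields) {
            return new Row(fields[1], Integer.parseInt(fields[3]), Integer.parseInt(fields[4]));
        }

        @Override
        public String toString() {
            return state + " Cases: " + cases + " Deaths: " + deaths;
        }
    }

    // Keeps only the rows dated strictly after refDate; the first line is treated as a header.
    static List<Row> rowsAfter(List<String> csvLines, LocalDate refDate) {
        return csvLines.stream()
            .skip(1)
            .map(line -> line.split(","))
            .filter(fields -> LocalDate.parse(fields[0]).isAfter(refDate))
            .map(Row::fromCsv)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample rows in the question's column order
        List<String> sample = Arrays.asList(
            "date,state,fips,cases,deaths",
            "2023-03-22,Alabama,01,100,1",
            "2023-03-23,Alabama,01,110,2",
            "2023-03-23,Alaska,02,50,0");
        rowsAfter(sample, LocalDate.parse("2023-03-22")).forEach(System.out::println);
    }
}
```

A simple comma split like this assumes no field contains a comma itself, which appears to hold for the format described in the question.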

g00se