1

Does anyone know how to skip the first line of an input text file in MapReduce? For example, I have the following input file:

Student Score
00001   90
00002   95
00003   90
      .
      .
      .

Now I would like to count the frequency of each score. To do that I have to skip the first line, which is the header (Student, Score), right? How can I do this? Conversely, if I want to add a header row (Score, Frequency) to the MapReduce output file, how can I do that? Thanks in advance!

niceman
AlwaysIng
  • possible duplicate of [Processing files with headers in Hadoop](http://stackoverflow.com/questions/1104336/processing-files-with-headers-in-hadoop) – nelsonda Mar 24 '15 at 20:54
  • and possible duplicate of http://stackoverflow.com/questions/27854919/how-to-skip-header-from-csv-files-in-spark – jimijazz Sep 18 '15 at 02:50

1 Answer

-3
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class MyNameSpace {

    public static void main(String[] args) {
        // try-with-resources closes the scanner even if reading fails
        try (Scanner c = new Scanner(new FileInputStream("filepath"))) {
            if (c.hasNextLine()) {
                // Read the first line and discard it; assign it to a String if you need it
                c.nextLine();
            }
            // now read what you want to read
        } catch (FileNotFoundException e) {
            // process exception here
        }
    }
}

Anyway, I expect a Google search would turn up many answers to your question, so please put more effort into searching before asking here.

niceman
  • 1
    I want to skip the first line within the MapReduce job itself, not write a separate program to do it. – AlwaysIng Dec 19 '14 at 23:31
  • So you want to process the big file with many programs (that's the mapping) and then reduce the results. How do you divide the file anyway? – niceman Dec 20 '14 at 08:24
  • 1
    Just for the record, this is not a good solution. It will remove the first line for every mapper. So if your file is larger than a single HDFS block, the code here will remove unexpected lines from the middle of the file. – nelsonda Mar 24 '15 at 20:57
  • This is not a solution for MapReduce Hadoop. – Matt Apr 03 '16 at 04:05
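
For what it's worth, one approach that is often suggested for this kind of job relies on the fact that with Hadoop's TextInputFormat the key handed to the mapper is the byte offset of the line within the file, so only the real header line arrives with offset 0, no matter how the file is split. The sketch below is plain Java with no Hadoop dependency and only imitates that behaviour: the class `HeaderSkipSketch` and its methods `map`, `countScores` and `render` are hypothetical names made up for illustration, and the `offset` argument stands in for the `LongWritable` key a real mapper would receive. It assumes a single input file with `\n` line endings and whitespace-separated columns.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class HeaderSkipSketch {

    // Stand-in for one mapper call. 'offset' plays the role of the byte-offset key.
    // Returns the score to emit, or null when the line should be skipped.
    public static String map(long offset, String line) {
        if (offset == 0) {
            return null; // offset 0 is the header line "Student Score"
        }
        String[] fields = line.trim().split("\\s+");
        return fields.length == 2 ? fields[1] : null; // ignore malformed or blank lines
    }

    // Imitates the map and reduce phases in memory: score -> number of occurrences.
    public static Map<String, Integer> countScores(String fileContents) {
        Map<String, Integer> freq = new TreeMap<>();
        long offset = 0;
        for (String line : fileContents.split("\n")) {
            String score = map(offset, line);
            if (score != null) {
                freq.merge(score, 1, Integer::sum);
            }
            // Advance by the line's byte length plus one byte for the '\n' terminator.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return freq;
    }

    // Writes the title row once, before any data rows.
    public static List<String> render(Map<String, Integer> freq) {
        List<String> out = new ArrayList<>();
        out.add("Score\tFrequency");
        for (Map.Entry<String, Integer> e : freq.entrySet()) {
            out.add(e.getKey() + "\t" + e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "Student Score\n00001   90\n00002   95\n00003   90\n";
        render(countScores(input)).forEach(System.out::println);
    }
}
```

In an actual job the same check would presumably be `key.get() == 0` inside the mapper. For the title row, one option is to write it from the reducer's `setup()` method, bearing in mind that with more than one reducer every part file would get its own copy of the header.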