MapReduce skipping the first line of input file

Question

Does anyone know how to skip the first line of the input text file in MapReduce? For example, I have the following input file:

Student Score
00001   90
00002   95
00003   90
      .
      .
      .

Now, I would like to count the frequency of each score. But I have to skip the first line, which is the header (Student, Score), right? How can I do this? Conversely, if I want to add a header row (Score, Frequency) to the MapReduce output file, how can I do this? Thanks in advance!

possible duplicate of [Processing files with headers in Hadoop](http://stackoverflow.com/questions/1104336/processing-files-with-headers-in-hadoop) — nelsonda, Mar 24 '15 at 20:54
and possible duplicate of http://stackoverflow.com/questions/27854919/how-to-skip-header-from-csv-files-in-spark — jimijazz, Sep 18 '15 at 02:50

score -3 · Answer 1 · answered Dec 19 '14 at 23:09

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class MyNameSpace {

    public static void main(String[] args) {
        // try-with-resources closes the scanner even if reading fails
        try (Scanner c = new Scanner(new FileInputStream("filepath"))) {
            if (c.hasNextLine()) {
                // Read and discard the first line; assign the result to a String if you need it.
                c.nextLine();
            }
            // now read what you want to read
        } catch (FileNotFoundException e) {
            // process exception here
        }
    }
}

Anyway, I expect there are many answers to your question to be found on Google, so please put more effort into searching before asking here.

niceman

I am expecting to skip the first line within the MapReduce process, not to write a separate program to do it. – AlwaysIng Dec 19 '14 at 23:31
So you want to process the big file with many programs (that's the mapping) and then reduce it. How do you divide the file anyway? – niceman Dec 20 '14 at 08:24

Just for the record, this is not a good solution. It will remove the first line for every mapper. So if your file is larger than a single HDFS block, the code here will remove unexpected lines from the middle of the file. – nelsonda Mar 24 '15 at 20:57
This is not a solution for MapReduce Hadoop. – Matt Apr 03 '16 at 04:05
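
The concern raised in the comments above (dropping "the first line" in every mapper removes lines from the middle of a large file) can be avoided by skipping a line only when it sits at byte offset 0 of the file. With Hadoop's TextInputFormat the mapper key is the byte offset of the line, so a real mapper could make the same check on its key. The following is only a minimal plain-Java sketch of that idea, not Hadoop code: the class `HeaderSkipSketch`, its methods, and the two simulated splits are hypothetical names introduced for illustration, and the sketch assumes a single input file with one header line and whitespace-separated fields.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class HeaderSkipSketch {

    // "Map" step (hypothetical): returns the score field, or null for the header.
    // Only the record at byte offset 0 is treated as the header, so a mapper
    // working on a later split never drops a data line.
    static String map(long offset, String line) {
        if (offset == 0) {
            return null;
        }
        String[] fields = line.trim().split("\\s+");
        return fields[1];
    }

    // Simulates two mappers over one file: lines [0, splitAt) and [splitAt, end).
    // Returns the output lines, with a header row prepended.
    static List<String> run(String content, int splitAt) {
        String[] lines = content.split("\n");

        // Compute each line's byte offset, as a line-oriented input format would.
        long[] offsets = new long[lines.length];
        long pos = 0;
        for (int i = 0; i < lines.length; i++) {
            offsets[i] = pos;
            pos += lines[i].getBytes(StandardCharsets.UTF_8).length + 1; // +1 for '\n'
        }

        // "Reduce" step: count how often each score occurs (sorted by score text).
        Map<String, Integer> counts = new TreeMap<>();
        int[][] splits = {{0, splitAt}, {splitAt, lines.length}};
        for (int[] split : splits) {
            for (int i = split[0]; i < split[1]; i++) {
                String score = map(offsets[i], lines[i]);
                if (score != null) {
                    counts.merge(score, 1, Integer::sum);
                }
            }
        }

        // Assumption: all results end up in one output, so one header row suffices.
        List<String> out = new ArrayList<>();
        out.add("Score\tFrequency");
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            out.add(e.getKey() + "\t" + e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "Student Score\n00001   90\n00002   95\n00003   90\n";
        for (String line : run(input, 2)) {
            System.out.println(line);
        }
    }
}
```

In this sketch the header row is simply prepended to the merged result. In a real job with several reducers, each reducer writes its own output file, so where to write the header would have to be decided separately.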
