Sales Driver Class

package mr.map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.FloatWritable;
//import org.apache.hadoop.mapreduce.Mapper;
//import org.apache.hadoop.mapreduce.Reducer;

public class SalesDriver 
{
    public static void main(String args[]) throws Exception
    {
        Configuration c=new Configuration();
        Job j=new Job(c,"Sales");

        j.setJarByClass(SalesDriver.class);
        j.setMapperClass(SalesMapper.class);
        j.setReducerClass(SalesReducer.class);

        //j.setNumReduceTasks(0);
        j.setOutputKeyClass(Text.class);
        j.setOutputValueClass(FloatWritable.class);

        Path in=new Path(args[0]);
        Path out=new Path(args[1]);

        FileInputFormat.addInputPath(j, in);
        FileOutputFormat.setOutputPath(j, out);

        System.exit(j.waitForCompletion(true)?0:1);
    }
}

Sales Mapper Class

package mr.map;

import java.io.IOException;

//import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.LongWritable;
//import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SalesMapper extends Mapper<LongWritable, Text, Text, FloatWritable>
{
    public void map(LongWritable k, Text v, Context con) throws IOException, InterruptedException
    {
        String w[]=v.toString().split(" ");
        String product=w[3];
        //String store=w[2];
        //float cost=Integer.parseInt(w[4]);
        float costx = Float.parseFloat(w[4]);

        //String newline= product+","+store; //","+costx;
        //String newline = product;
        con.write(new Text(product), new FloatWritable(costx));
    }
}

Sales Reducer Class

package mr.map;

import java.io.IOException;

import org.apache.hadoop.io.FloatWritable;
//import org.apache.hadoop.io.IntWritable;
//import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SalesReducer extends Reducer<Text, FloatWritable, Text, FloatWritable>
{
    public void reduce(Text k, Iterable<FloatWritable>vlist, Context con) throws IOException, InterruptedException
    {
        int tot=0;
        for (FloatWritable v:vlist)
        {
            tot += v.get();
        }
        //int total= (int)tot;
        con.write(new Text(k), new FloatWritable(tot));
    }
}

Result of the MapReduce

I cannot understand why every result comes out as a large floating-point number in scientific notation, all of them close to 5.7480884E7.

Below is a sample of the input to the MapReduce program:

  2012-01-01 09:00 San Jose Men's Clothing 214.05 Amex              
  2012-01-01 09:00 Fort Worth Women's Clothing 153.57 Visa          
  2012-01-01 09:00 San Diego Music 66.08 Cash                       
  2012-01-01 09:00 Pittsburgh Pet Supplies 493.51 Discover          
  2012-01-01 09:00 Omaha Children's Clothing 235.63 MasterCard      
  2012-01-01 09:00 Stockton Men's Clothing 247.18 MasterCard        
  2012-01-01 09:00 Austin Cameras 379.6 Visa                        
  2012-01-01 09:00 New York Consumer Electronics 296.8 Cash         
  2012-01-01 09:00 Corpus Christi Toys 25.38 Discover               
  2012-01-01 09:00 Fort Worth Toys 213.88 Visa                      
  2012-01-01 09:00 Las Vegas Video Games 53.26 Visa                 
  2012-01-01 09:00 Newark Video Games 39.75 Cash                    
  2012-01-01 09:00 Austin Cameras 469.63 MasterCard                 
  2012-01-01 09:00 Greensboro DVDs 290.82 MasterCard                
  2012-01-01 09:00 San Francisco Music 260.65 Discover              
  2012-01-01 09:00 Lincoln Garden 136.9 Visa                        
  2012-01-01 09:00 Buffalo Women's Clothing 483.82 Visa             
  2012-01-01 09:00 San Jose Women's Clothing 215.82 Cash            
  2012-01-01 09:00 Boston Cameras 418.94 Amex                       
  2012-01-01 09:00 Houston Baby 309.16 Visa                         
  2012-01-01 09:00 Las Vegas Books 93.39 Visa                       
  2012-01-01 09:00 Virginia Beach Children's Clothing 376.11 Amex   
  2012-01-01 09:01 Riverside Consumer Electronics 252.88 Cash       
  2012-01-01 09:01 Tulsa Baby 205.06 Visa                           
  2012-01-01 09:01 Reno Crafts 88.25 Visa                           
  2012-01-01 09:01 Chicago Books 31.08 Cash                         
  2012-01-01 09:01 Fort Wayne Men's Clothing 370.55 Amex            
  2012-01-01 09:01 San Bernardino Consumer Electronics 170.2 Cash   
  2012-01-01 09:01 Madison Men's Clothing 16.78 Visa                
  2012-01-01 09:01 Austin Sporting Goods 327.75 Discover            
  2012-01-01 09:01 Portland CDs 108.69 Amex                         
  2012-01-01 09:01 Riverside Sporting Goods 15.41 Discover          
  2012-01-01 09:01 Reno Toys 80.46 Visa                             
  2012-01-01 09:01 Anchorage Music 298.86 MasterCard    
Binary Nerd
  • This is the output of the MapReduce program:
        Baby 5.7480884E7
        Books 5.743978E7
        CDs 5.7400252E7
        Cameras 5.728862E7
        Children's Clothing 5.7612936E7
        Computers 5.7303832E7
        Consumer Electronics 5.744192E7
        Crafts 5.7407532E7
        DVDs 5.763812E7
        Garden 5.7528848E7
        Health and Beauty 5.7469112E7
        Men's Clothing 5.7609916E7
        Music 5.7484752E7
        Pet Supplies 5.7186328E7
        Sporting Goods 5.7587608E7
        Toys 5.7452464E7
        Video Games 5.750184E7
        Women's Clothing 5.7423576E7
    – habeebsiddique Aug 16 '16 at 22:23

2 Answers

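Change the reducer's output value type to Text and convert the float total to a string in the format you expect. The driver then has to declare the new types as well: j.setOutputValueClass(Text.class) for the final output, and j.setMapOutputValueClass(FloatWritable.class) so the mapper output is still read as FloatWritable.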

String.format("%f",tot)

See the posts below for more details on formatting numbers:
with scientific notation
without scientific notation

Reducer:

public class SalesReducer extends Reducer<Text, FloatWritable, Text, Text>
{
    public void reduce(Text k, Iterable<FloatWritable>vlist, Context con) throws IOException, InterruptedException
    {
        float tot=0;
        for (FloatWritable v:vlist)
        {
            tot += v.get();
        }
        //int total= (int)tot;
        con.write(new Text(k), new Text(String.format("%f",tot)));
    }
}
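
To see the formatting difference outside Hadoop, here is a minimal standalone sketch. The class name FormatDemo and the helper plain are made up for illustration and are not part of the job above. It assumes the total is held in a float or double: passing an int to "%f" is what raises IllegalFormatConversionException ("f != java.lang.Integer").

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical demo class, not part of the MapReduce job.
public class FormatDemo {

    // Formats a total with two decimal places and no exponent.
    // Locale.ROOT keeps '.' as the decimal separator on every machine.
    static String plain(double total) {
        DecimalFormat df =
            new DecimalFormat("0.00", DecimalFormatSymbols.getInstance(Locale.ROOT));
        return df.format(total);
    }

    public static void main(String[] args) {
        float total = 5.7480884E7f; // the "Baby" total from the job output

        // Float.toString switches to scientific notation from 10^7 upward.
        System.out.println(Float.toString(total));                     // 5.7480884E7

        // Both of these print the same value without an exponent.
        System.out.println(String.format(Locale.ROOT, "%.2f", total)); // 57480884.00
        System.out.println(plain(total));                              // 57480884.00
    }
}
```

In other words, 5.7480884E7 and 57480884.00 are the same number; only the text representation differs.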
Rahul Sharma
  • I followed the conversion to string as given above, but I got the error "java.util.IllegalFormatConversionException: f != java.lang.Integer". I replaced "%f" with "%s". The program ran without errors, but the output is 57400252 instead of 5.7400252E7, i.e. without the decimal places. – habeebsiddique Aug 17 '16 at 16:22
  • I had posted it to give you a clue about the issue. What you could do is change the data type of tot to float and use the DecimalFormat API to achieve the required formatting. – Rahul Sharma Aug 17 '16 at 16:33
  • I also changed the type of tot to float, but it gave the same result. – habeebsiddique Aug 17 '16 at 22:17
  • I think it could be a floating-point overflow or some internal typecasting issue. I think I need to typecast the Iterable variable, but I am not sure. What do you advise? How can I typecast the Iterable? – habeebsiddique Aug 17 '16 at 22:18
  • Follow the scientific notation link in the answer. – Rahul Sharma Aug 17 '16 at 22:21
You are storing the sum of float values in an int variable.
First, an int cannot hold a fractional part, so everything after the decimal point is lost.
Second, if the number of rows is very high, the sum may exceed the range of int.

Please try changing the tot variable from int to float or double.

double tot=0;
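
As a rough illustration of the first point, the sketch below sums three costs from the sample input with an int accumulator and with a double one. The class and method names (AccumulatorDemo, sumIntoInt, sumIntoDouble) are invented for this example. Note that `tot += v` compiles even when tot is an int, because a compound assignment hides a narrowing cast.

```java
// Hypothetical demo class, not part of the MapReduce job.
public class AccumulatorDemo {

    // Mirrors the reducer in the question: an int accumulator.
    static int sumIntoInt(float[] values) {
        int tot = 0;
        for (float v : values) {
            // Equivalent to tot = (int) (tot + v): the fraction is dropped at every step.
            tot += v;
        }
        return tot;
    }

    // Same loop with a double accumulator, which keeps the fractional part.
    static double sumIntoDouble(float[] values) {
        double tot = 0;
        for (float v : values) {
            tot += v;
        }
        return tot;
    }

    public static void main(String[] args) {
        float[] sample = {214.05f, 153.57f, 66.08f}; // costs taken from the sample input

        System.out.println(sumIntoInt(sample));      // 433
        System.out.printf(java.util.Locale.ROOT, "%.2f%n", sumIntoDouble(sample)); // 433.70
    }
}
```

A float accumulator also keeps fractions, but it carries only about 7 significant decimal digits, so for totals in the tens of millions the cents are lost to rounding. That is why double is the safer choice here.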
Sumeet Gupta
  • I changed the type to float. It gave the same results as before. – habeebsiddique Aug 17 '16 at 15:28
  • Your input data set contains variable-length records which you are trying to split on spaces. **2012-01-01 09:00 San Jose Men's Clothing 214.05 Amex** has the numeric value at position w[6], whereas **2012-01-01 09:01 Reno Toys 80.46 Visa** has it at w[4]. I am not confident that the mappers are reading the right position in the string. – Sumeet Gupta Aug 17 '16 at 17:55
  • The fields are separated by tabs, and I am using a tab separator in my mapper. The mapper is giving the right output. – habeebsiddique Aug 17 '16 at 22:11
  • I am seeing a floating-point overflow or some internal typecasting issue. I think I need to typecast the Iterable variable, but I am not sure. What do you advise? – habeebsiddique Aug 17 '16 at 22:12
  • The Iterable of FloatWritable seems fine to me, but you can always try if in doubt. Can you tell me how large your data set is? Approximately how many rows in total? I recently faced a similar issue where the values were coming out absurd because the total sum was going beyond 9 digits. I changed the total variable to double and that did it for me. You can try it and see if it helps. – Sumeet Gupta Aug 18 '16 at 16:09
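
The delimiter question raised in these comments can be checked outside Hadoop. The sketch below assumes each record has six tab-separated fields (date, time, store, product, cost, payment), as the asker describes. SalesLineParser and productAndCost are hypothetical names used only for this illustration.

```java
// Hypothetical helper, not part of the MapReduce job.
public class SalesLineParser {

    // Returns {product, cost} for a tab-separated record,
    // or null if the line has fewer than six fields.
    static String[] productAndCost(String line) {
        String[] f = line.trim().split("\t");
        if (f.length < 6) {
            return null; // malformed record; a mapper would skip it
        }
        return new String[] { f[3], f[4] };
    }

    public static void main(String[] args) {
        String tabbed = "2012-01-01\t09:00\tSan Jose\tMen's Clothing\t214.05\tAmex";
        String[] pc = productAndCost(tabbed);
        System.out.println(pc[0] + " -> " + pc[1]); // Men's Clothing -> 214.05

        // Splitting the same record on single spaces shifts the fields:
        String spaced = "2012-01-01 09:00 San Jose Men's Clothing 214.05 Amex";
        String[] w = spaced.split(" ");
        System.out.println(w[3] + " / " + w[6]);    // Jose / 214.05
    }
}
```

With a tab delimiter the multi-word store and product names stay intact, so w[3] and w[4] are the product and the cost for every record.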