2

I have HDP 1.1 on windows Server 2008 R2.
I loaded a web log in to hive table. Create Table statement:

create table logtable (datenonQuery string , hours string, minutes string, seconds string, TimeTaken string, Method string, UriQuery string, ProtocolStatus string) row format serde 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' with serdeproperties( "input.regex" = "(\\S+)\\t(\\d+):(\\d+):(\\d+)\\t(\\S+)\\t(\\S+)\\t(\\S+)\\t(\\S+)", "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s") stored as textfile; 

Load Statement:

load data local inpath 'D:\Logfiles\' into table logtable;

Select Statement:

Select * from logtable;

Everything so far works fine.

The following statement fails:

Select count(*) from logtable;

with exception:

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask

Edit1:

The diagnostic info in the Failed Job Table shows following info:

'# of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201306251711_0010_m_000000'

Srinivas
  • 2,479
  • 8
  • 47
  • 69
  • Look here: http://stackoverflow.com/q/11185528/891391 – yatul Jul 10 '13 at 07:31
  • It's not your actual error. It only says that there is too many mappers which failed. Look your concrete error in Mapper logs. There is a lot of possible cause for it. E.g. some of them here http://stackoverflow.com/a/15725084/891391 – yatul Jul 10 '13 at 08:17

3 Answers3

0

This is rather a hadoop related thing than hive. Th reason why SELECT * works and SELECT COUNT(*) doesn't is that latter involves a MR job. What is your datasize?

Try increasing the mapper heapsize by setting the property mapred.job.map.memory.mb to some higher value. Also try increasing the number of mappers by lowering the split size through mapred.min.split.size and see if it makes any difference.

Tariq
  • 34,076
  • 8
  • 57
  • 79
0

If the output resultset has 2 columns with same name (possible in hive/impala) then count(*) won't work.

e.g. Query #1 will give results whereas Query #2 will give an error.

Resolution - Aliasing product_code column will resolve the error from Query #2

1) Select a.product_code, b.product_code, b.product_name, a.purchase_date, a.purchase_qty from product_fact a inner join product_dim b on (a.product_code = b.product_code)

2) Select * from ( Select a.product_code, b.product_code, b.product_name, a.purchase_date, a.purchase_qty from product_fact a inner join product_dim b on (a.product_code = b.product_code) ) as C

0

For me this particular error was an access issue. It was resolved when I connected to the DB using a username and password

Alex Joseph
  • 4,719
  • 2
  • 21
  • 25