I am reading input through stdin (Hadoop Streaming, in a reducer). I need to detect when the last record comes in. I am running a for loop over the stdin data.
I tried to read stdin once to count the total number of records and then read it again for the business processing. But as soon as I read a record from stdin to compute total_cnt, that record is consumed from the stream, so when I later try to read stdin for processing there are no records left.
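    from sys import stdin

    total_cnt = 0
    for line in stdin:
        total_cnt += 1  # this first pass consumes the whole stream

    for line in stdin:  # never iterates: stdin is already exhausted
        pass  ## Some Processing ##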
I don't want to store the stdin data somewhere and then read it from that location twice (1. total record count and 2. data processing).
Is there any way I can detect when the last record comes in from stdin?
I am using Python 2.7.11 and need to implement this approach in a Hadoop reducer.
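
To make the goal concrete, here is a minimal sketch of the behaviour I am after, assuming a one-record lookahead is acceptable: each record is held back until the next one arrives, so the last record can be flagged without counting first or storing the stream. The names with_last_flag and reduce_stream are hypothetical, and the "row"/"last" output is only a stand-in for the real processing. Note that "last" here means the last record this reducer receives on stdin, not the last record of each key.

```python
import sys


def with_last_flag(records):
    """Yield (record, is_last) pairs using one record of lookahead."""
    it = iter(records)
    try:
        prev = next(it)
    except StopIteration:
        return  # empty input: nothing to yield
    for cur in it:
        # A newer record exists, so prev cannot be the last one.
        yield prev, False
        prev = cur
    # The input is exhausted, so prev is the final record.
    yield prev, True


def reduce_stream(stream, out):
    """Hypothetical processing step: tag each record as 'row' or 'last'."""
    for line, is_last in with_last_flag(stream):
        record = line.rstrip("\n")
        tag = "last" if is_last else "row"
        out.write("%s\t%s\n" % (tag, record))
```

In the reducer script this would be called as reduce_stream(sys.stdin, sys.stdout). Only one record is kept in memory at a time, and the code should run unchanged on Python 2.7 and Python 3.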