11

I am a bit confused by the term "byte offset value", which is used as the map key in a Hadoop MapReduce program.

First, what is the byte offset value?

Second, how is it generated, and how does one view this byte-offset value?

Udhav Sarvaiya
user3493414

3 Answers

7

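A byte offset is the number of bytes counted from a starting point (here, the beginning of the file) up to a given position, starting at 0.

For example, this line

what is byte offset?

is 20 bytes long, so its last character is at byte offset 19. With the default TextInputFormat, Hadoop uses the byte offset at which each line starts in the file as the map key.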
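A quick way to check the numbers for the example line is a small sketch in plain Python (not Hadoop code; the variable names are only illustrative). It assumes the line is ASCII, so one character is one byte:

```python
# Illustrative check, assuming ASCII text (one byte per character).
line = "what is byte offset?"
encoded = line.encode("ascii")

length = len(encoded)      # total number of bytes in the line
last_offset = length - 1   # zero-based offset of the final byte ('?')

print(length, last_offset)
```

Counting from zero is why a 20-byte line ends at offset 19.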

m7913d
user2773013
4

Basically, an offset is an integer that gives the distance from a base address; adding it to the base address gives the absolute address.

Assume a text file with the following data:

Computer-science World
Quantum Computing

Now the offset for the first line is 0, so the input to the Hadoop job will be <0,Computer-science World>. The first line is 22 bytes plus a 1-byte newline, so the second line starts at offset 23 and its input will be <23,Quantum Computing>.

Whenever we pass a text file to a Hadoop job, it calculates the byte offset internally.
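The calculation can be imitated outside Hadoop. The following is a minimal sketch in plain Python, not Hadoop's own implementation; `line_offsets` is a hypothetical helper, and the sample assumes a 1-byte `\n` line ending:

```python
def line_offsets(data: bytes):
    """Yield (byte_offset, line_text) pairs, one per line of `data`.

    The offset is the position in `data` at which the line starts.
    """
    offset = 0
    for raw in data.splitlines(keepends=True):
        # Strip the line terminator from the value, but count it in the offset.
        yield offset, raw.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw)


# Sample file contents from the answer, assumed to use "\n" line endings.
sample = b"Computer-science World\nQuantum Computing\n"
pairs = list(line_offsets(sample))

for offset, line in pairs:
    print(f"<{offset},{line}>")
```

With `\r\n` line endings each terminator is 2 bytes, so the second line would start at 24 instead of 23.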

pradsav
1

The byte offset is the count of bytes, starting at zero. One character or space is one byte for ASCII text, which is the usual case in Hadoop examples; in UTF-8, non-ASCII characters take more than one byte. Check out this question if you want to know more: How many bits in a character?
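To see the difference between characters and bytes, here is a small sketch in plain Python. It assumes the text is encoded as UTF-8, and `utf8_len` is a hypothetical helper name:

```python
def utf8_len(text: str) -> int:
    """Return the number of bytes `text` occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


# Each sample is a single character, but the byte counts differ.
for ch in ["a", " ", "é", "€"]:
    print(repr(ch), "characters:", len(ch), "bytes:", utf8_len(ch))
```

For a file containing such characters, byte offsets grow faster than character counts.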

Community