Questions tagged [input-split]
35 questions
5
votes
2 answers
How to read a record that is split into multiple lines and also how to handle broken records during input split
I have a log file as below
Begin ... 12-07-2008 02:00:05 ----> record1
incidentID: inc001
description: blah blah blah
owner: abc
status: resolved
end .... 13-07-2008 02:00:05
Begin ... 12-07-2008 03:00:05 ----> record2…
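Grouping such records is straightforward outside Hadoop too. A plain-Python sketch (the `Begin`/`end` markers are taken from the log excerpt above): a generator buffers lines between the two delimiters and silently drops broken records missing either marker:

```python
def read_records(lines):
    """Group lines between 'Begin' and 'end' markers into records.

    Lines outside a complete Begin/end pair (broken records) are dropped.
    """
    record = None
    for line in lines:
        if line.startswith("Begin"):
            record = [line]          # start a new record (abandons any unfinished one)
        elif line.startswith("end"):
            if record is not None:
                record.append(line)
                yield record
            record = None
        elif record is not None:
            record.append(line)

log = [
    "Begin ... 12-07-2008 02:00:05 ----> record1",
    "incidentID: inc001",
    "status: resolved",
    "end .... 13-07-2008 02:00:05",
    "incidentID: inc999",            # broken record with no Begin: discarded
]
records = list(read_records(log))    # -> one complete 4-line record
```

Inside Hadoop the same grouping logic would live in a custom RecordReader's `nextKeyValue()`, with extra care taken at split boundaries.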

ghosts
- 177
- 2
- 15
4
votes
3 answers
Hadoop input split for a compressed block
If I have a splittable compressed file of 1 GB, and the default block size and input split size are 128 MB, then 8 blocks and 8 input splits are created. When a compressed block is read by MapReduce it is uncompressed, and say after…

ZAHEER AHMED
- 507
- 1
- 5
- 9
4
votes
2 answers
Hadoop MapReduce RecordReader Implementation Necessary?
From the Apache doc on the Hadoop MapReduce InputFormat Interface:
"[L]ogical splits based on input-size is insufficient for many
applications since record boundaries are to be respected. In such
cases, the application has to also implement a…

AST
- 211
- 6
- 18
3
votes
2 answers
How to handle multiline record for inputsplit?
I have a 100 TB text file containing multiline records, and we are not told how many lines each record spans. One record may take 5 lines, another 6, another 4. It is not fixed; the number of lines may vary for each…
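Hadoop's usual convention here (what `TextInputFormat`'s `LineRecordReader` does for lines, generalized in this sketch to a hypothetical `'|'` record delimiter) is: each split's reader skips everything up to the first delimiter at or after its start offset, and reads past its end offset to finish the record it started. A plain-Python sketch of that rule:

```python
def records_for_split(data, start, end, delim="|"):
    """Return the records a reader for split [start, end) would emit.

    Mirrors the LineRecordReader convention: skip the partial record at
    the front (unless start == 0), and read past `end` to finish the
    record that begins inside this split.
    """
    if start != 0:
        # Skip to just after the first delimiter at or beyond `start`.
        start = data.find(delim, start) + 1
    records = []
    pos = start
    while pos < end and pos < len(data):
        nxt = data.find(delim, pos)
        if nxt == -1:                    # last record: runs to end of data
            records.append(data[pos:])
            break
        records.append(data[pos:nxt])
        pos = nxt + 1
    return records

data = "rec1 line a line b|rec2 x|rec3 y z|rec4"
left = records_for_split(data, 0, 20)            # split 1 ends mid-record...
right = records_for_split(data, 20, len(data))   # ...split 2 skips that partial record
```

Every record is read exactly once even though the split boundary falls mid-record, which is why variable-length records still work without knowing their line counts in advance.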

java_enthu
- 2,279
- 7
- 44
- 74
2
votes
2 answers
file storage, block size and input splits in Hadoop
Consider this scenario:
I have 4 files of 6 MB each. The HDFS block size is 64 MB.
1 block will hold all these files, with some extra space left over; if new files are added, they will be accommodated here.
Now when the input splits are calculated for Map-reduce job…
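One point worth checking in this scenario: an HDFS block never holds data from more than one file, so these 4 files occupy 4 (mostly empty) blocks rather than 1. Since FileInputFormat also computes splits per file, never across files, the split count is just per-file arithmetic (sizes taken from the question):

```python
import math

block_size = 64        # MB
files = [6, 6, 6, 6]   # MB each; HDFS stores each file in its own block(s)

# Splits are computed per file, so each small file yields one split
# (and therefore one map task).
splits = sum(math.ceil(size / block_size) for size in files)
```

This is the root of the classic "small files problem": 4 tiny files mean 4 mappers, each doing very little work.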

brain storm
- 30,124
- 69
- 225
- 393
2
votes
1 answer
MapReduce: How input splits are done when 2 blocks are spread across different nodes?
I read the following wiki but am still not able to clarify one thing.
https://wiki.apache.org/hadoop/HadoopMapReduce
Say, I have a large file that's broken into two HDFS blocks and the blocks are physically saved into 2 different machines. Consider there…

Arijit Banerjee
- 164
- 1
- 7
2
votes
2 answers
Creating custom InputFormat and RecordReader for Binary Files in Hadoop MapReduce
I'm writing an M/R job that processes large time-series data files written in a binary format that looks something like this (new lines here for readability; actual data is continuous,…
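For fixed-width binary records, the decoding side is usually the easy part. A plain-Python sketch, independent of Hadoop and assuming a hypothetical record layout (big-endian int64 timestamp plus float32 value):

```python
import struct

# Hypothetical record layout: int64 timestamp + float32 value, big-endian.
RECORD = struct.Struct(">qf")

def read_records(blob):
    """Decode back-to-back fixed-width binary records from a bytes blob."""
    return [RECORD.unpack_from(blob, off)
            for off in range(0, len(blob), RECORD.size)]

blob = RECORD.pack(1215395205, 1.5) + RECORD.pack(1215395206, 2.5)
recs = read_records(blob)   # two (timestamp, value) tuples
```

In a custom InputFormat the key extra step is making each split size a multiple of the record width (or having the RecordReader seek to the next record boundary), so no record straddles two readers.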

sa125
- 28,121
- 38
- 111
- 153
1
vote
1 answer
Calculating input splits in MapReduce
A file of size 260 MB is stored in HDFS, whereas the HDFS default block size is 64 MB. Upon running a map-reduce job against this file, I found that it creates only 4 input splits. How is this calculated? Where does the remaining 4 MB go? Any…
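The count follows `FileInputFormat.getSplits()`: splits of one block each are carved off while more than 1.1 blocks' worth of data remains, and whatever is left becomes the final split (here 64 + 4 = 68 MB). A sketch of that loop (the 1.1 `SPLIT_SLOP` constant comes from the Hadoop source):

```python
def compute_splits(file_size, split_size, slop=1.1):
    """Mimic FileInputFormat.getSplits(): the last split may grow up to
    slop * split_size so a tiny remainder does not get its own mapper."""
    splits = []
    remaining = file_size
    while remaining / split_size > slop:
        splits.append(split_size)
        remaining -= split_size
    if remaining > 0:
        splits.append(remaining)   # remainder rides along in the last split
    return splits

sizes = compute_splits(260, 64)    # [64, 64, 64, 68] -> 4 splits
```

So the "missing" 4 MB is not lost; it is merged into the fourth split, because 68 MB is within the 10% slop of a 64 MB split.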

TheCodeCache
- 820
- 1
- 7
- 27
1
vote
0 answers
In Python 3.X, how do you program a print to only occur if the input.split() contains none of the items being checked in a for loop?
I am working on a text-based adventure game in python. Right now I have it so that the main game loop always prints "What do you want to do?" and splits the input into individual words.
I have an action called check (also examine, observe, look,…
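One idiomatic way to express "print only if none of the checked words appear" is `set.isdisjoint`, which is true exactly when the split input shares no words with the action set. A sketch (the word lists and messages are hypothetical, based on the actions named in the question):

```python
ACTIONS = {"check", "examine", "observe", "look"}

def handle(command):
    """Respond to a command; fall back only when no action word is present."""
    words = command.lower().split()
    if ACTIONS.isdisjoint(words):        # none of the action words were typed
        return "I don't understand that."
    return "You take a closer look around."

print(handle("please examine the door"))
print(handle("jump up and down"))        # no action word -> fallback message
```

The same test also reads naturally as `not any(w in ACTIONS for w in words)`, but `isdisjoint` states the intent in one call.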

John Hedlund-Fay
- 31
- 1
1
vote
1 answer
Does the splits like FileSplit in Hadoop change the blocks?
First question: I want to know if the splits change the blocks in any way (i.e. change size, shift a block to another location, create new blocks, ...).
Second question: I think the splits don't change the blocks but rather specify where each…

Mosab Shaheen
- 1,114
- 10
- 25
1
vote
1 answer
Python Input Split with a limit range
var1, var2 = input("Enter two digits a and b (0-9):").split(' ')
while True:
    if (0 <= var1 <= 9) and (0 <= var2 <= 9):
        result = var1 + var2
        print("The result is: %r." % result)
I use Spyder Python 3.5 to write this code and try to run…
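The snippet compares the strings returned by `split()` against integers, which raises a `TypeError` in Python 3. A corrected sketch (the helper name is hypothetical; the key fix is converting with `int()` before the range check):

```python
def add_digits(text):
    """Parse 'a b' from one line of input and add the two digits."""
    var1, var2 = (int(part) for part in text.split())  # split() yields strings; convert first
    if 0 <= var1 <= 9 and 0 <= var2 <= 9:
        return var1 + var2
    raise ValueError("both values must be single digits 0-9")

# Usage with real input:
#   result = add_digits(input("Enter two digits a and b (0-9): "))
print("The result is: %r." % add_digits("3 4"))   # 7
```

Splitting on whitespace with a bare `split()` is also more forgiving than `split(' ')`, which breaks on double spaces.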
user3700852
1
vote
2 answers
Input split and block in hadoop
I have a file of size 100 MB, and say the default block size is 64 MB. If I do not set the input split size, the default split size is the block size. So the split size is also 64 MB.
When I load this 100 MB file into HDFS, the 100 MB file will split…

Wanderer
- 447
- 3
- 11
- 20
1
vote
2 answers
Number of input splits is equals to number of mappers?
I am processing one file with MapReduce. The file size is 1 GB and my default block size in HDFS is 64 MB, so for this example how many input splits are there and how many mappers are there?
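With the defaults stated in the question (split size equal to block size, and one map task launched per input split), the arithmetic can be sketched directly:

```python
import math

file_size = 1024   # MB (1 GB)
block_size = 64    # MB, which is also the split size when none is set

num_splits = math.ceil(file_size / block_size)
num_mappers = num_splits   # the framework starts one map task per input split
```

This yields 16 splits and therefore 16 mappers; a larger configured split size would reduce both numbers together.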

koti developer
- 41
- 10
1
vote
0 answers
How to select top rows in hadoop?
I am reading a 138MB file from Hadoop and trying to assign sequence numbers to each record. Below is the approach I followed.
I read the entire file using cascading, assigned current slice number and current record counter to each record. This was…

Abhishek Korpe
- 11
- 1
1
vote
0 answers
How can I explain Hadoop not to split my file in some special MapReduce task?
Given that I have a file to process with Hadoop, and I know that the size of the file is smaller than the HDFS block size: does this guarantee that the file will not be split, and that I don't need to write an InputSplit for it because the default one will not split…

MiamiBeach
- 3,261
- 6
- 28
- 54