Questions tagged [input-split]

35 questions
5
votes
2 answers

How to read a record that is split into multiple lines and also how to handle broken records during input split

I have a log file as below Begin ... 12-07-2008 02:00:05 ----> record1 incidentID: inc001 description: blah blah blah owner: abc status: resolved end .... 13-07-2008 02:00:05 Begin ... 12-07-2008 03:00:05 ----> record2…
ghosts
  • 177
  • 2
  • 15
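The sample log above shows records bracketed by `Begin` and `end` marker lines. A minimal sketch of grouping such lines into records, outside of Hadoop, assuming each record runs from a line starting with `Begin` to a line starting with `end` (as in the sample):

```python
# Group log lines into multi-line records, assuming the Begin/end
# markers shown in the question delimit each record.
def read_records(lines):
    record = []
    for line in lines:
        record.append(line)
        if line.strip().startswith("end"):
            yield record            # a complete record
            record = []
    if record:                      # trailing broken record, if any
        yield record

log = [
    "Begin ... 12-07-2008 02:00:05",
    "----> record1",
    "incidentID: inc001",
    "status: resolved",
    "end .... 13-07-2008 02:00:05",
]
print(len(list(read_records(log))))  # 1
```

A Hadoop RecordReader for the same format would apply the identical grouping rule, plus logic to skip the partial record at the start of each non-first split.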
4
votes
3 answers

Hadoop input split for a compressed block

If I have a splittable compressed file of 1 GB, and the default block size and input split size are 128 MB, then 8 blocks and 8 input splits are created. When a compressed block is read by MapReduce it is uncompressed, and say after…
ZAHEER AHMED
  • 507
  • 1
  • 5
  • 9
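The split count the question describes follows from simple arithmetic, sketched here with the figures from the question (1 GB file, 128 MB block and split size) and assuming a splittable codec so that one split is created per block:

```python
# Rough split-count arithmetic for a splittable compressed file,
# using the figures from the question (1 GB file, 128 MB blocks).
import math

file_size = 1024 * 1024 * 1024      # 1 GB
block_size = 128 * 1024 * 1024      # default block / split size

num_blocks = math.ceil(file_size / block_size)
num_splits = num_blocks             # splittable codec: one split per block

print(num_blocks, num_splits)       # 8 8
```

With a non-splittable codec (e.g. gzip) the same file would instead produce a single split covering all 8 blocks.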
4
votes
2 answers

Hadoop MapReduce RecordReader Implementation Necessary?

From the Apache doc on the Hadoop MapReduce InputFormat Interface: "[L]ogical splits based on input-size is insufficient for many applications since record boundaries are to be respected. In such cases, the application has to also implement a…
AST
  • 211
  • 6
  • 18
3
votes
2 answers

How to handle multiline record for inputsplit?

I have a 100 TB text file with multiline records, and we are not told how many lines each record takes. One record may span 5 lines, another 6, another 4. The line count is not fixed and may vary for each…
java_enthu
  • 2,279
  • 7
  • 44
  • 74
2
votes
2 answers

file storage, block size and input splits in Hadoop

Consider this scenario: I have 4 files of 6 MB each. The HDFS block size is 64 MB. One block will hold all these files, with some extra space left over; if new files are added, they will be accommodated here. Now, when the input splits are calculated for a MapReduce job…
brain storm
  • 30,124
  • 69
  • 225
  • 393
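One point worth illustrating against the scenario above: HDFS blocks and `FileInputFormat` splits are both per-file, so small files never share a block or a split. A minimal sketch with the question's figures (4 files of 6 MB, 64 MB block size):

```python
# Splits never cross file boundaries: each of the 4 small files from
# the question gets its own split, regardless of the 64 MB block size.
files_mb = [6, 6, 6, 6]
block_mb = 64
splits = [min(size, block_mb) for size in files_mb]  # one split per file

print(len(splits), splits)  # 4 [6, 6, 6, 6]
```

Hence 4 map tasks run, one per 6 MB file, which is exactly the "small files problem" that formats like `CombineFileInputFormat` exist to mitigate.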
2
votes
1 answer

MapReduce: How input splits are done when 2 blocks are spread across different nodes?

I read the following wiki but am still not able to clarify one thing. https://wiki.apache.org/hadoop/HadoopMapReduce Say I have a large file that is broken into two HDFS blocks, and the blocks are physically saved on 2 different machines. Consider there…
Arijit Banerjee
  • 164
  • 1
  • 7
2
votes
2 answers

Creating custom InputFormat and RecordReader for Binary Files in Hadoop MapReduce

I'm writing a M/R job that processes large time-series-data files written in binary format that looks something like this (new lines here for readability, actual data is continuous,…
sa125
  • 28,121
  • 38
  • 111
  • 153
1
vote
1 answer

Calculating input splits in MapReduce

A file of size 260 MB is stored in HDFS, whereas the HDFS default block size is 64 MB. Upon running a MapReduce job against this file, I found that it creates only 4 input splits. How is this calculated? Where are the remaining 4 MB? Any…
TheCodeCache
  • 820
  • 1
  • 7
  • 27
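The 4-splits result comes from `FileInputFormat`'s split loop: a trailing chunk smaller than 10% of the split size (the `SPLIT_SLOP` factor of 1.1) is folded into the last split rather than getting a split of its own. A simplified re-implementation of that logic, using the question's figures:

```python
# Simplified sketch of FileInputFormat's getSplits loop: a final chunk
# within 10% of the split size is folded into the last split.
SPLIT_SLOP = 1.1  # constant from Hadoop's FileInputFormat

def compute_splits(file_size, split_size):
    splits, remaining = [], file_size
    while remaining / split_size > SPLIT_SLOP:
        splits.append(split_size)
        remaining -= split_size
    if remaining > 0:
        splits.append(remaining)   # trailing bytes join the last split
    return splits

print(compute_splits(260, 64))     # [64, 64, 64, 68]
```

So the "missing" 4 MB are not lost: the fourth split is 68 MB, because 68/64 ≈ 1.06 is under the 1.1 slop threshold.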
1
vote
0 answers

In Python 3.X, how do you program a print to only occur if the input.split() contains none of the items being checked in a for loop?

I am working on a text-based adventure game in python. Right now I have it so that the main game loop always prints "What do you want to do?" and splits the input into individual words. I have an action called check (also examine, observe, look,…
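One way to express "print only if none of the checked words appear" is `not any(...)` over the split input. A minimal sketch; `check_words` and the reply strings are hypothetical stand-ins for the game's actual vocabulary:

```python
# Print a fallback only when none of the checked words appear in the
# player's input. check_words is a hypothetical action list.
check_words = {"check", "examine", "observe", "look"}

def respond(raw_line):
    words = raw_line.lower().split()
    if not any(w in check_words for w in words):
        return "I don't understand that."
    return "You look around."

print(respond("please examine the door"))  # You look around.
print(respond("dance wildly"))             # I don't understand that.
```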
1
vote
1 answer

Does the splits like FileSplit in Hadoop change the blocks?

First question: I want to know if the splits change the blocks in any way (i.e. change size, shift a block to another location, create new blocks, ...). Second question: I think the splits don't change the blocks, but rather specify where each…
Mosab Shaheen
  • 1,114
  • 10
  • 25
1
vote
1 answer

Python Input Split with a limit range

var1,var2 = input("Enter two digits a and b (0-9):").split(' ') while True: if (0 <= var1 <= 9) and (0 <= var2 <= 9): result = var1+var2 print("The result is: %r." %result) I use Spyder Python 3.5 to write this code and try to run…
user3700852
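The snippet in the excerpt has two problems: `split()` returns strings, so `0 <= var1 <= 9` compares an `int` with a `str` (a `TypeError` in Python 3), and `var1 + var2` would concatenate rather than add; the `while True` also never exits. One way the code could be made to run, assuming the goal is to add the two digits, with the logic pulled into a function for clarity:

```python
# Function form of the excerpt's snippet, assuming the goal is to add
# two digits: split() returns strings, so each piece must be cast to
# int before the range check and the addition.
def add_two_digits(raw):
    a_str, b_str = raw.split(' ')
    a, b = int(a_str), int(b_str)
    if 0 <= a <= 9 and 0 <= b <= 9:
        return a + b
    raise ValueError("both values must be single digits 0-9")

print("The result is: %r." % add_two_digits("3 4"))  # The result is: 7.
```

In the interactive version, `"3 4"` would be replaced by `input("Enter two digits a and b (0-9): ")`.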
1
vote
2 answers

Input split and block in hadoop

I have a file of size 100 MB, and say the default block size is 64 MB. If I do not set the input split size, the default split size will be the block size, so the split size is also 64 MB. When I load this 100 MB file into HDFS, the 100 MB file will split…
Wanderer
  • 447
  • 3
  • 11
  • 20
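The block count for the question's scenario is straightforward arithmetic; a quick sketch with its figures (100 MB file, 64 MB blocks):

```python
# Block layout for a 100 MB file with 64 MB HDFS blocks: two blocks,
# the second holding only the remaining 36 MB.
import math

file_mb, block_mb = 100, 64
blocks = math.ceil(file_mb / block_mb)
sizes = [min(block_mb, file_mb - i * block_mb) for i in range(blocks)]

print(blocks, sizes)  # 2 [64, 36]
```

With the split size left at the block size, the job likewise gets 2 input splits, though the second split's record reader may read a few bytes past the block boundary to finish a line that straddles the two blocks.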
1
vote
2 answers

Number of input splits is equals to number of mappers?

I am processing one file with MapReduce. The file size is 1 GB and my default HDFS block size is 64 MB, so for this example how many input splits are there and how many mappers are there?
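With the split size left at the 64 MB block size, the answer is direct arithmetic, and by default one map task runs per input split:

```python
# Split and mapper count for the question's figures: 1 GB file,
# 64 MB default block (and therefore split) size.
import math

file_mb, split_mb = 1024, 64
num_splits = math.ceil(file_mb / split_mb)
num_map_tasks = num_splits          # one map task per input split

print(num_splits, num_map_tasks)    # 16 16
```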
1
vote
0 answers

How to select top rows in hadoop?

I am reading a 138MB file from Hadoop and trying to assign sequence numbers to each record. Below is the approach I followed. I read the entire file using cascading, assigned current slice number and current record counter to each record. This was…
1
vote
0 answers

How can I explain Hadoop not to split my file in some special MapReduce task?

Given that I have a file to process with Hadoop, and I know that the size of the file is smaller than the HDFS block size: does this guarantee that the file will not be split, and that I don't need to write an InputSplit because the default one will not split…
MiamiBeach
  • 3,261
  • 6
  • 28
  • 54