Questions tagged [input-split]

35 questions
5
votes
2 answers

How to read a record that is split into multiple lines and also how to handle broken records during input split

I have a log file as below Begin ... 12-07-2008 02:00:05 ----> record1 incidentID: inc001 description: blah blah blah owner: abc status: resolved end .... 13-07-2008 02:00:05 Begin ... 12-07-2008 03:00:05 ----> record2…
ghosts
  • 177
  • 2
  • 15
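The sample log above shows records bracketed by `Begin` and `end` marker lines. A minimal sketch of grouping such lines into records, outside of Hadoop, assuming each record runs from a line starting with `Begin` to a line starting with `end` (as in the sample):

```python
# Group log lines into multi-line records, assuming the Begin/end
# markers shown in the question delimit each record.
def read_records(lines):
    record = []
    for line in lines:
        record.append(line)
        if line.strip().startswith("end"):
            yield record            # a complete record
            record = []
    if record:                      # trailing broken record, if any
        yield record

log = [
    "Begin ... 12-07-2008 02:00:05",
    "----> record1",
    "incidentID: inc001",
    "status: resolved",
    "end .... 13-07-2008 02:00:05",
]
print(len(list(read_records(log))))  # 1
```

A Hadoop RecordReader for the same format would apply the identical grouping rule, plus logic to skip the partial record at the start of each non-first split.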
4
votes
3 answers

Hadoop input split for a compressed block

If I have a splittable compressed file of 1 GB, and the default block size and input split size are 128 MB, then 8 blocks and 8 input splits are created. When a compressed block is read by MapReduce it is uncompressed, and say after…
ZAHEER AHMED
  • 507
  • 1
  • 5
  • 9
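The split count the question describes follows from simple arithmetic, sketched here with the figures from the question (1 GB file, 128 MB block and split size) and assuming a splittable codec so that one split is created per block:

```python
# Rough split-count arithmetic for a splittable compressed file,
# using the figures from the question (1 GB file, 128 MB blocks).
import math

file_size = 1024 * 1024 * 1024      # 1 GB
block_size = 128 * 1024 * 1024      # default block / split size

num_blocks = math.ceil(file_size / block_size)
num_splits = num_blocks             # splittable codec: one split per block

print(num_blocks, num_splits)       # 8 8
```

With a non-splittable codec (e.g. gzip) the same file would instead produce a single split covering all 8 blocks.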
4
votes
2 answers

Hadoop MapReduce RecordReader Implementation Necessary?

From the Apache doc on the Hadoop MapReduce InputFormat Interface: "[L]ogical splits based on input-size is insufficient for many applications since record boundaries are to be respected. In such cases, the application has to also implement a…
AST
  • 211
  • 6
  • 18
3
votes
2 answers

How to handle multiline record for inputsplit?

I have a 100 TB text file with multiline records, and we are not told how many lines each record takes. One record may span 5 lines, another 6, another 4. The line count is not fixed and may vary for each…
java_enthu
  • 2,279
  • 7
  • 44
  • 74
2
votes
2 answers

file storage, block size and input splits in Hadoop

Consider this scenario: I have 4 files of 6 MB each. The HDFS block size is 64 MB. One block will hold all these files, with some extra space left over; if new files are added, they will be accommodated here. Now, when the input splits are calculated for a MapReduce job…
brain storm
  • 30,124
  • 69
  • 225
  • 393
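One point worth illustrating against the scenario above: HDFS blocks and `FileInputFormat` splits are both per-file, so small files never share a block or a split. A minimal sketch with the question's figures (4 files of 6 MB, 64 MB block size):

```python
# Splits never cross file boundaries: each of the 4 small files from
# the question gets its own split, regardless of the 64 MB block size.
files_mb = [6, 6, 6, 6]
block_mb = 64
splits = [min(size, block_mb) for size in files_mb]  # one split per file

print(len(splits), splits)  # 4 [6, 6, 6, 6]
```

Hence 4 map tasks run, one per 6 MB file, which is exactly the "small files problem" that formats like `CombineFileInputFormat` exist to mitigate.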
2
votes
1 answer

MapReduce: How input splits are done when 2 blocks are spread across different nodes?

I read the following wiki but am still not able to clarify one thing. https://wiki.apache.org/hadoop/HadoopMapReduce Say I have a large file that is broken into two HDFS blocks, and the blocks are physically saved on 2 different machines. Consider there…
Arijit Banerjee
  • 164
  • 1
  • 7
2
votes
2 answers

Creating custom InputFormat and RecordReader for Binary Files in Hadoop MapReduce

I'm writing a M/R job that processes large time-series-data files written in binary format that looks something like this (new lines here for readability, actual data is continuous,…
sa125
  • 28,121
  • 38
  • 111
  • 153
1
vote
1 answer

Calculating input splits in MapReduce

A file of size 260 MB is stored in HDFS, whereas the HDFS default block size is 64 MB. Upon running a MapReduce job against this file, I found that it creates only 4 input splits. How is this calculated? Where are the remaining 4 MB? Any…
TheCodeCache
  • 820
  • 1
  • 7
  • 27
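The 4-splits result comes from `FileInputFormat`'s split loop: a trailing chunk smaller than 10% of the split size (the `SPLIT_SLOP` factor of 1.1) is folded into the last split rather than getting a split of its own. A simplified re-implementation of that logic, using the question's figures:

```python
# Simplified sketch of FileInputFormat's getSplits loop: a final chunk
# within 10% of the split size is folded into the last split.
SPLIT_SLOP = 1.1  # constant from Hadoop's FileInputFormat

def compute_splits(file_size, split_size):
    splits, remaining = [], file_size
    while remaining / split_size > SPLIT_SLOP:
        splits.append(split_size)
        remaining -= split_size
    if remaining > 0:
        splits.append(remaining)   # trailing bytes join the last split
    return splits

print(compute_splits(260, 64))     # [64, 64, 64, 68]
```

So the "missing" 4 MB are not lost: the fourth split is 68 MB, because 68/64 ≈ 1.06 is under the 1.1 slop threshold.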
1
vote
0 answers

In Python 3.X, how do you program a print to only occur if the input.split() contains none of the items being checked in a for loop?

I am working on a text-based adventure game in python. Right now I have it so that the main game loop always prints "What do you want to do?" and splits the input into individual words. I have an action called check (also examine, observe, look,…
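One way to express "print only if none of the checked words appear" is `not any(...)` over the split input. A minimal sketch; `check_words` and the reply strings are hypothetical stand-ins for the game's actual vocabulary:

```python
# Print a fallback only when none of the checked words appear in the
# player's input. check_words is a hypothetical action list.
check_words = {"check", "examine", "observe", "look"}

def respond(raw_line):
    words = raw_line.lower().split()
    if not any(w in check_words for w in words):
        return "I don't understand that."
    return "You look around."

print(respond("please examine the door"))  # You look around.
print(respond("dance wildly"))             # I don't understand that.
```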
1
vote
1 answer

Does the splits like FileSplit in Hadoop change the blocks?

First question: I want to know if the splits change the blocks in any way (i.e. change size, shift a block to another location, create new blocks, ...). Second question: I think the splits don't change the blocks, but rather specify where each…
Mosab Shaheen
  • 1,114
  • 10
  • 25
1
vote
1 answer

Python Input Split with a limit range

var1,var2 = input("Enter two digits a and b (0-9):").split(' ') while True: if (0 <= var1 <= 9) and (0 <= var2 <= 9): result = var1+var2 print("The result is: %r." %result) I use Spyder Python 3.5 to write this code and try to run…
user3700852
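The snippet in the excerpt has two problems: `split()` returns strings, so `0 <= var1 <= 9` compares an `int` with a `str` (a `TypeError` in Python 3), and `var1 + var2` would concatenate rather than add; the `while True` also never exits. One way the code could be made to run, assuming the goal is to add the two digits, with the logic pulled into a function for clarity:

```python
# Function form of the excerpt's snippet, assuming the goal is to add
# two digits: split() returns strings, so each piece must be cast to
# int before the range check and the addition.
def add_two_digits(raw):
    a_str, b_str = raw.split(' ')
    a, b = int(a_str), int(b_str)
    if 0 <= a <= 9 and 0 <= b <= 9:
        return a + b
    raise ValueError("both values must be single digits 0-9")

print("The result is: %r." % add_two_digits("3 4"))  # The result is: 7.
```

In the interactive version, `"3 4"` would be replaced by `input("Enter two digits a and b (0-9): ")`.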
1
vote
2 answers

Input split and block in hadoop

I have a file of size 100 MB, and say the default block size is 64 MB. If I do not set the input split size, the default split size will be the block size, so the split size is also 64 MB. When I load this 100 MB file into HDFS, the 100 MB file will split…
Wanderer
  • 447
  • 3
  • 11
  • 20
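The block count for the question's scenario is straightforward arithmetic; a quick sketch with its figures (100 MB file, 64 MB blocks):

```python
# Block layout for a 100 MB file with 64 MB HDFS blocks: two blocks,
# the second holding only the remaining 36 MB.
import math

file_mb, block_mb = 100, 64
blocks = math.ceil(file_mb / block_mb)
sizes = [min(block_mb, file_mb - i * block_mb) for i in range(blocks)]

print(blocks, sizes)  # 2 [64, 36]
```

With the split size left at the block size, the job likewise gets 2 input splits, though the second split's record reader may read a few bytes past the block boundary to finish a line that straddles the two blocks.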
1
vote
2 answers

Number of input splits is equals to number of mappers?

I am processing one file with MapReduce. The file size is 1 GB and my default HDFS block size is 64 MB, so for this example how many input splits are there and how many mappers are there?
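With the split size left at the 64 MB block size, the answer is direct arithmetic, and by default one map task runs per input split:

```python
# Split and mapper count for the question's figures: 1 GB file,
# 64 MB default block (and therefore split) size.
import math

file_mb, split_mb = 1024, 64
num_splits = math.ceil(file_mb / split_mb)
num_map_tasks = num_splits          # one map task per input split

print(num_splits, num_map_tasks)    # 16 16
```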
1
vote
0 answers

How to select top rows in hadoop?

I am reading a 138MB file from Hadoop and trying to assign sequence numbers to each record. Below is the approach I followed. I read the entire file using cascading, assigned current slice number and current record counter to each record. This was…
1
vote
0 answers

How can I explain Hadoop not to split my file in some special MapReduce task?

Given that I have a file to process with Hadoop, and I know that the size of the file is smaller than the HDFS block size: does this guarantee that the file will not be split, and that I don't need to write an InputSplit because the default one will not split…
MiamiBeach
  • 3,261
  • 6
  • 28
  • 54