
I am updating a file which is on HDFS.

How can I ensure that the changes made by all the mappers end up in the file, i.e. that the write operations on the file are synchronized?

User97693321
dpsdce

2 Answers


According to Hadoop: The Definitive Guide:

Multiple writers, arbitrary file modifications
Files in HDFS may be written to by a single writer. Writes are always made at the end of the file. There is no support for multiple writers or for modifications at arbitrary offsets in the file. (These might be supported in the future, but they are likely to be relatively inefficient.)

Application-level synchronization is practically impossible because of the distributed nature of Hadoop (multiple nodes, mappers, reducers, etc.).

The MapR distribution of Apache Hadoop does support random reads and writes, with multiple simultaneous readers and writers.
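Given the single-writer, append-only model quoted above, the usual workaround is not to have all tasks write to one shared file at all: each task writes its own output ("part" file), and a single merge step concatenates them afterwards. A minimal sketch of that pattern in plain Python (no Hadoop; `run_task` and the record names are purely illustrative):

```python
def run_task(task_id):
    """Hypothetical task: produces records into its own part file,
    so no two writers ever touch the same file."""
    return [f"task{task_id}-rec{i}" for i in range(2)]

def merge(num_tasks):
    """A single writer appends each part file in turn --
    appends only at the end, mirroring the HDFS write model."""
    merged = []
    for tid in range(num_tasks):
        merged.extend(run_task(tid))  # append-only, one writer
    return merged

if __name__ == "__main__":
    print(merge(3))
```

This is how MapReduce itself behaves: every task writes `part-00000`-style files into the job output directory, and nothing ever requires two writers on one HDFS file.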

Praveen Sripati

HDFS files are not mutable, so you can only append to them. The issue of concurrent appends is covered here: Is it possible to append to HDFS file from multiple clients in parallel? In a nutshell: you should not.
I would also point out that this is not in the "MapReduce spirit". If you want to collect some data from the mappers and aggregate it together, that is exactly the role of the reducer.
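The point about the reducer being the aggregation mechanism can be sketched in plain Python (a toy word-count simulation, no Hadoop; all function names here are illustrative):

```python
from collections import defaultdict

def map_phase(records):
    # mapper: emits (key, 1) pairs; it never writes to a shared file
    for rec in records:
        for word in rec.split():
            yield word, 1

def shuffle(pairs):
    # the framework groups all values by key between map and reduce
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # reducer: per-key aggregation happens here, which is the
    # MapReduce answer to "combine the output of all the maps"
    return {key: sum(values) for key, values in groups.items()}

counts = reduce_phase(shuffle(map_phase(["a b a", "b c"])))
# counts == {"a": 2, "b": 2, "c": 1}
```

Instead of mappers synchronizing writes to one HDFS file, each mapper emits key-value pairs and the framework routes everything for a given key to one reducer, which performs the aggregation.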

David Gruzman