
I want to get the output from two processes and merge it into one file, like:

proc1 >> output &
proc2 >> output &

The problem is that the output may get mixed up in the final file. For example, if the first process writes:

hello

and the second process writes:

bye

the result may be something like:

hebylloe

but I expect them to be on separate lines, like (order is not important):

bye

hello

So I used flock to synchronize writing to the file with the following script:

exec 200>>output
while read line; do
  flock -w 2 200
  echo $line >> output
  flock -u 200
done
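
A slightly leaner variant of the same idea (just a sketch, I have not benchmarked it) writes through the descriptor opened by exec instead of re-opening output for every line:

exec 200>>output
while IFS= read -r line; do    # -r keeps backslashes in the input literal
  flock -w 2 200               # wait up to 2 seconds for the lock on fd 200
  printf '%s\n' "$line" >&200  # append through the already-open descriptor
  flock -u 200                 # release the lock
done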

And run the processes like:

proc1 | script &
proc2 | script &

Now the problem is that performance drops significantly: without synchronization each process could write at about 4 MB/s, but with the synchronization script the write speed falls to 1 MB/s.

Can anyone help me merge the output from two processes while preventing the outputs from getting mixed up?

Edit: I realized that there is a relation between line length and the stdio buffer size: if each line is shorter than the stdio buffer, then everything works well and nothing gets mixed (at least in my tests). So I ran each process through the bufsize command:

bufsize -o10KB proc1 | script &
bufsize -o10KB proc2 | script &

Now I want to make sure that this solution is bulletproof. I cannot find any relation between the buffer size and what happens now!

    If you only have two processes, why not write two output files and then merge them afterwards? If you need to scale that up, look into using an appender like log4j. – xxfelixxx Aug 21 '16 at 08:39
  • It is better (not solving your problem) to use `echo "$line" >> output` (with quotes). – Walter A Aug 21 '16 at 08:46
  • What are you writing? For plain logfiles, the hero who will read so much data will only get confused when 2 procs write to the same file. Or are you writing something that will go to a database some day? Start now. – Walter A Aug 21 '16 at 08:53
  • For some reason I have to write it as a bash script. I know that I could handle the situation in C++ easily, but I cannot use anything but a bash script... – ayyoob imani Aug 21 '16 at 09:05
  • What are you going to do with the `output` of 4 MB/sec? – Walter A Aug 21 '16 at 09:31
  • The processes are creating output at a very high rate, and I don't want to lose it! – ayyoob imani Aug 21 '16 at 09:35
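
A minimal sketch of the two-file approach xxfelixxx suggests in the comments above (output.1 and output.2 are just example names):

proc1 >> output.1 &        # each process gets its own file, so nothing can interleave
proc2 >> output.2 &
wait
cat output.1 output.2 > output   # merge once both writers have finished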

1 Answer


Now I want to make sure that this solution is bulletproof. I cannot find any relation between the buffer size and what happens now!

For a fully buffered output stream, the buffer size determines the amount of data written with a single write(2) call. For a line buffered output stream, a line is written with a single write(2) call as long as it doesn't exceed the buffer size.

If the file was open(2)ed with O_APPEND, the file offset is first set to the end of the file before writing. The adjustment of the file offset and the write operation are performed as an atomic step.
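
Putting the two together, a minimal sketch (assuming GNU coreutils' stdbuf is available and that proc1 and proc2 write through stdio without setting their own buffering) would be:

stdbuf -oL proc1 >> output &   # line-buffered: each line becomes one write(2)
stdbuf -oL proc2 >> output &   # >> opens the file with O_APPEND
wait

As long as each line fits within the stdio buffer, the appends from the two processes cannot interleave within a line.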

See also these answers:

  • Thank you Armali, but how can I make sure that my Linux shell's redirect (>>) implementation is based on write(2) and not write(3), since write(3) does not guarantee such a thing? – ayyoob imani Aug 24 '16 at 04:08
  • @ayyoob imani: Do you mean [`write(3)`](http://linux.die.net/man/3/write) from the POSIX Manual? On Linux of course the Linux implementation [`write(2)`](http://linux.die.net/man/2/write) is in effect. – Armali Aug 24 '16 at 13:22
  • Besides that, also POSIX says _If the O_APPEND flag of the file status flags is set, the file offset shall be set to the end of the file prior to each write and no intervening file modification operation shall occur between changing the file offset and the write operation_ and _Any successful read() from each byte position in the file that was modified by that write shall return the data specified by the write() for that position until such byte positions are again modified._ These _shall_ requirements are to be met by the operating system kernel. – Armali Aug 24 '16 at 13:25
  • May be helpful -- I've encountered issues with bash scripts that use append and multiple processes, so I did a little testing. This seems unreliable: `( proc1 & proc2 & ) >> outputfile`, while this seems reliable: `( proc1 & proc2 & ) | cat >> outputfile`. Presumably stdio has optimizations for file I/O that have contention issues. – PaulC Jan 05 '19 at 05:47
  • @PaulC - You don't have by chance a reproducible example of such issues at hand, do you? – Armali Jan 07 '19 at 08:20
  • @Armali Well, with `>>` it works. With `>` it behaves differently on different fs types. Try this: `echo 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 | tr ' ' '\n' | while read i; do yes abcdefghijklmnopqrstuvwxyz | sed 1000q | sed -e "s/^/-$i- /" & done > tmpfile; wait; wc -l tmpfile` – PaulC Jan 09 '19 at 02:30
  • @PaulC - I wouldn't call this _unreliable_ - the command line you used just introduces non-determinism by creating 50 processes which potentially write to tmpfile in parallel. Using >> changes just the open mode from O_TRUNC to O_APPEND, causing all 50 thousand lines to get into the file, but not necessarily in a specific order. By the way, the `wait` command is not effective, because the background pipeline contains children of a subshell. – Armali Jan 11 '19 at 09:01
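
A minimal sketch of the single-consumer approach PaulC describes above (assuming each producer flushes whole lines and no line exceeds PIPE_BUF, 4096 bytes on Linux, so that each line reaches the pipe as one atomic write):

( proc1 & proc2 & wait ) | cat >> output   # only cat ever writes to the file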