I downloaded the German Wikipedia dump dewiki-20151102-pages-articles-multistream.xml. My short question: what does 'multistream' mean in this case?
The dumps are compressed using bz2. A bz2 file can consist of several independent compressed streams concatenated one after another, which lets tools compress or decompress the pieces in parallel and therefore faster. A dump compressed this way is tagged as multistream.
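Knowing this makes a difference when you process the dump from a programming language, because some libraries stop after the first stream unless you pass a flag telling them to keep decompressing the concatenated streams (for example, Apache Commons Compress's `BZip2CompressorInputStream` takes a `decompressConcatenated` constructor argument).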
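A minimal Python sketch of what "multistream" means at the byte level. The hypothetical helper `make_multistream` compresses each chunk as its own bz2 stream and concatenates the results, standing in for a real dump. A single `bz2.BZ2Decompressor` stops at the end of the first stream and leaves the remaining bytes in `unused_data`, which is roughly what you get from a library when the flag mentioned above is not set; `bz2.decompress` (Python 3.3+) reads every stream.

```python
import bz2


def make_multistream(chunks):
    """Hypothetical helper: compress each chunk as its own bz2 stream, then concatenate."""
    return b"".join(bz2.compress(chunk) for chunk in chunks)


def decompress_first_stream(data):
    """Decompress only the first bz2 stream; return (output, bytes after that stream)."""
    decomp = bz2.BZ2Decompressor()
    out = decomp.decompress(data)
    if not decomp.eof:
        raise EOFError("data ended before the first bz2 stream was complete")
    # Everything after the first end-of-stream marker is left untouched here.
    return out, decomp.unused_data


def decompress_all_streams(data):
    """Decompress every concatenated stream, using a fresh decompressor per stream."""
    parts = []
    while data:
        decomp = bz2.BZ2Decompressor()
        parts.append(decomp.decompress(data))
        if not decomp.eof:
            raise EOFError("truncated bz2 stream")
        data = decomp.unused_data
    return b"".join(parts)


if __name__ == "__main__":
    pages = [b"<page>A</page>\n", b"<page>B</page>\n", b"<page>C</page>\n"]
    blob = make_multistream(pages)

    first, rest = decompress_first_stream(blob)
    print(first)                                             # b'<page>A</page>\n'
    print(decompress_all_streams(blob) == b"".join(pages))   # True
    print(bz2.decompress(blob) == b"".join(pages))           # True
```

Because the streams do not depend on each other, each one could also be handed to a separate worker, which is where the parallel speed-up comes from.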

David Przybilla
Could you please answer this question: https://stackoverflow.com/questions/48386791/extract-related-articles-in-different-languages-using-wikidata-toolkit?noredirect=1#comment84061677_48386791 – SahelSoft Feb 04 '18 at 15:12
Multistream allows the use of an accompanying index file to decompress only the sections you need, without having to decompress the entire dump. This lets a reader pull individual articles out of a compressed dump.
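A rough Python sketch of index-based access, assuming the companion index file has one `offset:page_id:title` line per page, where `offset` is the byte position in the dump at which the bz2 stream containing that page starts. The helpers (`build_fake_dump`, `lookup_offset`, `read_stream_at`) are hypothetical, and an in-memory fake dump stands in for the real file so the example runs on its own.

```python
import bz2
import io


def build_fake_dump(groups):
    """Hypothetical helper: write each group of (page_id, title, xml) as one bz2 stream.

    Returns the multistream bytes and index lines of the form 'offset:page_id:title'.
    """
    dump = io.BytesIO()
    index_lines = []
    for group in groups:
        offset = dump.tell()  # every page in this group lives in the stream starting here
        xml = b"".join(page_xml for _, _, page_xml in group)
        dump.write(bz2.compress(xml))
        for page_id, title, _ in group:
            index_lines.append(f"{offset}:{page_id}:{title}")
    return dump.getvalue(), index_lines


def lookup_offset(index_lines, wanted_title):
    """Find the stream offset for a title; maxsplit=2 keeps colons inside titles intact."""
    for line in index_lines:
        offset, _page_id, title = line.split(":", 2)
        if title == wanted_title:
            return int(offset)
    raise KeyError(wanted_title)


def read_stream_at(fileobj, offset, chunk_size=64 * 1024):
    """Seek to a stream boundary and decompress just that one bz2 stream."""
    fileobj.seek(offset)
    decomp = bz2.BZ2Decompressor()
    out = []
    while not decomp.eof:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            raise EOFError("dump ended before the bz2 stream was complete")
        out.append(decomp.decompress(chunk))  # bytes past the stream end go to unused_data
    return b"".join(out)


if __name__ == "__main__":
    groups = [
        [(1, "Berlin", b"<page><title>Berlin</title></page>\n"),
         (2, "Hamburg", b"<page><title>Hamburg</title></page>\n")],
        [(3, "Wikipedia:Hauptseite", b"<page><title>Wikipedia:Hauptseite</title></page>\n")],
    ]
    dump_bytes, index = build_fake_dump(groups)

    offset = lookup_offset(index, "Hamburg")
    xml = read_stream_at(io.BytesIO(dump_bytes), offset)
    # Prints both the Berlin and Hamburg pages, since they share one stream.
    print(xml.decode("utf-8"))
```

The whole stream comes back, not just the requested page, so you would still parse the returned XML to pick out the article you want. Against real files you would open the dump with `open(path, "rb")` and read the index through `bz2.open(index_path, "rt", encoding="utf-8")`; the paths are up to you.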

RobC