Questions tagged [wikimedia-dumps]
48 questions
22
votes
2 answers
Multistream Wikipedia dump
I downloaded the German Wikipedia dump dewiki-20151102-pages-articles-multistream.xml. My short question is: what does 'multistream' mean in this case?

m4ri0
- 597
- 1
- 6
- 10
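'Multistream' means the .bz2 archive is a concatenation of many independently compressed bz2 streams (roughly 100 pages each), published together with an index file whose lines are offset:pageid:title. That layout lets you pull a single page out without decompressing the whole dump; once the file is fully decompressed to .xml, the distinction no longer matters. A minimal sketch of that random access, assuming the compressed dump is still on disk and that the offset below was looked up in the index (both values are hypothetical):

import bz2

DUMP = "dewiki-20151102-pages-articles-multistream.xml.bz2"  # hypothetical local path
OFFSET = 123456789  # hypothetical byte offset copied from the index file

with open(DUMP, "rb") as f:
    f.seek(OFFSET)                                    # jump to the start of one bz2 stream
    decompressor = bz2.BZ2Decompressor()              # each stream decompresses independently
    chunk = decompressor.decompress(f.read(256 * 1024))
    print(chunk[:500].decode("utf-8", errors="replace"))  # <page> elements from this stream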
21
votes
2 answers
Empty list returned from ElementTree findall
I'm new to XML parsing and Python, so bear with me. I'm using lxml to parse a wiki dump, but all I want is, for each page, its title and text.
For now I've got this:
from xml.etree import ElementTree as etree
def parser(file_name):
document =…

liloka
- 1,016
- 4
- 14
- 29
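The usual culprit for the empty findall() result is the default XML namespace that MediaWiki export dumps declare on every element. A sketch of one way around it, streaming with iterparse and matching tags regardless of namespace (the {*} wildcard needs Python 3.8+; the filename is a placeholder):

from xml.etree import ElementTree as etree

def iter_pages(file_name):
    # Stream the dump rather than loading it whole, and ignore the
    # xmlns="http://www.mediawiki.org/xml/export-.../" default namespace.
    for _, elem in etree.iterparse(file_name, events=("end",)):
        if elem.tag.rsplit("}", 1)[-1] == "page":
            title = elem.find("{*}title").text
            text = elem.find("{*}revision/{*}text").text
            yield title, text
            elem.clear()                              # free memory for finished pages

for title, text in iter_pages("dewiki-pages-articles.xml"):
    print(title, len(text or ""))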
19
votes
9 answers
Parsing a Wikipedia dump
For example, using this Wikipedia dump:
http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=lebron%20james&rvprop=content&redirects=true&format=xmlfm
Is there an existing library for Python that I can use to create an array with the…

tomwu
- 397
- 1
- 3
- 11
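There is no single standard-library call that turns that API response into a list, but the same query with format=json is easy to post-process. A hedged sketch (the field layout assumes the current revisions API with rvslots):

import requests

params = {
    "action": "query",
    "prop": "revisions",
    "titles": "LeBron James",
    "rvprop": "content",
    "rvslots": "main",
    "redirects": 1,
    "format": "json",              # json is easier to work with than xmlfm
}
r = requests.get("https://en.wikipedia.org/w/api.php", params=params, timeout=30)
page = next(iter(r.json()["query"]["pages"].values()))
wikitext = page["revisions"][0]["slots"]["main"]["*"]
print(wikitext[:300])

From there, a wikitext parser such as mwparserfromhell can split the markup into sections, templates, and links if an array-like structure is the goal.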
6
votes
1 answer
Is there any way to get Wikipedia pageview statistics per page at the *country* level (instead of just by language)?
I see dumps.wikimedia.org/other/pagecounts-raw/, for example, but no country-specific data there...

Kevin Wylie
- 71
- 3
6
votes
0 answers
Getting Wikidata incremental triples
I would like to know if it is possible to get the latest incremental N-Triples dumps of Wikidata.
I'm using Wikidata Toolkit to download the latest version of the dumps and convert them automatically into N-Triples files (using…

Ortzi
- 363
- 1
- 6
- 17
3
votes
0 answers
How to get the cutoff timestamp or lastrevid for a given Wikidata JSON dump?
I am using Wikidata enriched with other data sources, and I must ingest the entire Wikidata JSON dump into a dev graph database of mine.
That's easy, and once that's done, I want to keep my copy updated by querying the RecentChanges and LogEvents API…

Lazhar
- 1,401
- 16
- 37
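Assuming the cutoff timestamp has already been determined (or conservatively estimated), here is a sketch of the catch-up step via the RecentChanges API; the rcstart value below is hypothetical, and RecentChanges on Wikimedia wikis only reaches back about 30 days:

import requests

API = "https://www.wikidata.org/w/api.php"
params = {
    "action": "query",
    "list": "recentchanges",
    "rcstart": "2016-01-01T00:00:00Z",   # hypothetical dump cutoff
    "rcdir": "newer",
    "rcprop": "title|ids|timestamp",
    "rcnamespace": 0,                    # namespace 0 = items (Q...)
    "rclimit": "max",
    "format": "json",
}
changed = set()
while True:
    data = requests.get(API, params=params, timeout=30).json()
    changed.update(rc["title"] for rc in data["query"]["recentchanges"])
    if "continue" not in data:
        break
    params.update(data["continue"])      # follow the continuation token
print(len(changed), "entities changed since the cutoff")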
3
votes
2 answers
Wiktionary in Structured Format
How do I acquire Wiktionary, for, say, English, in a structured format, typically RDF?
The recommended website http://downloads.dbpedia.org/wiktionary/ is dead.
And I can't tell whether there are existing frameworks that extract an…

Nordlöw
- 11,838
- 10
- 52
- 99
3
votes
2 answers
Extracting Wikimedia pageview statistics
Wikipedia provides all of its page views in hourly text files. (See for instance http://dumps.wikimedia.org/other/pagecounts-raw/2014/2014-01/)
For a project I need to extract keywords and their associated page views for the year 2014. But seeing…

user3656702
- 31
- 4
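Each pagecounts-raw file is an hourly, space-separated dump of "<project> <percent-encoded title> <view count> <bytes transferred>". A sketch of summing views for a keyword list from one hourly file (filename and keyword are placeholders); a full year would simply loop this over all 8,760 hourly files:

import gzip
from urllib.parse import unquote

def counts_for(filename, project="en", keywords=frozenset({"Leonardo_da_Vinci"})):
    totals = {}
    with gzip.open(filename, "rt", encoding="utf-8", errors="replace") as f:
        for line in f:
            parts = line.rstrip("\n").split(" ")
            if len(parts) != 4 or parts[0] != project:
                continue                              # skip other projects / malformed lines
            title = unquote(parts[1])
            if title in keywords:
                totals[title] = totals.get(title, 0) + int(parts[2])
    return totals

print(counts_for("pagecounts-20140101-000000.gz"))    # one hour of data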
3
votes
2 answers
R XML: How to retrieve a node with a given value
Here's a snippet of the XML file I am using:
<page>
  <title>AccessibleComputing</title>
  <ns>0</ns>
  <id>10</id>
  <revision>
    <id>381202555</id>
    <parentid>381200179</parentid>
…

arun kejariwal
- 31
- 4
2
votes
1 answer
Understanding Wikimedia dumps
I'm trying to parse the latest Wikisource dump. More specifically, I would like to get all the pages under the Category:Ballads page. For this purpose I downloaded the…

Gilad
- 538
- 5
- 16
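If parsing the dump is not a hard requirement, the live categorymembers API answers the "everything under Category:Ballads" part directly. A sketch, assuming current data from en.wikisource.org is acceptable:

import requests

API = "https://en.wikisource.org/w/api.php"
params = {
    "action": "query",
    "list": "categorymembers",
    "cmtitle": "Category:Ballads",
    "cmlimit": "max",
    "format": "json",
}
titles = []
while True:
    data = requests.get(API, params=params, timeout=30).json()
    titles += [m["title"] for m in data["query"]["categorymembers"]]
    if "continue" not in data:
        break
    params.update(data["continue"])      # page through long categories
print(len(titles), "pages in Category:Ballads")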
2
votes
1 answer
Select rows based on information stored in separate table
First of all, I'm sorry for the overly vague title; I'm unfamiliar with the proper terminology for a problem like this.
I'm attempting to retrieve a list of page titles from Wiktionary (Wikimedia's wiki-based dictionary) where the page must be…

Prime
- 2,410
- 1
- 20
- 35
2
votes
1 answer
Use a Wikimedia image on my website
So I have a Wikimedia Commons URL (which is really just a wrapper page for the actual image), like this:
https://commons.wikimedia.org/wiki/File:Nine_inch_nails_-_Staples_Center_-11-8-13(10755555065_16053de956_o).jpg
If I go to that page, I can see that…

dessalines
- 6,352
- 5
- 42
- 59
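One way to resolve that wrapper page to an embeddable URL is the Commons imageinfo API; a sketch (iiurlwidth just adds a resized thumbnail URL alongside the original):

import requests

API = "https://commons.wikimedia.org/w/api.php"
params = {
    "action": "query",
    "titles": "File:Nine_inch_nails_-_Staples_Center_-11-8-13(10755555065_16053de956_o).jpg",
    "prop": "imageinfo",
    "iiprop": "url",
    "iiurlwidth": 640,                   # also return a 640px-wide thumbnail URL
    "format": "json",
}
data = requests.get(API, params=params, timeout=30).json()
info = next(iter(data["query"]["pages"].values()))["imageinfo"][0]
print(info["url"])                       # full-resolution original
print(info["thumburl"])                  # resized version for an <img> tag

For a no-code alternative, Special:FilePath/<filename> on Commons redirects straight to the original file.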
2
votes
3 answers
How to find old Wikipedia dumps
I need access to very old Wikipedia dumps (backups of Wikipedia) in French. I succeeded in finding a 2010 backup on archive.org, and now I'm searching for 2006 or even earlier.
I know that the latest dumps contain all the data from previous…

Léo Joubert
- 522
- 4
- 17
2
votes
1 answer
Parse XML dump of a MediaWiki wiki
I am trying to parse an XML dump of Wiktionary, but I am probably missing something since I don't get any output.
This is a similar but much shorter XML file:

CptNemo
- 6,455
- 16
- 58
- 107
2
votes
1 answer
wiki dump encoding
I'm using WikiPrep to process the latest wiki dump enwiki-20121101-pages-articles.xml.bz2. I replaced "use Parse::MediaWikiDump;" with "use MediaWiki::DumpFile::Compat;" and made the corresponding changes in the code. Then I ran
perl…

xuan
- 270
- 1
- 2
- 15