
I have many text files that I want to upload to a wiki running MediaWiki. I don't even know if this is really possible, but I want to give it a shot.

Each text file's name will be the title of the wiki page.

One wiki page for one file.

I want to upload all the text files in the folder the program itself is in.

Perhaps asking you to code it all is asking too much, so could you at least tell me which language I should look into to give it a shot?

Ilmari Karonen
user1849133

2 Answers


What you probably want is a bot to create the articles for you using the MediaWiki API. Perhaps the best-known bot framework is pywikipedia for Python, but there are API libraries and bot frameworks for many other languages too.

In fact, pywikipedia comes with a script called pagefromfile.py that does something pretty close to what you want. By default, it creates multiple pages from a single file, but if you know some Python, it shouldn't be too hard to change that.


Actually, if the files are on the same server your wiki runs on (or you can upload them there), then you don't even need a bot at all: there's a MediaWiki maintenance script called importTextFile.php that can do it for you. You can run it for all files in a given directory with a simple shell script, e.g.:

for file in directory/*.txt; do
   php /path/to/your/mediawiki/maintenance/importTextFile.php "$file";
done

(Obviously, replace directory with the directory containing the text files and /path/to/your/mediawiki with the actual path of your MediaWiki installation.)

By default, importTextFile.php will base the name of the created page on the filename, stripping any directory prefixes and extensions. Also, per standard MediaWiki page naming rules, underscores will be replaced by spaces and the first letter will be capitalized (unless you've turned that off in your LocalSettings.php); thus, for example, the file directory/foo_bar.txt would be imported as the page "Foo bar". If you want finer control over the page naming, importTextFile.php also supports an explicit --title parameter. Or you could always copy the script and modify it yourself to change the page naming rules.
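
The default naming rules described above can be previewed before importing anything. The following is a minimal shell sketch, not the script's actual code: title_from_filename is a hypothetical helper that approximates the described behaviour (strip the directory and extension, turn underscores into spaces, capitalize the first letter). It assumes the extension starts at the first dot and only capitalizes ASCII letters; MediaWiki's real title normalization does more than this.

```shell
# Hypothetical helper: approximate the default page title for a file name.
# Runs in a subshell ( ... ) so its variables do not leak into the caller.
title_from_filename() (
    name=${1##*/}                               # drop any directory prefix
    name=${name%%.*}                            # drop everything from the first dot onward
    name=$(printf '%s' "$name" | tr '_' ' ')    # underscores become spaces
    # Capitalize the first character (ASCII letters only; other bytes pass through).
    first=$(printf '%s' "$name" | cut -c1 | tr '[:lower:]' '[:upper:]')
    rest=$(printf '%s' "$name" | cut -c2-)
    printf '%s%s\n' "$first" "$rest"
)

# Example: list the title each file would get, without importing anything.
# for file in directory/*.txt; do
#     printf '%s -> %s\n' "$file" "$(title_from_filename "$file")"
# done
```

Running the commented-out loop first is a cheap way to spot files whose titles would collide or look wrong before any pages are created.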


P.S. There's also another MediaWiki maintenance script called edit.php that does pretty much the same thing as importTextFile.php, except that it reads the page text from standard input and doesn't have the convenient default page naming rules of importTextFile.php. It can be quite handy for automated edits using Unix pipelines, though.


Addendum: The importTextFile.php script expects the file names and contents to be in the UTF-8 encoding. If your files are in some other encoding, you'll have to either fix them first or modify the script to do the conversion, e.g. using mb_convert_encoding().

In particular, the following modifications to the script ought to do it:

  1. To convert the file names to UTF-8, edit the titleFromFilename() function, near the bottom of the script, and replace its last line:

    return $parts[0];
    

    with:

    return mb_convert_encoding( $parts[0], "UTF-8", "your-encoding" );
    

    where your-encoding should be the character encoding used for your file names (or auto to attempt auto-detection).

  2. To also convert the contents of the files, make a similar change higher up, inside the main code of the script, replacing the line:

    $text = file_get_contents( $filename );
    

    with:

    $text = file_get_contents( $filename );
    $text = mb_convert_encoding( $text, "UTF-8", "your-encoding" );
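
If you would rather not modify the script, one alternative is to convert the file contents to UTF-8 before importing. The following is a minimal sketch, assuming the iconv command-line tool is installed and that all files share one known source encoding; convert_dir_to_utf8 and its arguments are hypothetical names. It converts file contents only, so file names in another encoding still need separate handling.

```shell
# Hypothetical helper: write UTF-8 copies of every .txt file in a directory.
# Usage: convert_dir_to_utf8 SOURCE_DIR DEST_DIR SOURCE_ENCODING
# Runs in a subshell ( ... ), so "exit 1" only ends the function with status 1.
convert_dir_to_utf8() (
    src_dir=$1
    dst_dir=$2
    from_encoding=$3
    mkdir -p "$dst_dir" || exit 1
    for file in "$src_dir"/*.txt; do
        [ -e "$file" ] || continue    # skip the unexpanded pattern if nothing matched
        # Keep the original file name; only the contents are re-encoded.
        iconv -f "$from_encoding" -t UTF-8 "$file" > "$dst_dir/${file##*/}" || exit 1
    done
)

# Example (EUC-KR is only an illustration; use your files' real encoding):
# convert_dir_to_utf8 directory directory-utf8 EUC-KR
```

The import loop can then be pointed at the converted copies instead of the originals, which leaves the source files untouched if the chosen encoding turns out to be wrong.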
    
Ilmari Karonen
  • @Ilmari Karonen Thank you sooooooooooo much. I just want to figure out which method is the fastest one, i.e. which one guarantees the fastest posting. I also saw many extensions out there, such as MultiUpload, UploadLocal and UploadWizard. Can these be faster than the methods you mentioned? – user1849133 Aug 13 '13 at 16:22
  • @Ilmari Karonen And if there is indeed a fastest method, how fast is it? If I have 10000 txt files of 10KB each, how long will it take? I tested that FTP uploads about 1.2 such 10KB txt files per second on average to my server. But how fast will it be to actually post those uploaded files to MediaWiki? – user1849133 Aug 13 '13 at 16:25
  • That's a _really_ slow upload speed. Are you using an old modem from the '90s? As for importing the files into MediaWiki, it should certainly be faster to upload the files onto the server (perhaps in a .zip / .tar.gz archive to make it faster) and use importTextFile.php (or edit.php, which should be equally fast) than to use a bot. I suppose modifying the script to import all the files in one invocation would be even faster, but probably not enough to make up for the time it would take to make the changes and test them. – Ilmari Karonen Aug 13 '13 at 16:37
  • @Ilmari Karonen 1.2 files per second is slow? My internet speed perhaps isn't that slow; I used a modem during the '90s and surely it's much faster than that :) So this time I tested with a "single" 42684KB txt file, and it took 2 minutes 30 seconds to upload by FTP. (Is this still slow?) That works out to 284KB per second, which is a lot faster than 1.2 10KB files per second. Given this, I think it is the number of files that makes the upload slow, even if the total size is small. Hmm.. why is this so? How can I fix this? – user1849133 Aug 14 '13 at 03:56
  • @user2604484: That might well be the case. Using a better file transfer protocol (e.g. SFTP instead of FTP) might help, but the simplest solution is probably to put all your files in a single .zip archive ("compressed folder" in Windows speak) and upload that. You can uncompress it on the server with the `unzip` command. – Ilmari Karonen Aug 14 '13 at 08:49
  • @Ilmari_Karonen Thank you again. Now I wonder if I am bothering you, or if it might be better to create a new linked question separately. If so, please tell me. I will try to abide by the general rules and culture of Stack Overflow, and I will keep my question simple. By the way, if the file name is "Foo bar" and I want to make a page titled "Template:Foo_bar", how should I use --title to do this? – user1849133 Aug 14 '13 at 16:35
  • @user2604484: Yes, you could use `--title=Template:Foo_bar`, but that would work only for that one file. It might be easier to just rename the files to include the `Template:` prefix. Or edit the script to prepend it. – Ilmari Karonen Aug 14 '13 at 17:15
  • @Ilmari_Karonen I ran into a problem but solved it, and posted my own answer to the new question at http://stackoverflow.com/questions/18247097/mediawiki-mass-upload-breaks-the-page I feel good that I could contribute something to the Stack Overflow community like you! :) But I have another problem that I couldn't solve by myself. If the file title is non-English, e.g. Korean, Chinese or Japanese, then the shell script says "Invalid title". How can I fix this? – user1849133 Aug 15 '13 at 06:09
  • @user2604484: You'll probably need to have the file names in UTF-8 too. – Ilmari Karonen Aug 15 '13 at 11:07
  • @Ilmari_Karonen Save the file name as UTF-8? When I change the encoding of a file's content, I open the file in Notepad and do "File->Save as->Encoding->choose one". But how can I change the encoding of the file's title? I didn't even know a file title has something like an encoding, haha. – user1849133 Aug 15 '13 at 11:42
  • @Ilmari_Karonen I am sorry to ask you this after some time has passed, but I still haven't been able to solve this problem. Is there any way I can use this script when the file title is non-English, e.g. Chinese or Japanese? – user1849133 Oct 13 '13 at 13:00
  • @user2604484: See the addendum to my answer above. – Ilmari Karonen Oct 13 '13 at 13:35
  • To ensure that the encoding is UTF-8, a better solution than a plain `mb_convert_encoding` call is `iconv(mb_detect_encoding($text, mb_detect_order(), true), "UTF-8", $text);`, according to http://stackoverflow.com/questions/7979567/php-convert-any-string-to-utf-8-without-knowing-the-original-character-set-or. I checked it: the conversion is more accurate, and there is no need to enter one static charset because the source encoding is auto-detected. – BlueMark Aug 21 '14 at 17:19

In MediaWiki 1.27, there is a new maintenance script, importTextFiles.php, which can do this. See https://www.mediawiki.org/wiki/Manual:ImportTextFiles.php for information. It improves on the old (now removed) importTextFile.php script in that it can handle file wildcards, so it allows the import of many text files at once.

Aaron