1

The situation is, I have a private wiki, say at http://mysite.com/wiki, which is behind a password. What I'd like to do is have a separate location on the same server that could read arbitrary text files containing wiki text (source), and use the particular engine of http://mysite.com/wiki to render HTML from it (because of the installed templates/plugins).

As an example, I would have a /tmppub directory on http://mysite.com; in it, I'd have a text file with wiki text source in it, say Example.wiki, and a process.php page; then I'd call:

http://mysite.com/tmppub/process.php?file=Example.wiki

... where process.php would read the file Example.wiki in the same directory, pass the contents somehow to the ../wiki installation, and retrieve the HTML output and display it.

I guess what I want is similar to the example in Mediawiki2HTML - gwtwiki - How to convert Mediawiki text to HTML - Java Wikipedia API (Bliki engine) - except that Mediawiki2HTML is in Java (I'd want PHP) and possibly uses an internal rendering engine (I'd want an already existing, specific installation of MediaWiki).

The thing is, I can cook up a PHP script which will read the file, handle the password of /wiki, and pass GET and POST variables - except I'm not sure how I would address the MediaWiki installation:

  • I could try to fake a call to &action=edit (e.g. Editing Wikipedia:Sandbox) and ask for a preview; but that would return the edit buttons and text fields, which I'd have to clean up manually - not nice
  • I could try to address the API, but as far as I can see in API:Parsing wikitext - MediaWiki, it will only work with pages already in the MediaWiki installation - not with pages outside of it.

Finally, I'd like to obtain just the raw HTML of the content (without the HTML for sidebars and such), as when using the render value of the action parameter (example).

 

Does anyone know if there is already such a PHP application available - and if not, what would be the proper way to address the MediaWiki installation to obtain a 'raw' HTML rendering of the wiki text source?

Thanks in advance for any answers,
Cheers!

sdaau

3 Answers

2

You can actually use the API to parse custom wikitext, using the parse action. (The title parameter is maybe a bit misleading, but it's really just a hint for the parser when expanding, for example, {{PAGENAME}}.) To parse an existing page, the render action is used.
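A minimal sketch of calling the parse action over HTTP from PHP (the api.php path is an assumption, and HTTP authentication for the private wiki would still have to be added):

```php
<?php
// Sketch: parse arbitrary wikitext through action=parse of the
// MediaWiki web API. The api.php URL is an assumption, and
// authentication for the private wiki still has to be added.
$wikitext = "* [[foo]]\n* [[Example|bar]]";

$url = 'http://mysite.com/wiki/api.php?' . http_build_query( array(
    'action' => 'parse',
    'text'   => $wikitext,
    'title'  => 'Example',  // context title, e.g. for {{PAGENAME}}
    'format' => 'json',
) );

$data = json_decode( file_get_contents( $url ), true );
echo $data['parse']['text']['*'];  // the rendered HTML fragment
```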

If the authentication is HTTP-based and you have access to the MediaWiki installation, you can abuse the code that is used for maintenance scripts to load the important stuff and parse on top of that. (This is maybe a little dirty, though.) The following code is taken from includes/api/ApiParse.php and edited a little (of course, adjust the file path to your needs):

require_once dirname( __FILE__ ) . '/w/maintenance/commandLine.inc';

$text = "* [[foo]]\n* [[Example|bar]]\n* [http://example.com/ an outside link]";
$titleObj = Title::newFromText( 'Example' ); // context title (e.g. for {{PAGENAME}})
$parserOptions = new ParserOptions();
$parserOptions->setTidy( true ); // tidy up the generated HTML

$parserOutput = $wgParser->parse( $text, $titleObj, $parserOptions );
$parsedText = $parserOutput->getText(); // the rendered HTML fragment

The parsed HTML is now in the $parsedText variable. If you need to perform a pre-save transform on the text (expand {{subst:}}s, tildes to signatures, etc.), take a look at the ApiParse.php file for reference.
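For reference, such a pre-save transform step before parsing might look roughly like this (a sketch against the same MediaWiki internals as the snippet above; not tested against every version):

```php
// Sketch: expand {{subst:}} and signature tildes before parsing,
// as would happen on save (assumes the same globals as above).
$user = new User();  // or $wgUser for the current user context
$text = $wgParser->preSaveTransform( $text, $titleObj, $user, $parserOptions );
$parserOutput = $wgParser->parse( $text, $titleObj, $parserOptions );
```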

Matěj G.
  • Awesome, thanks a ton, @Matěj Grabovský - it was very difficult for me to find working examples related to the MediaWiki API! I also had some trouble running your code, which I'll document in the next answer post... Thanks again - cheers! – sdaau Jul 17 '11 at 01:42
0

There are many wiki parsers available - http://www.mediawiki.org/wiki/Alternative_parsers

You can choose any one of them. All you need to do is put a simple authentication wrapper around them and you could then use it as a service.
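For instance, a minimal HTTP Basic authentication gate in front of such a parser service could look like this (the credentials and realm are placeholder assumptions):

```php
<?php
// Sketch: HTTP Basic auth wrapper in front of a parser endpoint
// (the username/password here are placeholder assumptions).
if ( !isset( $_SERVER['PHP_AUTH_USER'] )
    || $_SERVER['PHP_AUTH_USER'] !== 'wikiuser'
    || $_SERVER['PHP_AUTH_PW'] !== 'secret' ) {
    header( 'WWW-Authenticate: Basic realm="wiki parser"' );
    header( 'HTTP/1.0 401 Unauthorized' );
    exit( 'Authentication required' );
}
// ... hand the wikitext to the chosen parser below this point ...
```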

Sukumar
  • Hi @Sukumar, thanks for that - however, your link states: "... that is, programs and projects, *other than MediaWiki itself*, which are able or intended to translate MediaWiki's text markup syntax into something else."; and I had already stated I want to use the *specific* engine I have installed (because of installed templates, etc - will edit post). Cheers!! – sdaau Jul 15 '11 at 11:15
  • Even if this doesn't answer the question, it's my go-to solution too. I rely on mediawiki syntax for my personal txt files but use Pandoc, not the php executables with the actual Mediawiki distro. – Sridhar Sarnobat May 05 '22 at 22:44
0

Thanks @Matěj Grabovský for the answer; however, I tripped a couple of times while I got it to work, so here's a writeup.

First of all, I just saved the code from the answer as mwparse.php and tried to call it from a web browser - the response: "This script must be run from the command line". Ah well :) This turns out to be a requirement for using commandLine.inc.

So, I log in to the server shell, and I try to execute from CLI, and I get:

$ cd /path/to/mwparse/
$ php -f mwparse.php
...
Exception caught inside exception handler: exception 'DBQueryError' with message 'A database error has occurred
Query: SELECT /* MessageCache::loadFromDB 127.0.0.1 */  page_title  FROM MWPREFIX_page  WHERE page_is_redirect = '0' AND page_namespace = '8' AND (page_title not like '%%/%%') AND (page_len > 10000)
Function: doQuery
Error: HY000 no such table: MWPREFIX_page
' in /path/to/MyWiki/includes/db/Database.php:606
Stack trace:
....

... which is bullcrap, since the MyWiki installation works when called from a browser - and I also opened the database in sqlitebrowser to confirm that, indeed, the table MWPREFIX_page exists. (What is referred to as /w in Matěj's answer, I call /MyWiki here.)

So, after an attempt to install xdebug and debug the script that way (which failed to work with MediaWiki for me, seemingly because memory kept getting exhausted), I simply tried to run this command:

php -r "require_once dirname( __FILE__ ) . 'PREFIX/maintenance/commandLine.inc';"

... in different directories, with the appropriate PREFIX. Turns out, it is only possible to execute this line from the root of the MediaWiki installation - that is, in this case, from the MyWiki folder:

$ cd /path/to/MyWiki
$ php -r "require_once dirname( __FILE__ ) . '/maintenance/commandLine.inc';"
$

Knowing this, I modified Matěj's script into:

<?php
//~ error_reporting(E_ALL);
//~ ini_set('display_errors', '1');

chdir('../MyWiki'); // must be in the MediaWiki root for commandLine.inc
//echo getcwd() . "\n"; // for debug check

require_once './maintenance/commandLine.inc';

$text = "* [[foo]]\n* [[Example|bar]]\n* [http://example.com/ an outside link]";

$titleObj = Title::newFromText( 'Example' );
$parserOptions = new ParserOptions();
$parserOptions->setTidy( true );

$parserOutput = $wgParser->parse( $text, $titleObj, $parserOptions );
$parsedText = $parserOutput->getText();

echo $parsedText;
?>
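To tie this back to the Example.wiki idea from the question, the hard-coded $text line can be replaced by reading a file named on the command line - a sketch (mind that commandLine.inc inspects $argv itself, so this may need adjusting):

```php
// Sketch: take the wikitext from a file given as the first CLI
// argument (no path sanitizing here; Example.wiki is a fallback).
$file = isset( $argv[1] ) ? $argv[1] : 'Example.wiki';
if ( !is_readable( $file ) ) {
    fwrite( STDERR, "Cannot read $file\n" );
    exit( 1 );
}
$text = file_get_contents( $file );
```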

Now I can run the script from its own directory; however, the following:

PHP Notice:  Undefined index: SERVER_NAME in /path/to/MyWiki/includes/Linker.php on line 888
Notice: Undefined index: SERVER_NAME in /path/to/MyWiki/includes/Linker.php on line 888

... can be seen in the output. The plain Notice only shows up if error_reporting is enabled - the PHP Notice line actually goes to stderr. Thus, to get just the output of the script, in the script's directory I would call:

php -f mwparse.php 2>/dev/null

To get this online, I'd now just have to write a PHP page which calls this script on the command line (possibly using exec), which shouldn't be a problem (except that the require_once ... commandLine.inc takes a couple of seconds to execute, so it will be somewhat of a performance hit).
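Such a wrapper page could be sketched like this (the file whitelist is essential, since passing a raw GET parameter to the shell would be dangerous; the file names and paths are assumptions):

```php
<?php
// Sketch: web wrapper that shells out to the CLI parser script.
// Whitelisting the file parameter avoids shell/path injection.
$allowed = array( 'Example.wiki' );
$file = isset( $_GET['file'] ) ? $_GET['file'] : '';
if ( !in_array( $file, $allowed, true ) ) {
    header( 'HTTP/1.0 404 Not Found' );
    exit;
}
// 2>/dev/null drops the SERVER_NAME notices mentioned above
$html = shell_exec( 'php -f mwparse.php '
    . escapeshellarg( $file ) . ' 2>/dev/null' );
echo $html;
```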

Well, glad to see this solved - thanks again,
Cheers!

 

PS: As I spent quite some time on this, I will dump somewhat of a command-line log (mostly related to the installation of xdebug) below.

from web: This script must be run from the command line

from remote terminal:

Exception caught inside exception handler: exception 'DBQueryError' with message 'A database error has occurred
Query: SELECT /* MessageCache::loadFromDB 127.0.0.1 */  page_title  FROM MWPREFIX_page  WHERE page_is_redirect = '0' AND page_namespace = '8' AND (page_title not like '%%/%%') AND (page_len > 10000)
Function: doQuery
Error: HY000 no such table: MWPREFIX_page
' in /path/to/MyWiki/includes/db/Database.php:606
Stack trace:
....

PHP Deprecated:  Comments starting with '#' are deprecated in /etc/php5/cli/conf.d/mcrypt.ini on line 1 in Unknown on line 0

MediaWiki internal error.

Original exception: exception 'DBQueryError' with message 'A database error has occurred
Query: SELECT /* MediaWikiBagOStuff::_doquery 127.0.0.1 */ value,exptime FROM PREFIX_objectcache WHERE keyname='wikidb-MWPREFIX_:messages:en'
Function: doQuery
Error: HY000 no such table: MWPREFIX_objectcache
' in /path/to/MyWiki/includes/db/Database.php:606

http://www.apaddedcell.com/easy-php-debugging-ubuntu-using-xdebug-and-vim
https://stackoverflow.com/questions/1947395/how-can-i-debug-a-php-cli-script-with-xdebug

sudo apt-get install php-pear # pecl
sudo pecl install xdebug-beta # sh: phpize: not found
sudo apt-get install php5-dev # phpize; The following extra packages will be installed:   autoconf automake autotools-dev binutils gcc gcc-4.4 libc-dev-bin libc6-dev   libltdl-dev libssl-dev libtool linux-libc-dev m4 manpages-dev shtool   zlib1g-dev
sudo pecl install xdebug-beta # Installing '/usr/lib/php5/20090626+lfs/xdebug.so'

sudo nano /etc/php5/apache2/php.ini # zend_extension=/usr/lib/php5/20090626+lfs/xdebug.so and paste

sudo service apache2 restart # sudo /etc/init.d/apache2 restart

wget http://xdebug.org/files/xdebug-2.1.1.tgz # for debugclient
tar xzvf xdebug-2.1.1.tgz
rm package*.xml

cd xdebug-2.1.1/
$ cd debugclient
$ ./configure --with-libedit # configure: error: "libedit was not found on your system."
sudo apt-get install libedit2 # libedit2 is already the newest version.
sudo apt-get install libedit-dev # The following extra packages will be installed:   libbsd-dev libncurses5-dev
$ ./configure --with-libedit
$ make
# make install
./debugclient # Waiting for debug server to connect.

# open another remote terminal
export XDEBUG_CONFIG="idekey=session_name"
php mwparse.php
# flies by

# mediawiki started crashing upon adding ?XDEBUG_SESSION_START=1 to url, restart server

# now different errors:
# Deprecated: Call-time pass-by-reference has been deprecated in /path/to/MyWiki/includes/Article.php on line 1658 (http://www.emmajane.net/php-what-call-time-pass-reference-story)
# Notice: Undefined variable: wgBibPath in /path/to/MyWiki/extensions/Bibwiki/Bibwiki.i18n.php on line 116
# Fatal error: Allowed memory size of 20971520 bytes exhausted (tried to allocate 16 bytes) in /path/to/MyWiki/includes/GlobalFunctions.php on line 337

http://www.mediawiki.org/wiki/Manual:Errors_and_symptoms#Fatal_error:_Allowed_memory_size_of_nnnnnnn_bytes_exhausted_.28tried_to_allocate_nnnnnnnn_bytes.29

sudo nano /etc/php5/apache2/php.ini # comment out xdebug stuff
sudo service apache2 restart # now mediawiki works fine...

 

EDIT notes:

  • Note that even if you set $wgDefaultUserOptions['editsection'] = false; in your LocalSettings.php, that has no effect on the above script (although it will have an effect in MediaWiki proper) - if you want to disable the edit section links for the script's rendering, the script must contain $parserOptions->setEditSection( false ); (this being set through the MediaWiki ParserOptions class)
  • Since, on the production server, it seems I have no permission to run PHP's exec() (or rather passthru()) - or maybe no permission to run php-cli - I cannot use the above solution verbatim, because commandLine.inc will demand a terminal. However, it's possible to make a copy of commandLine.inc and 'hack' it with $argv = array(); unset($_SERVER);, and then the above parser may work fully from a webserver context (though I'm not sure if this copying of commandLine.inc might represent a security risk?)
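The 'hack' from that last bullet amounts to placing something like this near the top of the copied commandLine.inc, before its terminal checks run (untested beyond my setup, and possibly a security risk as noted):

```php
// Sketch: make the copied commandLine.inc believe it runs in a
// terminal, so it can be require'd from a webserver context.
$argv = array();     // it expects CLI arguments to exist
unset( $_SERVER );   // and bails out if it detects a web request
```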
sdaau