1

How can I extract only the words inside [[words]] into an array?

[[旭川市|旭川]](文化) - [[アイヌ]]文化、[[旭川市旭山動物園|旭山動物園]]など

I tried \[\[.*]] but it didn't work. Maybe that is because .* only works for English strings?

Makoto
bbnn
  • Make sure you are loading your text files in the same character encoding they were saved in. If it was saved in Shift-JIS and you try to load it as UTF-8, the string will be effectively unparsable. – Andrew Nov 13 '10 at 06:53
  • Have you tried using a [MediaWiki parser](http://stackoverflow.com/questions/324758/open-source-parser-code-for-mediawiki-markup) instead of a Regular Expression? – Gordon Nov 13 '10 at 11:28
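Andrew's point about encodings can be checked directly. The following is a minimal Python sketch, not taken from the thread; the sample string and variable names are assumptions for illustration. It suggests that text saved as Shift-JIS fails to decode as UTF-8, while decoding with the matching codec recovers it.

```python
# Sample text; in practice these bytes would come from a file read in binary mode.
original = "[[旭川市|旭川]](文化)"

# Simulate a file that was saved as Shift-JIS.
sjis_bytes = original.encode("shift_jis")

# Decoding with the matching codec recovers the text.
decoded_ok = sjis_bytes.decode("shift_jis")

# Decoding with the wrong codec raises an error in strict mode.
try:
    sjis_bytes.decode("utf-8")
    wrong_codec_failed = False
except UnicodeDecodeError:
    wrong_codec_failed = True

print(decoded_ok == original, wrong_codec_failed)
```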

4 Answers

2
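// The /u modifier makes PCRE treat the pattern and subject as UTF-8; (.+?) is non-greedy.
preg_match_all('/\[\[(.+?)\]\]/u', $str, $matches);
var_dump($matches); // $matches[1] holds the captured link texts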
bcosca
0

You need to backslash-escape both sides; all the square brackets need to be escaped.

This worked in Python, may need modification for PHP:


>>> import re
>>> re.compile(r'\[\[(.*?)\]\]')
<_sre.SRE_Pattern object at 0xb747ebf0>
>>> r=_
>>> r.search(text)
<_sre.SRE_Match object at 0xb7469560>
>>> r.findall(text)
['\xe6\x97\xad\xe5\xb7\x9d\xe5\xb8\x82|\xe6\x97\xad\xe5\xb7\x9d', '\xe3\x82\xa2\xe3\x82\xa4\xe3\x83\x8c', '\xe6\x97\xad\xe5\xb7\x9d\xe5\xb8\x82\xe6\x97\xad\xe5\xb1\xb1\xe5\x8b\x95\xe7\x89\xa9\xe5\x9c\x92|\xe6\x97\xad\xe5\xb1\xb1\xe5\x8b\x95\xe7\x89\xa9\xe5\x9c\x92']

Hmm, I may be wrong about having to escape the right square brackets; it turned out not to be necessary in Python.
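For comparison, here is a rough Python 3 sketch of the same search, using the sample line from the question. The variable names are assumptions for illustration. It compiles the pattern once with the right brackets escaped and once without, to check the observation that both forms behave the same.

```python
import re

text = "[[旭川市|旭川]](文化) - [[アイヌ]]文化、[[旭川市旭山動物園|旭山動物園]]など"

# Both sides escaped, as in the session above.
escaped = re.compile(r"\[\[(.*?)\]\]")
# Right brackets left unescaped; outside a character class "]" is a literal.
unescaped = re.compile(r"\[\[(.*?)]]")

links = escaped.findall(text)
links_unescaped = unescaped.findall(text)

print(links)
print(links == links_unescaped)
```

In Python 3, str patterns match Unicode text by default, so the results come back as readable strings instead of escaped bytes.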

jcomeau_ictx
0

You can encode the Unicode characters as HTML numeric character references first:

[&#26093;&#24029;&#24066;&#26093;&#23665;&#21205;&#29289;&#22290;&#124;&#26093;&#23665;&#21205;&#29289;&#22290;&#93;&#93;&#12394;&#12393;]
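One way such an encoding could be produced is sketched below in Python. The helper name `to_char_refs` is hypothetical and not from the thread. It turns each character into a decimal numeric character reference, and `html.unescape` from the standard library reverses the step.

```python
import html


def to_char_refs(s):
    """Encode every character as a decimal numeric character reference."""
    return "".join(f"&#{ord(ch)};" for ch in s)


encoded = to_char_refs("旭川|]")
decoded = html.unescape(encoded)

print(encoded)
print(decoded)
```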
Brettski
0

One problem is that you're using a greedy quantifier: \[\[.*]] will match from the first [[ to the last ]], including any intervening ]].

Most regex engines now also support a non-greedy quantifier, typically *?, so \[\[.*?]] would match just one wikilink at a time.
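The difference can be seen on the sample line from the question. This is a small Python sketch; the variable names are assumptions for illustration.

```python
import re

text = "[[旭川市|旭川]](文化) - [[アイヌ]]文化、[[旭川市旭山動物園|旭山動物園]]など"

# Greedy: a single match running from the first "[[" to the last "]]".
greedy = re.findall(r"\[\[.*]]", text)

# Non-greedy: one match per wikilink.
non_greedy = re.findall(r"\[\[.*?]]", text)

print(len(greedy), len(non_greedy))
```

The greedy pattern swallows the text between the links as well, which is why it looks as if the pattern did not work on the Japanese text.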

hippietrail