
At last, a preg_match_all question that is not strictly about regular expressions!

It's for a WordPress plugin: I'm extending a plugin that replaces Word footnotes with WordPress footnotes. Now I also want to replace some Word field codes (if you're interested: I'll replace Zotero fields from Word with shortcodes from Zotpress). The string of such a field can look like this:

{ ADDIN ZOTERO_ITEM CSL_CITATION {"citationID":"klJTkj1B","properties":{"formattedCitation":"{\rtf Vivia Sequera, \uc0\u8220{}Did Attempted Rape Ignite Venezuela\uc0\u8217{}s National Protests?,\uc0\u8221{} \i Christian Science Monitor\i0{}, February 22, 2014, URL}","plainCitation":"Vivia Sequera, “Did Attempted Rape Ignite Venezuela’s National Protests?,” Christian Science Monitor, February 22, 2014, URL"},"citationItems":[{"id":1080,"uris":["http://zotero.org/groups/228165/items/U8EBSIQM"],"uri":["URL"],"itemData":{"id":1080,"type":"article-magazine","title":"Did attempted rape ignite Venezuela's national protests?","container-title":"Christian Science Monitor","source":"Christian Science Monitor","abstract":"Student protests began at the University of the Andes in San Cristobal, Venezuela, after an attempted rape of a college woman. A week later, the protests boiled over into a violent national uprising.","URL":"URL","ISSN":"0882-7729","author":[{"family":"Sequera","given":"Vivia"}],"issued":{"date-parts":[["2014",2,22]]},"accessed":{"date-parts":[["2014",4,7]]}}}],"schema":"URL"} }

I need to convert strings like these into:

[zotpressInText item="{U8EBSIQM}"]

Note that these are inside the post. I've already created the expression at regex101 (http://regex101.com/r/jK0lU1). But I ran into another weird problem, so let's keep it simple. The pattern to find the beginning of the string would be:

/\{\s*ADDIN\sZOTERO_ITEM/
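For orientation, here is a minimal sketch of how the whole replacement could look once the start of a field is found. The pattern and the helper name `zotero_fields_to_shortcodes` are assumptions made for illustration (this is not the expression behind the regex101 link). It also assumes that each field contains an `items/<KEY>` URI and ends with `"schema":"..."} }`, as in the sample above.

```php
<?php
// Hypothetical helper: replaces every Zotero field in a post with a
// Zotpress shortcode built from the item key found in the field's URI.
function zotero_fields_to_shortcodes(string $content): string
{
    // Assumed pattern: start of the field, lazily skip to the first
    // "items/<KEY>", capture the key, then lazily skip to the closing
    // '"schema":"..."} }'. The s modifier lets "." cross line breaks.
    $pattern = '/\{\s*ADDIN\s+ZOTERO_ITEM\s+CSL_CITATION\s.*?items\/([A-Z0-9]+)".*?"schema":"[^"]*"\}\s*\}/s';

    $result = preg_replace_callback(
        $pattern,
        function (array $m): string {
            return '[zotpressInText item="{' . $m[1] . '}"]';
        },
        $content
    );

    // preg_replace_callback() returns null on error; fall back to the input.
    return $result ?? $content;
}

// Shortened, made-up sample in the shape of the field shown above.
$post = 'Some text.[1] { ADDIN ZOTERO_ITEM CSL_CITATION {"citationID":"klJTkj1B","citationItems":[{"id":1080,"uris":["http://zotero.org/groups/228165/items/U8EBSIQM"]}],"schema":"URL"} } More text.';

echo zotero_fields_to_shortcodes($post), "\n";
// prints: Some text.[1] [zotpressInText item="{U8EBSIQM}"] More text.
```

A callback is used instead of a replacement string so that the literal braces around the key cannot be confused with backreference syntax.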

But I've wasted 4-5 hours now because of the following. If I try this:

$pattern2 = '/{\s*ADDIN ZOTERO_ITEM/';
$content2 = '[1] { ADDIN ZOTERO_ITEM CSL_CITATION {"citationID":".... then the link ... org/groups/228165/items/U8EBSIQM"]," ... csl-citation.json"} }';
preg_match_all($pattern2, $content2, $zotfields, PREG_SET_ORDER);
print_r($zotfields);

This works. I got the content of $content2 by printing the real $content variable and copying the output manually. But if I use $content directly, it doesn't work.

The main difference is that $content does contain multiple lines, so maybe that is the problem. But the regex modifiers m and s didn't help either.
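As a quick sanity check of the multi-line theory: `\s` already matches line breaks, and the `m` and `s` modifiers only change how `^`, `$` and `.` behave, so they should make no difference for this pattern. A minimal sketch with a made-up `$multiline` string (not the real post content):

```php
<?php
$pattern = '/\{\s*ADDIN\sZOTERO_ITEM/';

// Made-up post content with line breaks before and inside the field start.
$multiline = "First line of the post\n[1] {\n ADDIN ZOTERO_ITEM CSL_CITATION\nlast line";

// Without modifiers: the "\n " after "{" is consumed by \s*.
$count = preg_match_all($pattern, $multiline, $matches, PREG_SET_ORDER);
var_dump($count); // int(1)

// With m and s: same result, since the pattern uses neither ^, $ nor ".".
$countMs = preg_match_all($pattern . 'ms', $multiline, $matchesMs, PREG_SET_ORDER);
var_dump($countMs); // int(1)
```

If the pattern still fails on the real `$content`, line breaks alone are unlikely to be the cause.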

Any ideas what I could try next?

Marc Chéhab
  • If the content is in valid JSON format, then just use `json_decode()` and forget about regex. – HamZa Apr 09 '14 at 16:24
  • Sorry, it is not in JSON format. To be clear: the $content variable contains the entire blog post! :) So before and after the relevant part there's the text of the post, which I can't delete! – Marc Chéhab Apr 09 '14 at 16:54

1 Answer


I got it! I echoed both variables as ASCII character codes and realised that there is a THIRD, INVISIBLE CHARACTER " " in one of the variables. It comes from an encoding error somewhere between UTF-8 and ISO-8859-1, which is very well explained here.
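A minimal sketch of that kind of check, using `bin2hex()` to make invisible bytes visible. The sample strings are made up. The assumption that the stray character is a UTF-8 no-break space (bytes `c2 a0`) is mine; which character it actually was is not stated here.

```php
<?php
$clean = '{ ADDIN ZOTERO_ITEM';            // typed by hand: plain space after "{"
$dirty = "{\xC2\xA0ADDIN ZOTERO_ITEM";     // assumed: UTF-8 no-break space after "{"

// Both look the same when echoed, but the hex dumps differ.
echo bin2hex($clean), "\n"; // starts with 7b20...
echo bin2hex($dirty), "\n"; // starts with 7bc2a0...

$pattern = '/\{\s*ADDIN\sZOTERO_ITEM/';
var_dump(preg_match($pattern, $clean)); // int(1)
var_dump(preg_match($pattern, $dirty)); // int(0): byte-wise \s does not cover c2 a0

// One possible tolerant variant (an assumption, not the fix described above):
// the u modifier treats the subject as UTF-8, and \x{00A0} names the
// no-break space explicitly.
$tolerant = '/\{[\s\x{00A0}]*ADDIN[\s\x{00A0}]ZOTERO_ITEM/u';
var_dump(preg_match($tolerant, $dirty)); // int(1)
```

Alternatively, the stray bytes could be normalised to plain spaces before matching; which approach fits depends on where the encoding mismatch is introduced.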

Marc Chéhab