
First of all: my English is not that good, so I apologize in advance if you can't understand me :)

So, this is what I'm trying to do: I'm using a WordPress plugin (WP ALL EXPORT) to generate XML. Good.

Now I need to open the file and edit some things. I started with:

$data = file_get_contents("1439828483.xml");

And now I'm using str_replace and preg_replace to update the lines I need.

I have two XML tags like these:

<cidade><![CDATA[sao-paulo>santo-andre]]></cidade>
<bairro><![CDATA[sao-paulo>santo-andre]]></bairro>

As you can see, the content is the same in both, because a single ">" character separates two values.

In the <cidade></cidade> tag I need to keep only what is before ">". In the <bairro></bairro> tag I need to keep only what is after ">".
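Leaving the XML aside for a moment, the transformation described above is just a split on the first ">" followed by a hyphen-to-space replacement. A minimal sketch on a plain string, assuming PHP 7.1+; `splitLocation()` is a hypothetical helper name, not something from the plugin:

```php
<?php
// Hypothetical helper (illustration only): split "city>neighbourhood"
// on the first ">" and turn every hyphen into a space.
function splitLocation(string $value): array
{
    // A limit of 2 makes explode() split only on the first ">".
    $parts = explode('>', $value, 2);
    $cidade = str_replace('-', ' ', $parts[0]);
    // With no ">" in the value there is no neighbourhood part.
    $bairro = isset($parts[1]) ? str_replace('-', ' ', $parts[1]) : '';
    return [$cidade, $bairro];
}

[$cidade, $bairro] = splitLocation('sao-paulo>santo-andre');
echo $cidade, "\n"; // sao paulo
echo $bairro, "\n"; // santo andre
```

str_replace() replaces every occurrence, so values with many hyphens are handled the same way as values with one.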

I fixed the second case (the <bairro> tag) using this:

$data = preg_replace('#(<bairro>).*?(>)#', '$1$2', $data);
$data = str_replace('<bairro>>', "<bairro><![CDATA[",$data);

The result is:

<bairro><![CDATA[santo-andre]]></bairro>

OK, I have the right content, but it still has hyphens (dashes), and I have no idea how to remove them. What I really need is:

<bairro><![CDATA[santo andre]]></bairro>

And of course, for the tag <cidade></cidade> I would need to have:

<cidade><![CDATA[sao paulo]]></cidade>
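One way to handle both tags in a single pass, still with regex, is preg_replace_callback(): match each <cidade> or <bairro> CDATA section, split its content on the first ">", keep the part you need and replace every hyphen. This is a sketch that assumes each tag holds exactly one CDATA section in the "a>b" form shown above, and PHP 7+ (for `??`); the sample `$data` stands in for the file contents:

```php
<?php
// Sample input standing in for file_get_contents("1439828483.xml").
$data = '<cidade><![CDATA[sao-paulo>santo-andre]]></cidade>'
      . '<bairro><![CDATA[sao-paulo>santo-andre]]></bairro>';

$data = preg_replace_callback(
    // \1 makes the closing tag match the opening one.
    '#<(cidade|bairro)><!\[CDATA\[(.*?)\]\]></\1>#s',
    function (array $m): string {
        // $m[1] is the tag name, $m[2] is the CDATA content.
        $parts = explode('>', $m[2], 2);
        if ($m[1] === 'cidade') {
            $value = $parts[0];               // keep what is before ">"
        } else {
            $value = $parts[1] ?? $parts[0];  // keep what is after ">" (whole value if none)
        }
        // Replace every hyphen, however many there are.
        $value = str_replace('-', ' ', $value);
        return "<{$m[1]}><![CDATA[{$value}]]></{$m[1]}>";
    },
    $data
);

echo $data;
```

With the real export you would read `$data` with file_get_contents() as above and write the result back with file_put_contents(). No while loop is needed, because the callback sees the whole value at once.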

Before posting here, I found this topic: Regex between, from the last to specific end

I tried to adapt parts of anubhava's and Jack Maney's answers, but I failed :(

Since I'm using preg_replace and str_replace, I don't know whether there are any limitations on the regex strings.

Thanks, and I hope you can understand me :D

Diego
  • It sounds like you're trying to parse XML with regex. You might be able to get away with it for XML this simple, but it's [usually not a good idea](http://stackoverflow.com/a/1732454/399649). It looks like [PHP already has an XML parser](http://php.net/manual/en/book.xml.php), so you might find that more useful. – Justin Morgan - On strike Aug 18 '15 at 19:00
  • That's probably true... but since I managed to fix 90% of it with this technique, I thought I could finish it. I mean, this is the last problem :P And it looks fixable: it's only a matter of regex and preg_replace. Not a big deal for people who know how those two things work (not my case ^^). – Diego Aug 18 '15 at 19:12
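Following the XML-parser suggestion, here is a sketch using PHP's DOM extension (DOMDocument) instead of regex. The <root>/<item> wrapper is an assumption for illustration (the real WP All Export output has its own structure), and the sample values are taken from the comments further down:

```php
<?php
// Hypothetical document; only <cidade> and <bairro> matter here.
$xml = '<root><item>'
     . '<cidade><![CDATA[rio-de-janeiro>sao-jose-do-campo-grande]]></cidade>'
     . '<bairro><![CDATA[rio-de-janeiro>sao-jose-do-campo-grande]]></bairro>'
     . '</item></root>';

$doc = new DOMDocument();
$doc->loadXML($xml);

// Which side of ">" to keep: 0 = before (cidade), 1 = after (bairro).
foreach (['cidade' => 0, 'bairro' => 1] as $tag => $index) {
    foreach ($doc->getElementsByTagName($tag) as $node) {
        // textContent returns the CDATA content without the <![CDATA[ ]]> markers.
        $parts = explode('>', $node->textContent, 2);
        $value = str_replace('-', ' ', $parts[$index] ?? $parts[0]);
        // Drop the old children and put in a fresh CDATA section.
        while ($node->firstChild) {
            $node->removeChild($node->firstChild);
        }
        $node->appendChild($doc->createCDATASection($value));
    }
}

echo $doc->saveXML($doc->documentElement);
```

For the real file, DOMDocument::load() and DOMDocument::save() take a file path directly. The parser finds the element boundaries, so other hyphens in the document are never touched.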

2 Answers


This will do it (and replaces your own fix):

// bairro: keep only what comes after ">" inside the CDATA
$data = preg_replace('#(<bairro><!\[CDATA\[)[^>]*?>([^>]*?><)#', '$1$2', $data);
// bairro: replace one hyphen per tag on each pass until none are left
while(preg_match('#(<bairro>[^->]*?)-([^->]*?-)*([^->]*?>)#', $data)) {
    $data = preg_replace('#(<bairro>[^->]*?)-(([^->]*?-)*)([^->]*?>)#', '$1 $2$4', $data);
}
// cidade: keep only what comes before ">" inside the CDATA
$data = preg_replace('#(<cidade><!\[CDATA\[[^>]*?)>[^>]*?(\]\]><)#', '$1$2', $data);
// cidade: same hyphen-to-space loop as above
while(preg_match('#(<cidade>[^->]*?)-([^->]*?-)*([^->]*?>)#', $data)) {
    $data = preg_replace('#(<cidade>[^->]*?)-(([^->]*?-)*)([^->]*?>)#', '$1 $2$4', $data);
}
hellcode
  • Man, you're the best :P But it didn't remove the dashes from the first tag. – Diego Aug 18 '15 at 19:44
  • You mean the hyphen inside of cidade? It removes it when there is one hyphen. Do you have another example? – hellcode Aug 18 '15 at 19:48
  • Oh yeah! It has more than one hyphen, e.g. rio-de-janeiro or sao-jose-do-campo-grande. It can have a lot of hyphens, and so can the bairro tag. Is there a way to remove ALL hyphens? – Diego Aug 18 '15 at 19:49

Let me just point out that parsing XML with regex is often a bad idea, partly for reasons you're discovering. However, if all you want to do is replace hyphens with spaces, just do this:

$data = str_replace('-', ' ', $data); // str_replace() already replaces every occurrence

This will replace ALL the hyphens in your input, of course, so make sure you know what's in there.

Justin Morgan - On strike
  • But this is an XML file with lots of information. If I replace ALL hyphens, I will destroy good data that needs the "-". I need to replace only the dashes inside a selected tag. – Diego Aug 18 '15 at 19:19
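To touch hyphens only inside one element, the replacement can be scoped with preg_replace_callback() so that str_replace() only ever sees that element's CDATA content. A sketch; `replaceHyphensInTag()` and the <url> element are hypothetical, used only to show that hyphens elsewhere survive:

```php
<?php
// Hypothetical helper: replace hyphens with spaces only inside the CDATA
// section of the given tag, leaving the rest of the XML untouched.
function replaceHyphensInTag(string $data, string $tag): string
{
    // Escape the tag name in case it contains regex metacharacters.
    $t = preg_quote($tag, '#');
    return preg_replace_callback(
        '#(<' . $t . '><!\[CDATA\[)(.*?)(\]\]></' . $t . '>)#s',
        function (array $m): string {
            // $m[1] and $m[3] are the opening/closing markup, kept as-is.
            return $m[1] . str_replace('-', ' ', $m[2]) . $m[3];
        },
        $data
    );
}

$data = '<cidade><![CDATA[sao-jose-do-campo-grande]]></cidade>'
      . '<url><![CDATA[http://example.com/some-page]]></url>';
$data = replaceHyphensInTag($data, 'cidade');
echo $data;
```

Calling it once per tag (e.g. for 'cidade' and then 'bairro') leaves every other element's hyphens alone.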