A simple pattern works fine with preg_match_all; how do I use it with preg_replace?

Question

Thanks for your help.

My goal is to use preg_replace with a pattern to remove some very simple strings.

Using only preg_replace, on this string or on others, I need to remove ANY content between <tag and the next > symbol. The pattern is quite simple:

$x = '@<\w+(\s+[^>]*)>@is';
$s = 'DATA<td class="td1">111</td><td class="td2">222</td>DATA';
preg_match_all($x, $s, $Q);
print_r($Q[1]);

[1] => Array
    (
        [0] =>  class="td1"
        [1] =>  class="td2"
    )

This works great!

Now I try to remove the strings using the same pattern:

$new_string = '';
$Q = preg_replace($x, "\\1$new_string", $s);
print_r($Q);

The result is completely different.

What is wrong with my use of preg_replace?

Using only preg_replace(), how can I remove these strings?

(We could use foreach(...) to remove each string, but where is the error in my code?)

The result I expect when I pass in this value:

$s = 'DATA<td class="td1">111</td><td class="td2">222</td>DATA';

is this output:

$Q = 'DATA<td>111</td><td>222</td>DATA';

Welcome to Stack Overflow. Please take the [tour] to learn how Stack Overflow works and read [ask] on how to improve the quality of your question. It is unclear what you are asking or what the problem is. Please [edit] your question to include a description what you don't expect from the return value of `preg_replace` or what string you expect. — Progman, Sep 26 '22 at 19:18
Does this answer your question? [Remove all attributes from html tags](https://stackoverflow.com/questions/3026096/remove-all-attributes-from-html-tags) — Chris Haas, Sep 26 '22 at 19:29
Thanks @Progman, following your recommendation I edited the question and added `what I expect` :-) — Yamile, Sep 26 '22 at 19:29
Thanks @{Chris Haas}, my goal is to use only `preg_replace` (and to understand where my error is) — Yamile, Sep 26 '22 at 19:33
@Yamile, that answer uses `preg_replace` only, and they broke their regex down with great comments. — Chris Haas, Sep 26 '22 at 19:35
@Progman I use `\1` because my pattern uses `()`; this means the regex returns [0] for the complete pattern and [1] for the first parenthesis — Yamile, Sep 26 '22 at 19:36
@Yamile I know what `\1` is, but why do you have `\1`, which has the content of the attributes, in your replacement string, when you want to get rid of the attributes? — Progman, Sep 26 '22 at 19:38
@{Chris Haas} I use `preg_match_all` to "confirm" my pattern is OK, but really my goal is to use `preg_replace` — Yamile, Sep 26 '22 at 19:38
@Progman my focus is not to get "ids", my focus is to obtain: `DATA111222DATA` — Yamile, Sep 26 '22 at 19:39
@Yamile Your `\1` contains all the attributes from the HTML tag, why do you want them added to the replacement text when you actually want to remove them? — Progman, Sep 26 '22 at 19:43
Thanks, surely I am wrong, so tell me how to remove this using `preg_replace` — Yamile, Sep 26 '22 at 19:45
In your `preg_match` you are only inspecting `$Q[1]` but you should also inspect `$Q[0]` which [holds the entire result](https://3v4l.org/ahpjW). When you use `preg_replace` it works on the whole result, so it is up to you to transform that. It might help to use the callback version to visualize this: https://3v4l.org/pjp4Q — Chris Haas, Sep 26 '22 at 19:57
@{Chris Haas} from `preg_match_all` you can see the `string object` is in `$Q[1]`; that is why I use `\1` and not `\0`. So: how do I replace the list in `$Q[1]` using `preg_replace`? — Yamile, Sep 26 '22 at 20:09
Yes, _match_ shows you the entire match which is in `[0]` and any sub-matches. When you use match, because you aren't changing anything, you can _choose_ to only look at `[1]` if you want. When you use _replace_, whatever is in `[0]` is what PHP is asking you "what do you want to replace this **whole thing** with". You can still access `[1]` for replacement logic on `[0]`, but whatever you return, `[0]` will be replaced with. To offset that, you sometimes want to [invert the capture logic](https://stackoverflow.com/a/4898239/231316). — Chris Haas, Sep 26 '22 at 20:53
Thanks @{Chris Haas}, sorry but I really do not see how to change `[0]` to `$new_string`. When I use `$Q = preg_replace($x, '\0', $s);` nothing changes. So: 1// is the pattern fine? 2// what is wrong in my code? — Yamile, Sep 26 '22 at 21:21
Parsing / Manipulating HTML with RegEx is a really bad idea, you should use something else if you can. — Marco, Sep 27 '22 at 14:05

score 2 · Answer 1 · answered Sep 27 '22 at 14:02

Let's break down your RegEx, @<\w+(\s+[^>]*)>@is, and see if that helps.

@          // Start delimiter
<          // Literal `<` character
\w+        // One or more word-characters, a-z, A-Z, 0-9 or _
(          // Start capturing group
  \s+      // One or more whitespace characters
  [^>]*    // Zero or more characters that are not the literal `>`
)          // End capturing group
>          // Literal `>` character
@          // End delimiter
is         // Ignore case and `.` matches all characters including newline
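
If it helps to experiment, the same breakdown can be kept inside the pattern itself by adding the `x` (extended) modifier, which ignores unescaped whitespace in the pattern and treats `#` up to the end of the line as a comment. This is only a sketch: the variable names `$compact` and `$verbose` are mine, not from the question.

```php
<?php
// The pattern exactly as written in the question.
$compact = '@<\w+(\s+[^>]*)>@is';

// The same pattern spread over several lines with the x modifier.
// Note: the comments must not contain the delimiter character.
$verbose = '@
    <          # literal opening angle bracket
    \w+        # tag name: one or more word characters
    (          # start capturing group
      \s+      # one or more whitespace characters
      [^>]*    # anything up to the closing angle bracket
    )          # end capturing group
    >          # literal closing angle bracket
@isx';

$s = 'DATA<td class="td1">111</td><td class="td2">222</td>DATA';

preg_match_all($compact, $s, $a);
preg_match_all($verbose, $s, $b);

// Compare the full result arrays (whole matches and captures).
var_dump($a === $b);
```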

Given the input DATA<td class="td1">DATA this matches <td class="td1"> and captures class="td1". The difference between match and capture is very important.

When you use preg_match you'll see the entire match at index 0, and any subsequent captures at incrementing indexes.
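
To see the two side by side, here is a minimal sketch using the pattern from the question and the single-tag input from above (the variable name `$m` is my choice):

```php
<?php
$pattern = '@<\w+(\s+[^>]*)>@is';
$string  = 'DATA<td class="td1">DATA';

preg_match($pattern, $string, $m);

// Index 0: the entire match, i.e. the whole opening tag.
echo $m[0], PHP_EOL; // <td class="td1">

// Index 1: only what the parentheses captured (note the leading space).
echo $m[1], PHP_EOL; //  class="td1"
```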

When you use preg_replace the entire match will be replaced. You can use the captures, if you so choose, but you are replacing the match.

I'm going to say that again: whatever you pass as the replacement string will replace the entirety of the found match. If you say $1 or \\1, you are saying replace the entire match with just the capture.

Going back to the sample after the breakdown, using $1 is the equivalent of calling:

str_replace('<td class="td1">', ' class="td1"', $string);

which you can see here: https://3v4l.org/ZkPFb
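
The same effect can be reproduced on the full string from the question. The sketch below reuses `$x` and `$s` from the question; `$viaCapture` and `$viaWholeMatch` are names I made up for illustration. It also shows why replacing with `$0` (or `\0`) appears to do nothing: each match is replaced with itself.

```php
<?php
$x = '@<\w+(\s+[^>]*)>@is';
$s = 'DATA<td class="td1">111</td><td class="td2">222</td>DATA';

// Replace each whole match with only its capture:
// the attributes stay, the tag around them disappears.
$viaCapture = preg_replace($x, '$1', $s);
echo $viaCapture, PHP_EOL;
// DATA class="td1"111</td> class="td2"222</td>DATA

// Replace each whole match with the whole match: nothing changes.
$viaWholeMatch = preg_replace($x, '$0', $s);
echo $viaWholeMatch, PHP_EOL;
// DATA<td class="td1">111</td><td class="td2">222</td>DATA
```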

To your question "how to change [0] by $new_string": you are doing it correctly; it is your RegEx itself that is wrong. To do what you are trying to do, your pattern must capture the tag itself so that you can say "replace the HTML tag with all of the attributes with just the tag".

As one of my comments noted, this is where you'd invert the capturing. You aren't interested in capturing the attributes; you are throwing those away. Instead, you are interested in capturing the tag itself:

$string = 'DATA<td class="td1">DATA';
$pattern = '@<(\w+)\s+[^>]*>@is';

echo preg_replace($pattern, '<$1>', $string);

Demo: https://3v4l.org/oIW7d
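
For completeness, here is a sketch that applies the inverted pattern to the full two-tag string from the question, once with `preg_replace` and once with `preg_replace_callback`, which can make it easier to see what is being replaced. The names `$viaReplace` and `$viaCallback` are mine.

```php
<?php
$s       = 'DATA<td class="td1">111</td><td class="td2">222</td>DATA';
$pattern = '@<(\w+)\s+[^>]*>@is';

// Rebuild each matched tag from the captured tag name only.
$viaReplace = preg_replace($pattern, '<$1>', $s);
echo $viaReplace, PHP_EOL;
// DATA<td>111</td><td>222</td>DATA

// Same idea with a callback: $m[0] is the whole matched tag,
// $m[1] is the captured tag name; the return value replaces $m[0].
$viaCallback = preg_replace_callback(
    $pattern,
    function (array $m): string {
        return '<' . $m[1] . '>';
    },
    $s
);
echo $viaCallback, PHP_EOL;
```

Closing tags such as `</td>`, and opening tags that have no attributes, are not matched by this pattern (it requires a word character right after `<` and at least one whitespace character after the tag name), so they pass through unchanged.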

Oh Master @{Chris Haas}, your solution is great! It runs perfectly! Many thanks. Now I need to study your `answer/code/lesson/solution`; many thanks for your time and for helping me understand how **preg_replace** works :-) — Yamile, Sep 28 '22 at 00:57
