Regex - match every possible char and space

Question

I want to extract data from HTML. The problem is that I can't extract two of the strings, which sit at the top and at the bottom of my pattern.

I want to extract 23423423423 and 1234523453245, but only if the string Allan appears between them:

                                        <h4><a href="/Profile/23423423423.html">@@@@@@</a>  </h4> said12:49:32
            </div>

                                <a href="javascript:void(0)" onclick="replyAnswer(@@@@@@@@@@,'GET','');" class="reportLink">
                    report                    </a>
                        </div>

        <div class="details">
                            <p class="content">


                       Hi there, Allan.



                                </p>

            <div id="AddAnswer1234523453245"></div>

Of course, I can do something like this: Profile\/(\d+).*\s*.*\s*.*\s*.*\s*.*\s*.*\s*.*\s*.*Allan.*\s*.*\s*.*AddAnswer(\d+). But the code is horrible. Is there any way to make it shorter?

I was thinking about:

Profile\/(\d+)(.\sAllan)*AddAnswer(\d+)

or

Profile\/(\d+)(.*Allan\s*)*AddAnswer(\d+)

but neither works properly. Do you have any ideas?

I'm using iMacros, but it is (I think) the same as Regex101.com. — audiophonic, May 03 '16 at 15:00

score 2 · Answer 1 · answered May 03 '16 at 16:17

You can construct a character class that matches any character, including newlines, with [\S\s]. Whitespace and non-whitespace characters together cover every character.

With that, your attempts were reasonably close:

/Profile\/(\d+)[\S\s]*Allan[\S\s]*AddAnswer(\d+)/

This looks for Profile/ and the number that follows it, then any characters up to Allan, then any characters up to AddAnswer and the number that follows it. If you have single-line mode available (/s), you can use dots instead:

/Profile\/(\d+).*Allan.*AddAnswer(\d+)/s
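To see that both forms behave the same, here is a minimal Python sketch. The question uses iMacros, so Python is only a stand-in, and the `html` string is a shortened copy of the markup from the question rather than the full page. In Python, single-line mode is spelled `re.DOTALL`.

```python
import re

# Shortened stand-in for the markup in the question (an assumption, not the full page).
html = '''<h4><a href="/Profile/23423423423.html">@@@@@@</a>  </h4> said12:49:32
<p class="content">

   Hi there, Allan.

</p>
<div id="AddAnswer1234523453245"></div>
'''

# [\S\s] matches any character, newlines included, without any flag.
char_class = re.compile(r"Profile/(\d+)[\S\s]*Allan[\S\s]*AddAnswer(\d+)")

# re.DOTALL is Python's name for single-line mode (/s): "." also matches "\n".
dotall = re.compile(r"Profile/(\d+).*Allan.*AddAnswer(\d+)", re.DOTALL)

groups_class = char_class.search(html).groups()
groups_dotall = dotall.search(html).groups()
print(groups_class)   # ('23423423423', '1234523453245')
print(groups_dotall)  # same pair
```

The character-class form is the more portable of the two, since it does not depend on the engine offering a dot-matches-newline flag.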

This won't work for multiple instances (it will only capture the last), see my answer for a better solution. — Jan, May 03 '16 at 16:32

score 0 · Answer 2 · answered May 03 '16 at 14:27

You can use m to specify . to match newlines.

/Profile\/(\d+).+AddAnswer(\d+)/m

chifung7

Come on, this is utterly wrong - `s` is for single line mode, `m` is for multiline to match `^` and `$`. – Jan May 03 '16 at 14:32
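The difference between the two flags can be checked with a short Python sketch. The two-line `text` string is a made-up minimal input, and `re.MULTILINE` / `re.DOTALL` are Python's names for `m` and `s`.

```python
import re

# Hypothetical two-line input: the pieces we want sit on different lines.
text = "Profile/1\nAddAnswer2"
pattern = r"Profile/(\d+).+AddAnswer(\d+)"

# re.MULTILINE (/m) only changes where ^ and $ match; "." still stops at "\n".
multiline_match = re.search(pattern, text, re.MULTILINE)

# re.DOTALL (/s) is the flag that lets "." match "\n".
dotall_match = re.search(pattern, text, re.DOTALL)

print(multiline_match)        # None
print(dotall_match.groups())  # ('1', '2')
```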

score 0 · Answer 3 · edited May 23 '17 at 12:23

You'd be better off using a parser. If you must use regular expressions for whatever reason, you might get by with a tempered greedy token (the whitespace and # comments below require free-spacing/verbose mode):

Profile/(\d+)            # Profile followed by digits
(?:(?!Allan)[\S\s])+     # any character except when there's Allan ahead
Allan                    # Allan literally
(?:(?!AddAnswer)[\S\s])+ # same construct as above
AddAnswer(\d+)           # AddAnswer, followed by digits

See a demo on regex101.com
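As a rough illustration of why the tempered pattern helps with several posts on one page, here is a Python sketch that runs it next to the plain greedy [\S\s]* pattern from the other answer. The `page` string and its ids are invented for the example; `re.VERBOSE` is Python's free-spacing mode.

```python
import re

# Two hypothetical posts on one page; the ids are made up for illustration.
page = '''
<a href="/Profile/111.html">x</a>
<p class="content">Hi there, Allan.</p>
<div id="AddAnswer222"></div>
<a href="/Profile/333.html">y</a>
<p class="content">Hello again, Allan.</p>
<div id="AddAnswer444"></div>
'''

tempered = re.compile(r"""
    Profile/(\d+)              # Profile followed by digits
    (?:(?!Allan)[\S\s])+       # any character, as long as Allan does not start here
    Allan                      # Allan literally
    (?:(?!AddAnswer)[\S\s])+   # any character, as long as AddAnswer does not start here
    AddAnswer(\d+)             # AddAnswer followed by digits
""", re.VERBOSE)

# Plain greedy version for comparison.
greedy = re.compile(r"Profile/(\d+)[\S\s]*Allan[\S\s]*AddAnswer(\d+)")

tempered_pairs = tempered.findall(page)
greedy_pairs = greedy.findall(page)
print(tempered_pairs)  # [('111', '222'), ('333', '444')]
print(greedy_pairs)    # [('111', '444')]
```

The greedy pattern swallows everything up to the last AddAnswer, so it pairs the first profile with the last answer id, while the tempered one stops at the first Allan and the first AddAnswer it meets.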

answered May 03 '16 at 16:31

Jan

I believe that regex with non-greedy matches like this might perform better: `/Profile\/(\d+)[\s\S]*?Allan[\s\S]*?(\d+)/g` Regex101 shows that 9110 steps are needed with your match pattern, while only 2740 steps are needed with this non-greedy one. – Petr Srníček May 03 '16 at 16:49
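A small Python sketch of the lazy approach, under two assumptions: the literal AddAnswer is kept before the second group (the pattern in the comment above captures the first digits after Allan instead), and the input strings are invented. It does not reproduce the step counts, which are specific to Regex101.

```python
import re

# Lazy variant; keeping the literal AddAnswer is an assumption made for this sketch.
lazy = re.compile(r"Profile/(\d+)[\s\S]*?Allan[\s\S]*?AddAnswer(\d+)")

# Hypothetical page where both posts mention Allan.
both = ('Profile/111 ... Hi, Allan.\n ... AddAnswer222\n'
        'Profile/333 ... Bye, Allan.\n ... AddAnswer444\n')

# Hypothetical page where only the second post mentions Allan.
second_only = ('Profile/111 ... Hi there.\n ... AddAnswer222\n'
               'Profile/333 ... Bye, Allan.\n ... AddAnswer444\n')

pairs_both = lazy.findall(both)
pairs_second_only = lazy.findall(second_only)
print(pairs_both)         # [('111', '222'), ('333', '444')]
print(pairs_second_only)  # [('111', '444')]: the match runs across the first post
```

The second result shows a limit worth knowing: when a post does not mention Allan, the lazy quantifier keeps expanding into the next post. The tempered pattern as written has the same limit, since it only forbids Allan and AddAnswer, not a new Profile.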
