Regex Match all characters between two strings

Question

Example: This is just\na simple sentence.

I want to match every character between "This is" and "sentence". Line breaks should be ignored. I can't figure out the correct syntax.

stema · Accepted Answer · 2011-05-24T20:25:11.557

1028

For example

(?<=This is)(.*)(?=sentence)

Regexr

I used lookbehind (?<=) and lookahead (?=) so that "This is" and "sentence" are not included in the match, but this depends on your use case; you can also simply write This is(.*)sentence.
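As a rough illustration of the difference between the two forms, here is a minimal Python sketch (Python's re module is an assumption; the answer itself is engine-agnostic). With lookarounds the whole match is the text in between, while with a plain capture group that text is read from group 1.

```python
import re

text = "This is just a simple sentence."

# Lookarounds assert the delimiters without consuming them,
# so the whole match is already the text in between.
lookaround = re.search(r"(?<=This is)(.*)(?=sentence)", text)

# Without lookarounds the delimiters are part of the match,
# and the text in between is read from capture group 1.
captured = re.search(r"This is(.*)sentence", text)

print(repr(lookaround.group(0)))  # ' just a simple '
print(repr(captured.group(0)))    # 'This is just a simple sentence'
print(repr(captured.group(1)))    # ' just a simple '
```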

The important thing here is that you activate the "dotall" mode of your regex engine, so that . matches newlines. How you do this depends on your regex engine.
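How dotall is switched on differs per engine. As one example, a small sketch for Python's re module, where the flag is re.DOTALL (other engines use different names or switches):

```python
import re

text = "This is just\na simple sentence."
pattern = r"(?<=This is)(.*)(?=sentence)"

# By default "." stops at a newline, so nothing matches across the line break.
no_dotall = re.search(pattern, text)

# re.DOTALL (alias re.S) lets "." match newlines as well.
with_dotall = re.search(pattern, text, re.DOTALL)

print(no_dotall)                   # None
print(repr(with_dotall.group(0)))  # ' just\na simple '
```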

The next question is whether you use .* or .*?. The first is greedy and will match up to the last "sentence" in your string; the second is lazy and will match up to the next "sentence" in your string.

Update

Regexr

This is(?s)(.*)sentence

Here (?s) turns on the dotall modifier, making . match newline characters.
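Where the inline modifier may appear depends on the engine. In Python's re module, for instance, a global inline flag such as (?s) has to sit at the very start of the pattern (Python 3.11 and later raise an error otherwise), while the scoped form (?s:...) can be used mid-pattern. A short sketch under that assumption:

```python
import re

text = "This is just\na simple sentence."

# (?s) at the very start enables dotall for the whole pattern.
global_flag = re.search(r"(?s)This is(.*)sentence", text)

# (?s:...) enables dotall only inside the parentheses it wraps.
scoped_flag = re.search(r"This is(?s:(.*))sentence", text)

print(repr(global_flag.group(1)))  # ' just\na simple '
print(repr(scoped_flag.group(1)))  # ' just\na simple '
```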

Update 2:

(?<=is \()(.*?)(?=\s*\))

matches "a simple" in your example "This is (a simple) sentence". See here on Regexr
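To check the Update 2 pattern outside Regexr, here is a small Python sketch. The second input, with extra spaces before the closing bracket, is a made-up variation to show that the lookahead's \s* keeps trailing whitespace out of the match.

```python
import re

pattern = r"(?<=is \()(.*?)(?=\s*\))"

plain = re.search(pattern, "This is (a simple) sentence")
padded = re.search(pattern, "This is (a simple  ) sentence")

print(repr(plain.group(0)))   # 'a simple'
print(repr(padded.group(0)))  # 'a simple'
```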

edited May 24 '11 at 20:25

answered May 24 '11 at 12:05

stema


1

@tchrist, sorry I had to look this up. Do I understand this correct and `This is(?s)(.*)sentence` would be working? – stema May 24 '11 at 12:20
@stema: Yes, that should work to enable "dot all" mode under most regex libraries. – tchrist May 24 '11 at 12:21
1

That mostly solved my problem, but how do I include a white space character in my pattern? I tried the following: "(.*?)( \))" to match the " )" at the end of a sequence, but it didn't work. – 0xbadf00d May 24 '11 at 14:09
@FrEEzE2046 I don't understand, which `)`? If you want to match a bracket you have to escape it, `\)`will match a single closing bracket. – stema May 24 '11 at 14:14
I'm sorry, what I really want is this: "(.*?=)\(" - The first ending "(" (opening bracket) of a sequence. The bracket itself shouldn't be included in the match. If it's possible I would also like to ignore each white space character between the bracket and the rest of the match. – 0xbadf00d May 24 '11 at 14:22
@FrEEzE2046 Is `(.*?)\s*\(` what you mean? `\s` is one whitespace `*` means at least 0 times and `\(` an opening bracket. – stema May 24 '11 at 14:27
It shouldn't be part of the match. Didn't you write I need to use ?= to do that? Example string: "This is (a simple) sentence". I want to match "is". I'm able to exclude "This", but I also need to exclude the rest. Currently I'm getting " is (". I'm using the following Regex: @"(?<=This)(?s)(.*?)\s*\(" – 0xbadf00d May 24 '11 at 14:31
I've updated my comment. EDIT: Your last expression did the job. Now, I need to get rid of the remaining white space character between "This" and "(" ... but there could be more than one (I want to exclude all of them. – 0xbadf00d May 24 '11 at 14:34
37

Just one note - regexr says now that lookbehind is not supported in javascript – Kovo Apr 14 '14 at 10:53
I want to find including line break and carriage returns ... how to do this..? – Mohasin Ali Apr 23 '14 at 13:26
@MohasinAli it's a command switch. different for each language implementations. see your lang notes for details. – Keng May 20 '14 at 12:25
I can be wrong but, .* won't work if there is a new line it doesn't include new lines. At least in JS https://developer.mozilla.org/ru/docs/Web/JavaScript/Guide/Regular_Expressions#special-carriage-return – Rantiev Jan 28 '16 at 10:24
@Rantiev, you are correct, by default '.' does not match newline characters, but you can change that by using the dotall or singleline mode I mention in my answer. In most languages. But when I remember correctly, you are again correct, not in JavaScript. – stema Jan 28 '16 at 10:41
@stema Yep, it works with flag "s" singleline mode /.*/s in js as well. – Rantiev Jan 29 '16 at 17:13
This answer looks obsolete. Links mentioned have errors. – Manohar Reddy Poreddy Apr 15 '18 at 11:10
5

Is there a way to deal with repeated instances of this split in a block of text? For instance: "This is just\na simple sentence. Here is some additional stuff. This is just\na simple sentence. And here is some more stuff. This is just\na simple sentence. ". Currently it matches the entire string, rather than each instance. – jzadra Jul 06 '18 at 13:47

zx81 · Answer 2 · 2014-08-09T02:03:35.367

279

Lazy Quantifier Needed

Resurrecting this question because the regex in the accepted answer doesn't seem quite correct to me. Why? Because

(?<=This is)(.*)(?=sentence)

will match " my first sentence. This is my second " in "This is my first sentence. This is my second sentence."

See demo.

You need a lazy quantifier between the two lookarounds. Adding a ? makes the star lazy.

This matches what you want:

(?<=This is).*?(?=sentence)

See demo. I removed the capture group, which was not needed.
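The same comparison can be reproduced in a few lines of Python (an assumed choice of engine; the demos linked above use a different tool). re.findall returns every non-overlapping match, which makes the greedy and lazy results easy to compare:

```python
import re

text = "This is my first sentence. This is my second sentence."

# Greedy: runs to the last "sentence", swallowing the first one.
greedy = re.findall(r"(?<=This is).*(?=sentence)", text)

# Lazy: stops at the nearest "sentence", so each instance is found.
lazy = re.findall(r"(?<=This is).*?(?=sentence)", text)

print(greedy)  # [' my first sentence. This is my second ']
print(lazy)    # [' my first ', ' my second ']
```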

DOTALL Mode to Match Across Line Breaks

Note that in the demo the "dot matches line breaks" mode (a.k.a. dot-all) is set (see how to turn on DOTALL in various languages). In many regex flavors, you can set it with the inline modifier (?s), turning the expression into:

(?s)(?<=This is).*?(?=sentence)

Reference

edited Aug 09 '14 at 02:03

answered May 20 '14 at 09:41

zx81


1

You are correct about the capturing group. Don't know why I have done this. But the difference between `.*` and `.*?` is also explained in my answer (the paragraph before "Update"). So I don't think my answer is incorrect. – stema May 20 '14 at 12:28
3

@stema Sorry about the nitpicking, while cruising through some of your answers yesterday that is the only one that made me twitch. :) I softened the first line from `is incorrect` to `doesn't seem quite correct to me`... Hope that doesn't make *you* twitch, probably just a difference of perception about what the regex for such a high-traffic answer should be. – zx81 May 20 '14 at 20:20
@zx81 (?<=exclude this).*?(?=and exclude this) this is a fantastic answer thank you. I thought I understand regex but this made me do another read on the subject :) everyday is a learning day :) – AD Progress Nov 30 '22 at 20:02

score 75 · Answer 3 · edited Mar 22 '21 at 14:37


Try This is[\s\S]*?sentence, works in javascript
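The [\s\S] trick is not specific to JavaScript. As a hedged illustration, the same pattern in Python matches across the line break with no flag at all, because "whitespace or non-whitespace" covers every character:

```python
import re

text = "This is just\na simple sentence."

# No DOTALL flag needed: [\s\S] matches any character, newlines included.
match = re.search(r"This is([\s\S]*?)sentence", text)

print(repr(match.group(1)))  # ' just\na simple '
```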

edited Mar 22 '21 at 14:37

Wiktor Stribiżew


answered Sep 21 '11 at 18:36

kaore


how to perform a lazy lookup in this way? – AGamePlayer Nov 03 '15 at 16:52
5

@AwQiruiGuo same as above. `[\s\S]*?` (also called: non-greedy wildcard) – phil294 Apr 09 '16 at 16:54

score 25 · Answer 4 · answered Apr 09 '16 at 16:49


This:

This is (.*?) sentence

works in javascript.

answered Apr 09 '16 at 16:49

Riyafa Abdul Hameed


I like the simplicity, but it was not sufficient for me. What I mean is, ```"This is just\na simple sentence".match(/This is (.*?) sentence/)``` returned ```null```. ```"This is just\na simple sentence".match(/This is (.*?) sentence/s)``` returned a helpful result. The difference is the DOTALL ```s``` after the final slash. – Marcus May 27 '22 at 21:38

score 17 · Answer 5 · edited Jan 01 '13 at 18:46


use this: (?<=beginningstringname)(.*\n?)(?=endstringname)

edited Jan 01 '13 at 18:46

fthiella


answered Jan 01 '13 at 18:29

vignesh


Don't know why all the up votes, this allows for 0-1 line breaks, and the line break must be immediately before `endstringname` – OGHaza Nov 22 '13 at 11:46
I found it useful to remove the beginning of log lines (timestamp etc). I used new line for the beginning string and "at" for the end string. – Stan Jan 18 '17 at 05:19
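The limitation described in the first comment above can be checked with a short Python sketch. START and END are hypothetical placeholders standing in for beginningstringname and endstringname:

```python
import re

# START and END are placeholder delimiters chosen for this illustration.
pattern = r"(?<=START)(.*\n?)(?=END)"

# One line break directly before END: matches.
one_break = re.search(pattern, "START a\nEND")

# Two line breaks: "." cannot cross the first one, so there is no match.
two_breaks = re.search(pattern, "START a\nb\nEND")

# One line break that is not directly before END: no match either.
break_not_last = re.search(pattern, "START\na END")

print(repr(one_break.group(0)))  # ' a\n'
print(two_breaks)                # None
print(break_not_last)            # None
```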

score 4 · Answer 6 · answered Jan 14 '19 at 07:27


This worked for me (I'm using VS Code):

for: This is just\na simple sentence

Use: This .+ sentence

answered Jan 14 '19 at 07:27

Roshna Omer


score 3 · Answer 7 · answered Sep 22 '16 at 14:51


You can simply use this: This is .*? sentence

answered Sep 22 '16 at 14:51

AnirbanDebnath


`# – buncis Apr 23 '22 at 14:31

score 3 · Answer 8 · answered Apr 24 '19 at 18:56

RegEx to match everything between two strings using the Java approach.

List<String> results = new ArrayList<>(); // For storing results (java.util.List, java.util.ArrayList)
String example = "Code will save the world";

Let's use Pattern and Matcher objects with the lazy RegEx (.*?).

Pattern p = Pattern.compile("Code (.*?) world");   // java.util.regex.Pattern
Matcher m = p.matcher(example);                    // java.util.regex.Matcher

Since Matcher might find more than one match, we need to loop over the results and store them.

while (m.find()) {            // Loop through all matches
   results.add(m.group(1));   // Group 1 holds the text between the two strings
}

For this example, results will contain only "will save the", but in a bigger text it will probably find more matches.

score 3 · Answer 9 · answered Jun 08 '22 at 10:04

In case of JavaScript you can use [^] to match any character including newlines.

Using the /s flag with a dot . to match any character also works, but is applied to the whole pattern and JavaScript does not support inline modifiers to turn on/off the flag.

To match as few characters as possible, you can make the quantifier non-greedy by appending a question mark, and use a capture group to extract the part in between.

This is([^]*?)sentence

See a regex101 demo.

As a side note, to not match partial words you can use word boundaries like \bThis and sentence\b

const s = "This is just\na simple sentence";
const regex = /This is([^]*?)sentence/;
const m = s.match(regex);

if (m) {
  console.log(m[1]);
}

The lookaround variant in JavaScript is (?<=This is)[^]*?(?=sentence) and you could check Lookbehind in JS regular expressions for the support.

Also see Important Notes About Lookbehind.

const s = "This is just\na simple sentence";
const regex = /(?<=This is)[^]*?(?=sentence)/;
const m = s.match(regex);

if (m) {
  console.log(m[0]);
}
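The word-boundary side note in this answer can be illustrated with a small sketch. It is written in Python rather than JavaScript, and the input sentence is invented for the example; \b behaves the same way in both for ASCII text like this.

```python
import re

text = "This is one of many sentences."

# Without boundaries, "sentence" also matches inside "sentences".
loose = re.search(r"This is(.*?)sentence", text)

# With \b, "sentence" must end at a word boundary, so "sentences" is rejected.
strict = re.search(r"\bThis is(.*?)sentence\b", text)

print(repr(loose.group(1)))  # ' one of many '
print(strict)                # None
```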

score 2 · Answer 10 · answered Jan 05 '18 at 20:46

In case anyone is looking for an example of this within a Jenkins context: this parses build.log and, if it finds a match, fails the build with the matched text.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

node{    
    stage("parse"){
        def file = readFile 'build.log'

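        def regex = ~"(?s)(firstStringToUse(.*)secondStringToUse)"
        Matcher match = regex.matcher(file)
        // find() returns true if the pattern occurs anywhere in the log
        if (match.find()) {
            // group(1) is the whole match, including both delimiter strings
            def capturedText = match.group(1)
            error(capturedText)
        }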
    }
}

score 2 · Answer 11 · answered Aug 19 '20 at 05:40

Is there a way to deal with repeated instances of this split in a block of text? For instance: "This is just\na simple sentence. Here is some additional stuff. This is just\na simple sentence. And here is some more stuff. This is just\na simple sentence. ". To match each instance instead of the entire string, use the code below:

import re

data = "This is just\na simple sentence. Here is some additional stuff. This is just\na simple sentence. And here is some more stuff. This is just\na simple sentence."

# (?s) comes first: Python 3.11+ rejects global inline flags elsewhere in the pattern
pattern = re.compile(r'(?s)This is .*? sentence')

for match_instance in pattern.finditer(data):
    print(repr(match_instance.group()))  # replace with your own handling

What if I want to get text between two consecutive This is just\na simple sentence. Patterns? — Nayananga Muhandiram, Sep 30 '21 at 18:41

score 1 · Answer 12 · answered Dec 24 '18 at 21:58

I landed here while searching for a regex to convert the Python 2 syntax print "string" in old scripts to print("string") for Python 3. It works well; otherwise, use 2to3.py for additional conversions. Here is my solution for others:

Try it out on Regexr.com (doesn't work in NP++ for some reason):

find:     (?<=print)( ')(.*)(')
replace: ('$2')

for variables:

(?<=print)( )(.*)(\n)
('$2')\n

for label and variable:

(?<=print)( ')(.*)(',)(.*)(\n)
('$2',$4)\n

How to replace all print "string" in Python2 with print("string") for Python3?
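For anyone applying the first find/replace pair from a script instead of an editor, here is a minimal Python sketch. It assumes the simple single-quoted case only; note that Python's re.sub writes the backreference as \2 where the Regexr replacement uses $2.

```python
import re

line = "print 'hello world'"

# Same find pattern as above; the replacement uses \2 instead of $2.
converted = re.sub(r"(?<=print)( ')(.*)(')", r"('\2')", line)

print(converted)  # print('hello world')
```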

score 0 · Answer 13 · edited May 31 '18 at 20:42


Here is how I did it:
This was easier for me than trying to figure out the specific regex necessary.

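// Locate the two markers in the input string
int indexPictureData = result.IndexOf("-PictureData:");
int indexIdentity = result.IndexOf("-Identity:");
// Keep everything up to and including "-PictureData:" (13 characters long)
string returnValue = result.Remove(indexPictureData + 13);
// Append a placeholder, then everything from "-Identity:" onward
returnValue = returnValue + " [bytecoderemoved] " + result.Remove(0, indexIdentity);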
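The same index-based idea, without any regex, can be sketched in Python. The helper name text_between is hypothetical, and unlike the snippet above it returns None when either marker is missing.

```python
def text_between(text, start, end):
    """Return the text between the first `start` and the next `end`, or None."""
    start_index = text.find(start)
    if start_index == -1:
        return None
    # Skip past the start marker itself.
    content_start = start_index + len(start)
    end_index = text.find(end, content_start)
    if end_index == -1:
        return None
    return text[content_start:end_index]


result = text_between("This is just\na simple sentence.", "This is", "sentence")
print(repr(result))  # ' just\na simple '
```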

edited May 31 '18 at 20:42

SherylHohman


answered May 31 '18 at 19:57

Bbb


score 0 · Answer 14 · answered Jul 04 '18 at 10:23


For a quick search in Vim, you could type this in Normal mode (\_. matches any character, including a line break): /This is.*\_.*sentence

answered Jul 04 '18 at 10:23

vins


score 0 · Answer 15 · answered Aug 23 '22 at 11:53

I had this string:

      headers:
        Date:
          schema:
            type: string
            example: Tue, 23 Aug 2022 11:36:23 GMT
        Content-Type:
          schema:
            type: string
            example: application/json; charset=utf-8
        Transfer-Encoding:
          schema:
            type: string
            example: chunked
        Connection:
          schema:
            type: string
            example: keep-alive
        Content-Encoding:
          schema:
            type: string
            example: gzip
        Vary:
          schema:
            type: string
            example: Accept-Encoding
        Server:
          schema:
            type: number
            example: Microsoft-IIS/10.0
        X-Powered-By:
          schema:
            type: string
            example: ASP.NET
        Access-Control-Allow-Origin:
          schema:
            type: string
            example: '*'
        Access-Control-Allow-Credentials:
          schema:
            type: boolean
            example: 'true'
        Access-Control-Allow-Headers:
          schema:
            type: string
            example: '*'
        Access-Control-Max-Age:
          schema:
            type: string
            example: '-1'
        Access-Control-Allow-Methods:
          schema:
            type: string
            example: GET, PUT, POST, DELETE
        X-Content-Type-Options:
          schema:
            type: string
            example: nosniff
        X-XSS-Protection:
          schema:
            type: string
            example: 1; mode=block
      content:
        application/json:

and I wanted to remove everything from headers: to content, so I wrote this regex: (headers:)[^]*?(content)

It worked as expected, also reporting how many times that expression occurred.

score 0 · Answer 16 · answered Jul 28 '23 at 19:13

For python

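import re

def match_between_strings(text, start_str, end_str):
    # re.escape keeps special characters in the delimiters literal
    pattern = re.escape(start_str) + r'(.*?)' + re.escape(end_str)
    # re.DOTALL lets "." match line breaks as well
    matches = re.findall(pattern, text, re.DOTALL)
    return matches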

Example usage:

start_str = "This"
end_str = "sentence"
text = "This is just\na simple sentence"

result = match_between_strings(text, start_str, end_str)

Result

[' is just\na simple ']

score -1 · Answer 17 · edited Jun 20 '20 at 09:12


Sublime Text 3x

In Sublime Text, you simply write the two words you are interested in keeping, which in your case are

"This is" and "sentence"

and you write .* in between

i.e. This is .* sentence

and this should do you well

edited Jun 20 '20 at 09:12

Community


answered Feb 13 '18 at 10:47

rsc05


Not sure the question is about how to do this in Sublime Text but mostly works in Sublime Text. It does not work when there happens to be a linebreak between "This is" and "sentence". Also, sublime text also selects "This is" and "Sentence" rather than only the text _between_ those two strings. – Dylan Kinnett Nov 16 '18 at 20:52
