
After trying 10 times to rewrite this question so it would be accepted, here it is: I have a small text that contains text between brackets, and I want to extract that text, so I wrote this expression:

/(\([^\)]+\))/i

but this stops at the first `)`, so on the example below it returns `(i want(to)` and ignores the rest of the text. Is there any way to extract the full text, like:

i want(to) extract this text

from:

this is the text that (i want(to) extract this text) from

There might be more than one bracket-enclosed sub-text.
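For reference, here is a quick check of what the original pattern matches on the example string (the variable names are illustrative). `[^\)]+` can consume the inner `(` but cannot cross a `)`. The match therefore ends at the first closing bracket, not at the one that balances the outer `(`:

```php
<?php
$text = 'this is the text that (i want(to) extract this text) from';

// Original pattern: "(" then one or more characters that are not ")" then ")".
// [^\)]+ swallows the inner "(" but must stop at the first ")".
preg_match('/(\([^\)]+\))/i', $text, $m);

echo $m[1], "\n"; // (i want(to)
```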

Thanks

EDIT: Found this:

preg_match_all("/\((([^()]*|(?R))*)\)/", $rejoin, $matches);

It is very useful; I found it through the link provided in the accepted answer.
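Here is a minimal sketch of the recursive pattern from the EDIT, used with preg_match_all on a made-up string that contains two top-level groups. `(?R)` re-enters the whole pattern. Each entry in `$matches[1]` is therefore the content of one balanced, outermost pair of brackets.

One deliberate change from the original: `[^()]*` is written as the possessive `[^()]++`. This should give the same group 1 on balanced input, and it avoids heavy backtracking when a `(` is never closed.

```php
<?php
$text = 'this is the text that (i want(to) extract this text) from (and (this) too)';

// \( ... \)   literal opening and closing brackets
// [^()]++     a run of non-bracket characters (possessive, so no backtracking into it)
// (?R)        or a complete nested "( ... )" group, matched by recursing into the whole pattern
// Group 1 captures everything between the outermost pair of brackets.
preg_match_all('/\(((?:[^()]++|(?R))*)\)/', $text, $matches);

foreach ($matches[1] as $inner) {
    echo $inner, "\n";
}
// i want(to) extract this text
// and (this) too
```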

Rami Dabain
  • Check here http://php.net/manual/en/regexp.reference.recursive.php – elclanrs Jul 03 '13 at 03:29
  • Is your general requirement (a) to extract everything from the outermost parentheses, or (b) to extract the second lowest level of bracketed expressions, or (c) something else? In the general case, you cannot handle arbitrary levels of nesting in regex, but if you have a fixed number, you can probably create a regex for it. – tripleee Jul 03 '13 at 03:30
  • While recursive "reg"ex can do this, it's probably better and more maintainable to actually write a lightweight parser for this use case. Recursion in "reg"ex is extremely resource-intensive and performs badly, and even in the best formats the expression itself can be pretty opaque. – eyelidlessness Jul 03 '13 at 04:26
  • `/\((.*)\)/` (that's from here: http://stackoverflow.com/questions/19836706/regex-get-all-content-between-two-characters) – Mrigesh Raj Shrestha Mar 03 '14 at 10:57

4 Answers


Yes, you can use this pattern:

   v                   v
 (\([^\)\(]*)+([^\)\(]*\))+
 ------------ -------------
      |            |
      |            |->match all (right)brackets to the right..
      |
      |->match all (left)brackets to the left
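A quick sketch of this pattern in PHP (the test strings and variable names are mine, for illustration). It handles the single level of nesting from the question. On the deeper example discussed next, it stops after the first `)` because a `(` follows it:

```php
<?php
$pattern = '/(\([^\)\(]*)+([^\)\(]*\))+/';

// One level of nesting: the whole outer group, brackets included, is matched.
preg_match($pattern, 'this is the text that (i want(to) extract this text) from', $m1);
echo $m1[0], "\n"; // (i want(to) extract this text)

// A "(" after the first ")" ends the second group, so the match is cut short.
preg_match($pattern, '(i want(to) (extract and also (this)) this text)', $m2);
echo $m2[0], "\n"; // (i want(to)
```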



The above pattern won't work if you have a recursive (nested) pattern like this:

(i want(to) (extract and also (this)) this text)
                              ------
            -------------------------

In this case you can use the recursive pattern, as recommended by elclanrs.


You can also do it without using regex, by keeping a count of the ( and ) characters.

So, assume noOfLB is the count of ( and noOfRB is the count of ).

  • Keep iterating over each character in the string, and remember the position of the first (
  • Increment noOfLB when you find (
  • Increment noOfRB when you find )
  • If noOfLB == noOfRB, you have found the position of the ) that closes the first (

I don't know PHP, so I would implement the above algorithm in C#:

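public static string getFirstRecursivePattern(string input)
{
    // Position of the first '(' (-1 if there is none, in which case the loop is skipped).
    int firstB = input.IndexOf('('), noOfLB = 0, noOfRB = 0;
    for (int i = firstB; i >= 0 && i < input.Length; i++)
    {
        if (input[i] == '(') noOfLB++;
        if (input[i] == ')') noOfRB++;
        // Equal counts: this ')' closes the first '(', so return that group, brackets included.
        if (noOfLB == noOfRB) return input.Substring(firstB, i - firstB + 1);
    }
    // No '(' at all, or the brackets never balance.
    return "";
}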
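Since the question is about PHP, here is a rough PHP port of the same counting idea. The function name `extractFirstGroup` is hypothetical. It differs from the C# version in two ways:

- It keeps a single depth counter (count of `(` minus count of `)`), which is equivalent to comparing the two counts.
- It returns the text inside the brackets rather than the brackets plus text.

Byte-wise indexing is fine here because `(` and `)` are single bytes, even in UTF-8 strings.

```php
<?php
// Hypothetical helper: returns the text inside the first balanced "( ... )" group,
// or null if there is no "(" or the brackets never balance.
function extractFirstGroup(string $input): ?string
{
    $start = strpos($input, '(');
    if ($start === false) {
        return null;
    }

    $depth = 0; // number of "(" seen minus number of ")" seen
    $length = strlen($input);
    for ($i = $start; $i < $length; $i++) {
        if ($input[$i] === '(') {
            $depth++;
        } elseif ($input[$i] === ')') {
            $depth--;
            if ($depth === 0) {
                // This ")" closes the first "(": return what lies between them.
                return substr($input, $start + 1, $i - $start - 1);
            }
        }
    }
    return null; // ran out of input before the depth returned to 0
}

echo extractFirstGroup('this is the text that (i want(to) extract this text) from'), "\n";
// i want(to) extract this text
```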
Anirudha

You will need recursive subpatterns to solve this. Here is the regex that should work for you:

$str = 'this is the text that (i want(to) extract this text) from';
if (preg_match('/\s* \( ( (?: [^()]* | (?0) )+ ) \) /x', $str, $arr)) {
   var_dump($arr[1]);
}

OUTPUT:

string(28) "i want(to) extract this text"
anubhava
  • There might be other brackets: `this is the text that (i want(to) (to) (to) (to) extract this text) from` – Rami Dabain Jul 03 '13 at 08:00
  • Out of curiosity, what text do you want to extract from the input `this is the text that (i want(to) (to) (to) (to) extract this text) from`? – anubhava Jul 03 '13 at 12:09
  • My solution above will provide: `i want(to) (to) (to) (to) extract this text` – anubhava Jul 03 '13 at 12:09
  • Yes, but it's not recursive. I figured I need a recursive one, and the answer provided a link to a good one. I will post it tomorrow by editing the question. – Rami Dabain Jul 03 '13 at 16:11
  • Looking forward to seeing your edit. However, I'd like to repeat that the above answer is indeed recursive and based on this: http://php.net/manual/en/regexp.reference.recursive.php – anubhava Jul 03 '13 at 16:28

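If the text contains only one outer pair of brackets, you can also use plain substring functions, taking everything between the first ( and the last ):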

$yourString = "this is the text that (i want(to) extract this text) from";

$stringAfterFirstParen = substr( strstr( $yourString, "(" ), 1 );

$indexOfLastParen = strrpos( $stringAfterFirstParen, ")" );

$stringBetweenParens = substr( $stringAfterFirstParen, 0, $indexOfLastParen );
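Here the same three calls are wrapped in a function (the name `betweenFirstAndLastParen` is hypothetical) to make the trade-off visible. The sketch assumes the string contains at least one `(` and one `)`. It takes everything between the first `(` and the last `)`. That is exactly right for a single outer group, but it spans across separate groups:

```php
<?php
// Hypothetical wrapper around the substring approach above.
function betweenFirstAndLastParen(string $s): string
{
    $afterFirst = substr(strstr($s, '('), 1);    // text after the first "("
    $lastParen  = strrpos($afterFirst, ')');      // position of the last ")" in that text
    return substr($afterFirst, 0, $lastParen);   // everything up to (not including) it
}

echo betweenFirstAndLastParen('this is the text that (i want(to) extract this text) from'), "\n";
// i want(to) extract this text

echo betweenFirstAndLastParen('(first) and (second)'), "\n";
// first) and (second
```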
go-oleg

I think I understand the question: you would like to extract "i want(to) extract this text", or something similar, from something that might look like this: this is the text that (i want(to) extract this text) from

If so, you might find success with the following regular expression. Here $text is the variable being examined, $t is the array that receives the match, and $txt holds the matched text (or an empty string if there is no match):

if (preg_match('/\(\w+.+\)/', $text, $t)) {
    $txt = $t[0];
} else {
    $txt = "";
}
echo $desired = substr($txt, 1, -1);

The RegEx at the root of this is \(\w+.+\), and here is the explanation of the code:

  1. Match the character “(” literally «\(»
  2. Match a “word character” (letter, digit, or underscore) «\w+», between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
  3. Match any single character that is not a line break character «.+», between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
  4. Match the character “)” literally «\)»
  5. Put the text that is within the parentheses into a new variable $desired. Display $desired by taking a substring that is one character shorter at each end, thereby eliminating the bounding parentheses. «echo $desired=substr($txt,1,-1)»

Using the above, I was able to display `i want(to) extract this text` from the variable `$text = "this is the text that (i want(to) extract this text) from"`. If you want to pull the "to" out of "(to)", I would suggest running the result through the regex in a loop until no more ( ) pairs are found, and concatenating the returned values to form the variable of interest.
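The loop suggested above might look roughly like this. This is a sketch of one reading of that description, and `$levels` and `$current` are illustrative names. Each pass applies the same regex to the previous result and keeps the text inside the brackets. It stops when preg_match returns 0.

Note that this regex needs a word character right after the `(` and at least two characters inside the brackets, so groups such as `(a)` or `( x)` are not found.

```php
<?php
$text = 'this is the text that (i want(to) extract this text) from';

$levels  = [];     // text found at each bracket level, outermost first
$current = $text;
// preg_match returns 1 on a match and 0 once no further "( ... )" is found.
while (preg_match('/\(\w+.+\)/', $current, $t)) {
    $current  = substr($t[0], 1, -1); // strip the bounding parentheses
    $levels[] = $current;
}

echo implode(' | ', $levels), "\n";
// i want(to) extract this text | to
```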

Best of luck, Steve

Steve Kinzey