2

I want to use a regular expression to extract substrings from a string such as: hello world #level:basic #lang:java:php #...

I am trying to produce an array with a structure like this:

Array 
(
    [0]=> hello world
    [1]=> Array 
          (
              [0]=> level
              [1]=> basic
          )
    [2]=> Array 
          (
              [0]=> lang
              [1]=> java
              [2]=> php
          )
)

I have tried preg_match("/(.*)#(.*)[:(.*)]*/", $input_line, $output_array);

and what I have got is:

Array
(
    [0] => hello world #level:basic #lang:java:php
    [1] => hello world #level:basic 
    [2] => lang:java:php
)

With this result I would have to apply the regex several more times to the captured groups, and then apply another regex to split on the colons. Is it possible to write a better regex that does it all in one go? If so, what would it be? Thanks.

Duc
  • use `explode` instead – Dave May 09 '13 at 23:08
  • The syntax is unclear. After encountering a hash, when is the match supposed to end? At the next hash? If so, you cannot put any "normal" text after the last hash because it would be considered part of it. You need to tighten the syntax up. – Jon May 09 '13 at 23:14
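
The ambiguity raised in the comment above can be seen with a small sketch. The input string with trailing text is a made-up example, not one from the question, and the variable names are only illustrative:

```php
<?php
// Hypothetical input: ordinary text follows the last "#" tag.
$input = 'hello world #level:basic trailing text';

// Splitting on "#" treats everything up to the next "#" (or the end) as the tag.
$parts = explode('#', $input);
$tag = explode(':', trim($parts[1]));

print_r($tag); // the trailing text ends up inside the last tag element
```

Here the second element becomes "basic trailing text", which is why the end of a tag needs to be defined (for example, at the next whitespace).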

6 Answers

2

Do this:

$array = array() ;
$text = "hello world #level:basic #lang:java:php";

$array = explode("#", $text);
foreach($array as $i => $value){
    $array[$i] = explode(":", trim($value));
}

print_r($array);
Waqleh
2

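You can use:

$array = explode("#", "hello world #level:basic #lang:java:php");
foreach ($array as $k => &$v) {
    $v = trim($v); // drop the space left before each "#"
    // Split only the parts that contain a colon; plain text stays a string.
    $v = strpos($v, ":") === false ? $v : explode(":", $v);
}
unset($v); // break the reference so later writes to $v don't alter $array
print_r($array);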
Baba
  • Thanks, your answer is really neat and short, and it returns an array just as I wanted. I also like the pass-by-reference part. Just a quick question: if I want the "hello world" part to be flexible, so that it can appear anywhere, is that possible? – Duc May 10 '13 at 10:32
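
One possible way to handle free text that can appear anywhere, sketched under assumptions that are not part of the answer above: a tag is taken to be "#" followed by characters that are neither whitespace nor "#", and whatever is left after removing the tags is treated as the free text. The function name `splitTextAndTags` is hypothetical.

```php
<?php
// Hypothetical helper: collects #tags wherever they occur in the input.
function splitTextAndTags($input)
{
    $tags = array();
    // Assumption: a tag is "#" followed by non-whitespace, non-"#" characters.
    preg_match_all('/#([^\s#]+)/', $input, $matches);
    foreach ($matches[1] as $tag) {
        $tags[] = explode(':', $tag);
    }
    // Remove the tags, collapse leftover runs of whitespace, and trim.
    $withoutTags = preg_replace('/#[^\s#]+/', '', $input);
    $text = trim(preg_replace('/\s+/', ' ', $withoutTags));

    // Free text first, then one array per tag.
    return array_merge(array($text), $tags);
}

print_r(splitTextAndTags('#level:basic hello world #lang:java:php'));
```

With this input the free text "hello world" comes first, followed by one array per tag, regardless of where the tags sit in the string.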
2

Got something for you:

Rules:

  • a tag begins with #
  • a tag may not contain whitespace/newline
  • a tag is preceded and followed by whitespace or the beginning/end of a line
  • a tag can have several parts divided by :

Example:

#this:tag:matches this is some text #a-tag this is no tag: \#escaped
and this one tag#does:not:match

Function:

<?php
function parseTags($string)
{
    static $tag_regex = '@(?<=\s|^)#([^\:\s]+)(?:\:([^\s]+))*(?=\s|$)@m';

    $results = array();
    preg_match_all($tag_regex, $string, $results, PREG_SET_ORDER | PREG_OFFSET_CAPTURE);

    $tags = array();
    foreach($results as $result) {
        $tag = array(
            'offset' => $result[0][1],
            'raw' => $result[0][0],
            'length' => strlen($result[0][0]),
            0 => $result[1][0]);
        if(isset($result[2]))
            $tag = array_merge($tag, explode(':', $result[2][0]));

        $tag['elements'] = count($tag)-3;
        $tags[] = $tag;
    }

    return $tags;
}
?>

Result:

array(2) {
  [0]=>array(7) {
    ["offset"]=>int(0)
    ["raw"]=>string(17) "#this:tag:matches"
    ["length"]=>int(17)
    [0]=>string(4) "this"
    [1]=>string(3) "tag"
    [2]=>string(7) "matches"
    ["elements"]=>int(3)
  }
  [1]=>array(5) {
    ["offset"]=>int(36)
    ["raw"]=>string(6) "#a-tag"
    ["length"]=>int(6)
    [0]=>string(5) "a-tag"
    ["elements"]=>int(1)
  }
}

Each matched tag contains

  • the raw tag text
  • the tag offset and original length (e.g. to replace it in the string later with str... functions)
  • the number of elements (to safely iterate for($i = 0; $i < $tag['elements']; $i++))
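
A short sketch of how those fields might be used. The `$tag` array below is written out by hand to match the first entry of the result shown above, and `substr_replace` is just one choice of string function for removing the tag:

```php
<?php
$string = '#this:tag:matches this is some text #a-tag';

// Hand-written copy of the first entry from the result above.
$tag = array(
    'offset'   => 0,
    'raw'      => '#this:tag:matches',
    'length'   => 17,
    0          => 'this',
    1          => 'tag',
    2          => 'matches',
    'elements' => 3,
);

// Iterate only over the numeric tag parts, skipping the metadata keys.
$parts = array();
for ($i = 0; $i < $tag['elements']; $i++) {
    $parts[] = $tag[$i];
}
echo implode(' / ', $parts), "\n";

// Cut the raw tag out of the original string using its offset and length.
$stripped = substr_replace($string, '', $tag['offset'], $tag['length']);
echo $stripped, "\n";
```

When removing several tags this way, working from the last match backwards keeps the earlier offsets valid.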
Lukas
1

This might work for you:

$results = array() ;
$text = "hello world #level:basic #lang:java:php" ;

$parts = explode("#", $text);
foreach($parts as $part){
    // trim() removes the space left before each "#"
    $results[] = explode(":", trim($part));
}

var_dump($results);
sybear
1

Two ways using regex. Note that you still need explode(), since PCRE (as used by PHP) keeps only the last match of a repeated capture group, so a single pattern cannot return every colon-separated part:

$string = 'hello world #level:basic #lang:java:php';
$example1 = array();
// Grab each run of word characters and colons that directly follows a "#".
preg_match_all('/(?<=#)[\w:]+/', $string, $m);
foreach ($m[0] as $v) {
    $example1[] = explode(':', $v);
}
print_r($example1);


// This one needs PHP 5.3+
$example2 = array();
preg_replace_callback('/(?<=#)[\w:]+/', function($m)use(&$example2){
    $example2[] = explode(':', $m[0]);
}, $string);
print_r($example2);
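
To illustrate why explode() is still needed, here is a minimal sketch of what happens when a capture group is repeated. The pattern is only a demonstration and is not part of the two approaches above:

```php
<?php
$string = '#lang:java:php';

// Group 2 sits inside a repeated non-capturing group, so it matches twice
// ("java", then "php"), but only the last iteration is kept in $m[2].
preg_match('/#(\w+)(?::(\w+))*/', $string, $m);

print_r($m);
```

The result holds the full match, "lang" in group 1, and only "php" in group 2; "java" is lost, which is why the answers split on ":" afterwards.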
HamZa
0

This gives you the array structure you are looking for:

<pre><?php
$subject = 'hello world #level:basic #lang:java:php';
$array = explode('#', $subject);
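foreach ($array as &$value) {
    $items = explode(':', trim($value));
    // Keep an array for tags with several parts, a trimmed string otherwise.
    $value = (sizeof($items) > 1) ? $items : $items[0];
}
unset($value); // break the reference left behind by the loop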
print_r($array);

But if you prefer you can use this abomination:

$subject = 'hello world #level:basic #lang:java:php';
$pattern = '~(?:^| ?+#)|(?:\G([^#:]+?)(?=:| #|$)|:)+~';
preg_match_all($pattern, $subject, $matches);

array_shift($matches[1]);
$lastKey = sizeof($matches[1])-1;

foreach ($matches[1] as $key=>$match) {
    if (!empty($match)) $temp[]=$match;        
    if (empty($match) || $key==$lastKey) {
        $result[] = (sizeof($temp)>1) ? $temp : $temp[0];
        unset($temp);
    }
}

print_r($result);
Casimir et Hippolyte