4

I don't know regular expressions at all. Can anybody help me with one very simple regular expression:

extracting 'word:word' from a sentence, e.g. "Java Tutorial Format:Pdf With Location:Tokyo Javascript"?

  • Small modification: the first 'word' comes from a fixed list, but the second can be anything, e.g. word1 in [ABC, FGR, HTY].
  • The situation demands one more modification: the match can have the form "word11:word12 word13 ..", continuing until the next "word21: ...".

Things are becoming complex with sec..... I have to learn regex :(

Thanks in advance.

Brad Mace
Subrat

7 Answers

8

You can use the regex:

\w+:\w+

Explanation:
\w - a single character that is a letter (uppercase or lowercase), a digit, or an underscore (_).
\w+ - one or more of those characters, which is basically a word.

So \w+:\w+ matches a pair of words separated by a colon.
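
To see the pattern in action, here is a minimal Python sketch (Python is only an illustration choice; the regex itself is the same in most engines). It uses the sample sentence from the question.

```python
import re

text = "Java Tutorial Format:Pdf With Location:Tokyo Javascript"

# findall returns every non-overlapping match of the whole pattern
matches = re.findall(r"\w+:\w+", text)
print(matches)  # ['Format:Pdf', 'Location:Tokyo']
```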

codaddict
2

Try \b(\S+?):(\S+?)\b. Group 1 will capture "Format" and group 2, "Pdf".

A working example:

<html>
<head>
<script type="text/javascript">
function test() {
    var re = /\b(\S+?):(\S+?)\b/g; // without 'g' matches only the first
    var text = "Java Tutorial Format:Pdf With Location:Tokyo  Javascript";

    var match = null;
    while ( (match = re.exec(text)) != null) {
        alert(match[1] + " -- " + match[2]);
    }

}
</script>
</head>
<body onload="test();">

</body>
</html>

A good reference for regexes is https://developer.mozilla.org/en/Core_JavaScript_1.5_Reference/Global_Objects/RegExp
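
For comparison, a rough Python equivalent of the same idea, as a sketch only: the two capture groups are collected into a dictionary. The variable names are illustrative and not part of the original example.

```python
import re

# Same pattern as above: group 1 is the word before the colon, group 2 the word after
PAIR_RE = re.compile(r"\b(\S+?):(\S+?)\b")

text = "Java Tutorial Format:Pdf With Location:Tokyo  Javascript"

# finditer plays the role of the exec() loop with the 'g' flag
pairs = {m.group(1): m.group(2) for m in PAIR_RE.finditer(text)}
print(pairs)  # {'Format': 'Pdf', 'Location': 'Tokyo'}
```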

mcrisc
1

Use this snippet:

$str = " this is pavun:kumar hello world bk:systesm";
if (preg_match_all('/(\w+:\w+)/', $str, $val)) {
    print_r($val);
} else {
    print "Not matched \n";
}

Pavunkumar
1

Continuing Jaú's function with your additional requirement:

function test() {
    var words = ['Format', 'Location', 'Size'],
        text = "Java Tutorial Format:Pdf With Location:Tokyo Language:Javascript",
        match = null;
    // Build an alternation of the allowed first words: (Format|Location|Size):(\w+)
    var re = new RegExp('(' + words.join('|') + '):(\\w+)', 'g');
    while ((match = re.exec(text)) != null) {
        alert(match[1] + " = " + match[2]);
    }
}

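
The question's last requirement (a value that runs on until the next "word:") could be handled with a lazy match plus a lookahead. The Python sketch below is one possible approach, not a definitive one. It assumes the value ends only at the next key from the allowed list or at the end of the text; KEYS, PAIR_RE and extract_pairs are hypothetical names chosen for illustration.

```python
import re

KEYS = ["Format", "Location", "Size"]  # hypothetical list of allowed first words

# Escape each key so that special characters in a key cannot alter the pattern
key_alt = "|".join(re.escape(k) for k in KEYS)

# key, colon, then lazily everything up to the next "key:" or the end of the text
PAIR_RE = re.compile(r"\b(" + key_alt + r"):(.*?)(?=\s+(?:" + key_alt + r"):|$)")

def extract_pairs(text):
    """Return (key, value) tuples; values may contain several words."""
    return [(m.group(1), m.group(2).strip()) for m in PAIR_RE.finditer(text)]

text = "Java Tutorial Format:Pdf With Location:Tokyo Javascript"
print(extract_pairs(text))
# [('Format', 'Pdf With'), ('Location', 'Tokyo Javascript')]
```

Under this assumption, a pair whose first word is not in the list (such as Language:Javascript) is treated as part of the preceding value rather than as a new pair.
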
instanceof me
0

I am currently solving this problem in my Node.js app and found that the following is, I think, suitable for colon-separated pairs:

([\w]+:)("(([^"])*)"|'(([^'])*)'|(([^\s])*))

It also matches quoted values, like a:"b" c:'d e' f:g

Example code in ES6:

const regex = /([\w]+:)("(([^"])*)"|'(([^'])*)'|(([^\s])*))/g;
const str = `category:"live casino" gsp:S1aik-UBnl aa:"b" c:'d e' f:g`;
let m;

while ((m = regex.exec(str)) !== null) {
   // This is necessary to avoid infinite loops with zero-width matches
   if (m.index === regex.lastIndex) {
      regex.lastIndex++;
   }

   // The result can be accessed through the `m`-variable.
   m.forEach((match, groupIndex) => {
      console.log(`Found match, group ${groupIndex}: ${match}`);
   });
}

Example code in PHP:

$re = '/([\w]+:)("(([^"])*)"|\'(([^\'])*)\'|(([^\s])*))/';
$str = 'category:"live casino" gsp:S1aik-UBnl aa:"b" c:\'d e\' f:g';

preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);

// Print the entire match result
var_dump($matches);
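
The same idea can be sketched in Python with fewer capture groups. This is a simplified variant and not the exact regex above: the colon is kept outside the key group, and each alternative captures only the value without its quotes. PAIR_RE and parse_pairs are illustrative names.

```python
import re

# key, colon, then a double-quoted, single-quoted, or unquoted value
PAIR_RE = re.compile(r"""(\w+):(?:"([^"]*)"|'([^']*)'|(\S*))""")

def parse_pairs(text):
    """Return (key, value) tuples with surrounding quotes removed."""
    pairs = []
    for m in PAIR_RE.finditer(text):
        key = m.group(1)
        # Only the alternative that matched sets its group; the others are None
        value = next(g for g in m.groups()[1:] if g is not None)
        pairs.append((key, value))
    return pairs

text = """category:"live casino" gsp:S1aik-UBnl aa:"b" c:'d e' f:g"""
print(parse_pairs(text))
# [('category', 'live casino'), ('gsp', 'S1aik-UBnl'), ('aa', 'b'), ('c', 'd e'), ('f', 'g')]
```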

You can test your regular expressions using this online tool: https://regex101.com

lukaserat
-1

Here's the non-regex way: in your favourite language, split on whitespace, go through the elements, check each one for ":", and print it if found. E.g. in Python:

>>> s="Java Tutorial Format:Pdf With Location:Tokyo Javascript"
>>> for i in s.split():
...     if ":" in i:
...         print(i)
...
Format:Pdf
Location:Tokyo

You can do further checks to make sure it's really "someword:someword" by splitting again on ":" and checking that there are exactly 2 elements in the resulting list, e.g.:

>>> for i in s.split():
...     if ":" in i:
...         a=i.split(":")
...         if len(a) == 2:
...             print(i)
...
Format:Pdf
Location:Tokyo
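
The same check can be wrapped in a small Python 3 function. This is a sketch with one extra assumption that is not in the session above: both sides of the colon must be non-empty, so tokens like "key:" or ":value" are skipped. The name colon_pairs is illustrative.

```python
def colon_pairs(sentence):
    """Return whitespace-separated tokens of the form 'word:word'."""
    pairs = []
    for token in sentence.split():
        parts = token.split(":")
        # Exactly one colon, and neither side is empty
        if len(parts) == 2 and all(parts):
            pairs.append(token)
    return pairs

s = "Java Tutorial Format:Pdf With Location:Tokyo Javascript"
print(colon_pairs(s))  # ['Format:Pdf', 'Location:Tokyo']
```
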
ghostdog74
-2

([^:]+):(.+)

Meaning: (any character except ":", one or more times), then ":", then (any character, one or more times).
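
Note that both groups are greedy and neither stops at whitespace, so on a whole sentence this pattern captures much more than a single word on each side. A quick Python check on the question's sample sentence illustrates this (a sketch; the variable names are illustrative):

```python
import re

text = "Java Tutorial Format:Pdf With Location:Tokyo Javascript"

m = re.search(r"([^:]+):(.+)", text)
# Group 1 runs from the start of the text to the first colon,
# group 2 takes everything after that colon
print(m.groups())
# ('Java Tutorial Format', 'Pdf With Location:Tokyo Javascript')
```

It behaves as a 'word:word' matcher only when applied to a single token such as "Format:Pdf".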

You'll find good manuals on the net... Maybe it's time for you to learn...

Macmade