
Been struggling with this one...

ABC 123 tab 123 tab 534

$DEF564 (Hello World) Something Arbit-rary here

I want the matched text to be:

ABC 123

DEF564

The text I want matched is described by [-A-Z0-9_ ]+ (my examples don't cover every possibility).

The problem is that the text is anchored at the start of the line (^).

So ^[-A-Z0-9_ ]+ matches the first example but, of course, fails on the second.
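A quick reproduction of the problem, sketched in Python's re rather than the Delphi/PCRE setup in the question (Python is used only because it's easy to run). It assumes, as ycomp says in a later comment, that real tab characters follow the wanted text; the variable names are hypothetical.

```python
import re

# Sample lines from the question; "\t" stands in for the tabs ycomp
# describes in the comments (an assumption about the real data).
lines = ["ABC 123\t123\t534", "$DEF564\t(Hello World) Something Arbit-rary here"]

# The anchored pattern from the question, with no handling for "$".
naive = re.compile(r"^[-A-Z0-9_ ]+")

results = []
for line in lines:
    m = naive.match(line)
    # match() returns None when the pattern can't match at position 0,
    # which is what happens when the line starts with "$".
    results.append(m.group(0) if m else None)

print(results)  # ['ABC 123', None]
```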

How do I write a regex that ignores a $ in the first position? That is the only variation in my source text: the first character is either a $ or it isn't, and I don't want the $ to be part of the matched result string.

I'm sure this isn't difficult for someone who knows regex well, but my regex ability is rather limited, which is why I've spent a lot of time on this and gotten nowhere.

So what I need is something like:

^IgnoreDollarSignInFirstCharacterIfPresent[-A-Z0-9_ ]+

ycomp
  • What is your regex engine or language? – Gilles Quénot Dec 06 '14 at 14:31
  • Perl 5.18 in Delphi, but I don't want to modify my source code; I'd like to use pure regex – ycomp Dec 06 '14 at 14:34
  • There's no such thing as "pure regex", just like there's no "pure English" ;-) – Álvaro González Dec 06 '14 at 14:52
  • what I meant is use only regex, not modify the Delphi source code. Of course it would have been much quicker modifying the source code, but I figured that there must be something I really need to learn here about regex. – ycomp Dec 06 '14 at 14:56
  • And what I meant is that regexp is not a single standard language but a series of dialects. It's probably not relevant for a simple match like this though. – Álvaro González Dec 06 '14 at 15:01
  • oops turns out actually I don't use Perl 5.18, but instead PCRE 7.9 (Perl 5.18 doesn't work with the accepted answer but PCRE 7.9 does) – ycomp Dec 06 '14 at 15:32
  • @ÁlvaroG.Vicario: It is, the accepted answer uses a lookbehind that doesn't exist in many flavors. – Casimir et Hippolyte Dec 06 '14 at 16:53

4 Answers


If you want the whole match to be your target, use a lookbehind:

(?<=^\$|^)[-A-Z0-9 ]+

See demo.

Bohemian
  • great, I pasted all my example text from my real application in there and they all work. (In your online demo) – ycomp Dec 06 '14 at 15:00
  • The only problem is that I get this message "Perl 5.18 requires all alternatives inside lookbehind to have the same length" on my computer - I mean the Rubular page works fine but I'm using a Perl 5.18 engine in my code. – ycomp Dec 06 '14 at 15:02
  • oh, seems I don't have a Perl 5.18 engine in my code. I have PCRE 7.9 (not sure what Perl engine that equates to). So this does work for me in my code. Thanks. – ycomp Dec 06 '14 at 15:25
  • small correction, the _ is missing. Here is the revised version: (?<=^\$|^)[-A-Z0-9 _]+ – ycomp Dec 06 '14 at 15:48
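The "same length" error quoted above comes from the lookbehind's two alternatives having different widths (^\$ is one character, ^ is zero). Python's re, used here as a stand-in, has the same fixed-width restriction, so it makes a convenient sketch. The rewrite moves the zero-width case outside the lookbehind so the lookbehind itself is always one character wide; it should suit engines with that restriction, though I haven't checked it against Perl 5.18 or PCRE 7.9 specifically. Tab separators and variable names are assumptions.

```python
import re

lines = ["ABC 123\t123\t534", "$DEF564\t(Hello World) Something Arbit-rary here"]

# Bohemian's pattern with ycomp's "_" added. The lookbehind's alternatives
# have widths 1 (^\$) and 0 (^); Python's re only accepts fixed-width
# lookbehinds, so this fails to compile here.
try:
    re.compile(r"(?<=^\$|^)[-A-Z0-9 _]+")
    variable_width_ok = True
except re.error as err:
    variable_width_ok = False
    print("rejected:", err)

# Fixed-width rewrite: the lookbehind only ever checks one character,
# and the "no $" case is a plain ^ outside the lookbehind.
fixed = re.compile(r"(?:(?<=^\$)|^)[-A-Z0-9 _]+")

results = [fixed.search(line).group(0) for line in lines]
print(results)  # ['ABC 123', 'DEF564']
```

Because the match itself starts after the $, no capture group is needed: the whole match is the wanted text, which is the point of the lookbehind approach.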

Test in Perl:

$ echo 'ABC 123 tab 123 tab 534
$DEF564 tab 456 tab 5454' | perl -lne '/^\$?\K(?:[-A-Z\d_\s])+/ and print $&' 
ABC 123 
DEF564

So the regex is:

^\$?\K(?:[-A-Z\d_\s])+
Gilles Quénot
  • you need to escape the `$` sign. – Avinash Raj Dec 06 '14 at 14:27
  • this is why I am stuck, these regexes still match the $ (I want to ignore it if it is present... I mean if first char is $ then I want text starting at 2nd position, else I want the text starting from the first)... but I don't really know how to do conditional regex – ycomp Dec 06 '14 at 14:30
  • Post the language..@ycomp – Avinash Raj Dec 06 '14 at 14:32
  • i think op wants `ABC 123` – Avinash Raj Dec 06 '14 at 14:40
  • it's a look-around trick, see http://stackoverflow.com/questions/13542950/support-of-k-in-regex – Gilles Quénot Dec 06 '14 at 14:40
  • ah, yeah it is good but doesn't catch something like ABC 123 .. another example is AQQQQQ_Q 12-15 of text I need to match. Anything in this [-A-Z0-9_ ]+ . This is really messy, I'll try to edit my question a bit. – ycomp Dec 06 '14 at 14:40
  • new edit also doesn't work for me, as there are other characters on the line that I need to ignore... basically stop at the tabs in the examples of the question – ycomp Dec 06 '14 at 14:55
  • @ycomp then you could use `^\$?([^\t]+)` – Avinash Raj Dec 06 '14 at 14:58
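Why the \s version doesn't stop at the tabs: \s matches tab characters as well as spaces, so once the rest of the line is uppercase letters, digits and whitespace, the class runs straight through. A sketch in Python's re, which has no \K, so the text after the optional $ is taken from group 1 instead; the tab-separated sample data and variable names are assumptions.

```python
import re

lines = ["ABC 123\t123\t534", "$DEF564\t(Hello World) Something Arbit-rary here"]

# Python's re has no \K, so the text after the optional "$" is captured in group 1.
# \s matches \t as well as spaces, so this class runs straight through the tabs.
with_s = re.compile(r"^\$?([-A-Z\d_\s]+)")

# Avinash Raj's suggestion: take everything up to the first tab.
up_to_tab = re.compile(r"^\$?([^\t]+)")

greedy = [with_s.match(line).group(1) for line in lines]
tabbed = [up_to_tab.match(line).group(1) for line in lines]

print(greedy)  # ['ABC 123\t123\t534', 'DEF564\t']
print(tabbed)  # ['ABC 123', 'DEF564']
```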

You could use the below regex and get the string you want from group index 1.

^\$?([-A-Z0-9_ ]+)

$ is a special metacharacter in regex that represents the end-of-line anchor, so you need to escape it (\$) to match a literal $ symbol. The ? after \$ makes the dollar sign optional.

DEMO

$ echo 'ABC 123 tab 123 tab 534
$DEF564 tab 456 tab 5454' | perl -lne '/^\$?([-A-Z0-9_ ]+)/ and print $1'
ABC 123 
DEF564 
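The same capture-group idea sketched in Python's re, to show the difference between the whole match and group 1: the optional \$ is still consumed by the match, so it only disappears when you read group 1. Tab separators and the variable names are assumptions.

```python
import re

lines = ["ABC 123\t123\t534", "$DEF564\t(Hello World) Something Arbit-rary here"]

# \$ is a literal dollar sign and ? makes it optional; the wanted text is group 1.
pattern = re.compile(r"^\$?([-A-Z0-9_ ]+)")

whole = []
wanted = []
for line in lines:
    m = pattern.match(line)
    whole.append(m.group(0))   # entire match: still includes a leading "$"
    wanted.append(m.group(1))  # capture group: the text after the "$"

print(whole)   # ['ABC 123', '$DEF564']
print(wanted)  # ['ABC 123', 'DEF564']
```

This works in essentially any regex flavor, which is its advantage over the lookbehind and \K versions; the cost is that the calling code has to ask for group 1 rather than the whole match.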

Or, more simply, you could use:

^\$?\K[^\t]+

[^\t]+ matches one or more characters that are not a tab (\t).

Avinash Raj

If I've understood your question correctly, you want a regular expression whose match doesn't include the dollar sign even when it's present. For the two examples you have given, the following regular expression will work:

([A-Z]){3}\s?([0-9]){3}

It matches anywhere there are 3 capital letters followed by 3 digits, with an optional whitespace character in between. If you want to match one or more capital letters followed by one or more digits, replace the {3} quantifiers with +.

Demo: https://regex101.com/r/gW7mT8/1

  • Thanks for your suggestion, but my examples do not cover all possibilities; there are too many. That's why I have the regex [-A-Z0-9_ ]+ to describe all possibilities of the matched text I want (my problem was just how to deal with the ^ and the optional \$) ... but it seems that the answer I needed was the one by Bohemian – ycomp Dec 06 '14 at 15:29
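To make the point in that comment concrete, here is a sketch in Python's re comparing the fixed 3+3 pattern with the general character class, using the two question lines plus the "AQQQQQ_Q 12-15" example ycomp gave in a comment. Tab separators and variable names are assumptions. It also shows a side effect of writing ([A-Z]){3}: a quantified group keeps only its last repetition, so [A-Z]{3} without the parentheses is probably what was meant.

```python
import re

# Question lines (tabs assumed) plus ycomp's extra example from the comments.
samples = [
    "ABC 123\t123\t534",
    "$DEF564\t(Hello World) Something Arbit-rary here",
    "AQQQQQ_Q 12-15",
]

three_three = re.compile(r"([A-Z]){3}\s?([0-9]){3}")
general = re.compile(r"^\$?([-A-Z0-9_ ]+)")

fixed_hits = []
for s in samples:
    m = three_three.search(s)
    fixed_hits.append(m.group(0) if m else None)

general_hits = [general.match(s).group(1) for s in samples]

# A quantified group keeps only its last repetition: group(1) of "ABC 123" is "C".
last_letter = three_three.search(samples[0]).group(1)

print(fixed_hits)    # ['ABC 123', 'DEF564', None]
print(general_hits)  # ['ABC 123', 'DEF564', 'AQQQQQ_Q 12-15']
print(last_letter)   # C
```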