I am trying to create a regex that matches the beginning of my string, using one of two alternatives:

  1. If the whole string contains no /, then match ^[a-zA-Z\-]
  2. If the string contains /, then match ^[\w] until the first occurrence of /

Examples (the part I want to match is marked with double underscores):

__Gi0__/0/0/0
__BVI__10

The match needs to be returned so I want to wrap it in ()

I have tried this:

([a-zA-Z]+)|([\-\w]{2,}/)

but it doesn't match the second case.

Any suggestions?

My awk version is GNU Awk 4.0.0

Ed Morton
Dharman
  • You'll need to include more info about your `awk` version. If you are running on AIX or an older Unix, we'll need to know that too. Good luck. (Best to edit your question with this info, as you may know.) – shellter May 02 '14 at 09:21
  • A few hints about the regex you used: It's not anchored, so the first group will match everything before a slash and the second group will never be used. Also note that inside a character range `[]` you don't have to escape special characters (besides `]`). – SebastianH May 02 '14 at 09:45
  • What do you mean by `The match needs to be returned`? There are no awk functions or language constructs that return a string matching an RE (e.g. replace some text in a string or populate an array). There are functions that do other things with a matched RE so if you provide more info we can help you come up with the best solution. Also, the issue with providing solutions to these types of questions is never matching what you want, it's NOT matching what you DON'T want so post more interesting input and expected output or you might get a buggy "solution". – Ed Morton May 02 '14 at 13:00
  • @Ed Morton I'm using gawk and the match function in gawk is able to populate an array. http://stackoverflow.com/a/4673336/1839439 . sshashank124 understood my problem and provided the answer I needed so I don't think I need to provide any more input. – Dharman May 02 '14 at 13:16

1 Answer

You can simply do it as:

^((\w+)\/|([a-zA-Z_-]+))

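The second captured group contains the part before the first `/`. When the string has no `/`, the second group is empty and the third group holds the match instead.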

Or, as @Jenny suggested, in a regex flavor that supports `(?:...)` (PCRE-style engines do; awk does not), you can make the first group non-capturing as follows:

^(?:(\w+)\/|([a-zA-Z_-]+))


sshashank124
  • But if there is no `/` in the string then the numbers should not be matched. – Dharman May 02 '14 at 09:19
  • Note that spaces do not exist in the strings – Dharman May 02 '14 at 09:20
  • Your answer works but the `/` is also included in my capture group which I don't want. – Dharman May 02 '14 at 09:30
  • @Dharman, please note, as I stated in my answer, that the 2nd captured group (\2 or $2) will contain the string you want. – sshashank124 May 02 '14 at 09:31
  • But when I use the second group I lose the ones without `/`, e.g. BVI – Dharman May 02 '14 at 09:35
  • OK, never mind, I joined the results of the two groups together and it works now – Dharman May 02 '14 at 09:37
  • @Dharman. Regex in my answer updated. Sorry for the trouble. I'm sure it works now – sshashank124 May 02 '14 at 09:37
  • You may add ?: after the first ( to disable capturing of the first group. – Krisztián Balla May 02 '14 at 09:38
  • 1
    This question was tagged for `awk` but the accepted solution won't work in most awks since `\w` is gawk-specific. The second one with `?:` intended to mean something won't work in any awk at all. Why not just use POSIX character classes `^(([[:alnum:]_]+)\/|([[:alpha:]_-]+))`? Also `The match needs to be returned so I want to wrap it in ()` only has meaning within a call to gensub() and gawk match() so hopefully that's where you're using the RE. – Ed Morton May 02 '14 at 12:55
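For awks other than gawk, a minimal sketch along the lines of the last comment might look like the following. It uses only POSIX features: bracket expressions with character classes, two-argument `match()`, and the `RSTART`/`RLENGTH` variables it sets. Since there is no array to read groups from, the inner capture groups are dropped and the trailing `/` is stripped afterwards. The function name `iface_prefix_posix` and the variable `s` are invented for this example.

```shell
# Hypothetical helper using only POSIX awk features (no \w, no array argument).
iface_prefix_posix() {
  printf '%s\n' "$@" | awk '
    match($0, /^([[:alnum:]_]+\/|[[:alpha:]_-]+)/) {
      s = substr($0, RSTART, RLENGTH)  # whole match, possibly ending in "/"
      sub(/\/$/, "", s)                # drop that trailing "/" if present
      print s
    }
  '
}

iface_prefix_posix 'Gi0/0/0/0' 'BVI10'
```

For the question's two examples this prints `Gi0` and then `BVI`. It relies on awk choosing the longest match at the start of the string, so the branch ending in `/` wins whenever it can match.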