1

Consider the following command line: tfile -a -fn P2324_234.w07 -tc 8811

The regex to parse this: -\w+|\w+\s|\w+\.+\w+\s (see screenshot below)

The problem is when the file name has multiple dots, say: tfile -a -fn P23.24.23.4.w07 -tc 8811

Question: how can I ensure that P23.24.23.4.w07 is parsed as a single argument (that is, as P23.24.23.4.w07)?

[screenshot of the regex test]

dda
adhg
  • Have a read [here](http://stackoverflow.com/questions/17043454/using-regexes-how-to-efficiently-match-strings-between-double-quotes-with-embed). It essentially boils down to the same problem. Also, you might want to use an online tester, that supports the flavor you are actually using: http://regexplanet.com – Martin Ender Jun 12 '13 at 01:24
  • Which part of the command line do you want to get? – cat916 Jun 12 '13 at 01:29

5 Answers

3

Describe it!

For: P23.24.23.4.w07
use: \w+(?:\.\w+)+

Note that in Java you can also use possessive quantifiers and an atomic group (written here as a Java string literal, hence the doubled backslashes):

\\w++(?>\\.\\w++)+
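A minimal Java sketch of how the possessive pattern above might be applied to the command line from the question. The class and method names are made up for illustration; only `java.util.regex` is assumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
final class DottedNameDemo {
    // Possessive/atomic form from the answer: word chars, then one or more ".word" groups.
    static final Pattern DOTTED = Pattern.compile("\\w++(?>\\.\\w++)+");

    // Returns every dotted name found in the command line, in order.
    static List<String> findDottedNames(String commandLine) {
        List<String> found = new ArrayList<>();
        Matcher m = DOTTED.matcher(commandLine);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // prints [P23.24.23.4.w07]
        System.out.println(findDottedNames("tfile -a -fn P23.24.23.4.w07 -tc 8811"));
    }
}
```

Because `\w` also matches the underscore, the same pattern picks up P2324_234.w07 from the first example. Tokens without a dot (tfile, 8811) are deliberately not matched, so this finds the file name rather than tokenizing the whole line.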
Casimir et Hippolyte
  • But that's an evil regex. See http://stackoverflow.com/questions/12841970/how-can-i-recognize-an-evil-regex. – AJMansfield Jun 12 '13 at 01:25
  • 4
    @AJMansfield no it's not. There is no overlap between `\w` and `.` which makes it pretty safe. This is called "unrolling-the-loop" and is one of *the* efficiency techniques. See the link in my comment on the question. – Martin Ender Jun 12 '13 at 01:27
  • @AJMansfield `(a+)+` can be a source of catastrophic backtracking, but `(ab+)+` is not that dangerous. +1 m.buettner. – Pshemo Jun 12 '13 at 01:31
3

Use a character class, e.g., /-fn [a-z0-9.]+ -tc/i. In English, that means "-fn, followed by one or more characters that are each a letter a-z, a digit 0-9, or a ., followed by -tc" (the i flag makes the match case-insensitive). If you want to capture the file name, wrap the character class and its + in parentheses.
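As a rough Java sketch of the capturing variant described above: the `/.../i` flag is expressed with `Pattern.CASE_INSENSITIVE`, and the class and method names are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
final class FileNameCapture {
    // Same character class as in the answer, wrapped in a capturing group.
    static final Pattern FN_ARG =
            Pattern.compile("-fn ([a-z0-9.]+) -tc", Pattern.CASE_INSENSITIVE);

    // Returns the text between "-fn " and " -tc", if the pattern matches.
    static Optional<String> extractFileName(String commandLine) {
        Matcher m = FN_ARG.matcher(commandLine);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extractFileName("tfile -a -fn P23.24.23.4.w07 -tc 8811"));
    }
}
```

One caveat: the character class as written has no underscore, so the first example (P2324_234.w07) would not match unless `_` is added to the class.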

thomasd
1

I have used this: -\w+|\w+\s|\S+\.+\w+\s

Instead of "word" characters (\w), we can match "non-space" characters (\S). You have not specified any further requirements, so I think this is fine.

Regexpal testing

simpletron
0

Use a quantifier:

-\w+|\w+\s|(?:\w+\.+)+\w+\s
           ^^^      ^^

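You can also simplify your expression to the following. The dotted alternative has to come first, because the optional \s? would otherwise let -?\w+\s? match just P23:

(?:\w+\.+)+\w+\s|-?\w+\s?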
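A minimal Java sketch of tokenizing with the quantifier version (the first pattern in this answer). The class and method names are hypothetical. One assumption is labeled in the code: because `\w+\s` needs trailing whitespace, the sketch pads the input with a single space so the last argument is not dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
final class QuantifierTokenizer {
    // -\w+ | \w+\s | (?:\w+\.+)+\w+\s, written as a Java string literal.
    static final Pattern TOKEN =
            Pattern.compile("-\\w+|\\w+\\s|(?:\\w+\\.+)+\\w+\\s");

    static List<String> tokenize(String commandLine) {
        List<String> tokens = new ArrayList<>();
        // Assumption: pad with one space so the final token can satisfy the trailing \s.
        Matcher m = TOKEN.matcher(commandLine + " ");
        while (m.find()) {
            // Some alternatives include the trailing whitespace; strip it.
            tokens.add(m.group().trim());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("tfile -a -fn P23.24.23.4.w07 -tc 8811"));
    }
}
```

Alternation in `java.util.regex` is ordered, so the first alternative that succeeds at a position wins. That is why `\w+\s` fails on P23 (the next character is a dot) and the dotted alternative gets its turn.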
Andrew Cheong
0

To do this in Java, all you need to do is split the command line on spaces; no hand-written tokenizing regex is needed. Good old String.split() should be able to handle it, as long as no argument contains a space.
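A short sketch of the split approach, with hypothetical class and method names. Note that `String.split` itself takes a regex; `\\s+` is used here as an assumption so that runs of whitespace do not produce empty tokens.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the original answer.
final class SplitArgs {
    // Splits on runs of whitespace; dots inside an argument are left alone.
    static List<String> splitArgs(String commandLine) {
        return Arrays.asList(commandLine.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        // prints [tfile, -a, -fn, P23.24.23.4.w07, -tc, 8811]
        System.out.println(splitArgs("tfile -a -fn P23.24.23.4.w07 -tc 8811"));
    }
}
```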

AJMansfield