1

I have a regex I need to match against a path like so: "C:\Documents and Settings\User\My Documents\ScanSnap\382893.pd~". I need a regex that matches all paths except those ending in '~' or '.dat'. The problem I am having is that I don't understand how to match and negate the exact string '.dat' and only at the end of the path. i.e. I don't want to match {d,a,t} elsewhere in the path.

I have built the regex, but need to not match .dat

[\w\s:\.\\]*[^~]$[^\.dat]

[\w\s:\.\\]* This matches word characters, whitespace, colons, periods, and backslashes. [^~]$[^\.dat]$ This causes matches ending in '~' to fail. It seems that I should be able to follow up with a negated match for '.dat', but the match fails in my regex tester.

Judging from what I've read, I think my answer lies in grouping. Would someone point me in the right direction? I should add that I am using a file-watching program that allows regex matching, and I have only one line in which to specify the regex.

This entry seems similar: Regex to match multiple strings

Radix
  • 15
  • 5

4 Answers

5

You want to use a negative look-ahead:

^((?!\.dat$)[\w\s:\.\\])*$

By the way, your character group ([\w\s:\.\\]) doesn't allow a tilde (~) in it. Did you intend to allow a tilde in the filename if it wasn't at the end? If so:

^((?!~$|\.dat$)[\w\s:\.\\~])*$
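
To see how the two look-ahead patterns behave, here is a minimal sketch in Python. The choice of Python and the helper name `accepted` are assumptions for illustration; the file-watching program may use a different regex engine with slightly different rules.

```python
import re

# First pattern from above: a character is consumed only if ".dat" followed by
# end-of-string does not start at that position. '~' is not in the character
# class, so any path containing '~' is rejected as well.
NO_DAT = re.compile(r'^((?!\.dat$)[\w\s:\.\\])*$')

# Second pattern: '~' is allowed inside the path, but not as the last character.
NO_DAT_NO_TRAILING_TILDE = re.compile(r'^((?!~$|\.dat$)[\w\s:\.\\~])*$')

def accepted(pattern, path):
    """Hypothetical helper: True if the whole path matches the pattern."""
    return pattern.match(path) is not None

base = r'C:\Documents and Settings\User\My Documents\ScanSnap\382893'

print(accepted(NO_DAT, base + '.pdf'))   # True
print(accepted(NO_DAT, base + '.dat'))   # False
print(accepted(NO_DAT, base + '.pd~'))   # False
print(accepted(NO_DAT_NO_TRAILING_TILDE, r'C:\my~scans\1.pdf'))  # True
print(accepted(NO_DAT_NO_TRAILING_TILDE, base + '.pd~'))         # False
```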
Jeremy Stein
  • 19,171
  • 16
  • 68
  • 83
  • No, I didn't realize that, but I do not wish to include a tilde. I wish to exclude both the file suffix ".pd~" and ".dat", which are created as temp files. – Radix Oct 08 '09 at 18:01
  • Then you don't have to worry about the tilde at all. Since tilde can't occur in the filename at all, you don't have to explicitly check that the filename doesn't end in a tilde. You can use the first, simpler regex. – Jeremy Stein Oct 08 '09 at 18:14
  • Okay, I see what you mean now when you include the tilde in the character group. I don't expect a tilde to appear in any of the file strings we will be using, though I will include it just in case. THANK YOU so much! It took a couple tries before I understood how the program I am using works, or I would have replied sooner. Do I understand correctly, that either ~ or \.dat strings are matched against the '$' "match at the end" character? Thus the neg lookahead checks that neither are there before continuing. If so I appreciate the reference, it's better than what google was teaching me. – Radix Oct 08 '09 at 18:16
  • Both proposed solutions will also reject file names containing the character '~', not only ending with them. This was not the OP's intention, AFAIK. Personally I find my suggestion to be clearer (and correct!). :) – Bart Kiers Oct 08 '09 at 18:26
  • @Bart: People generally find their own solutions to be clearer. I tried to start with what he was using and fix it. – Jeremy Stein Oct 08 '09 at 20:42
  • @Jeremy: I wouldn't have said so if the difference was small (IMO, of course). But, looking at the response you posted under my suggestion, you seem to agree with me! :) – Bart Kiers Oct 09 '09 at 13:50
3

The following regex:

^.*(?<!\.dat|~)$

matches any string that does NOT end with a '~' or with '.dat'.

^             # the start of the string
.*            # gobble up the entire string (without line terminators!)
(?<!\.dat|~)  # looking back, there should not be '.dat' or '~'
$             # the end of the string

In plain English: match a string only when looking behind from the end of the string, there is no sub-string '.dat' or '~'.
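
A caveat worth checking against your own engine: some regex flavours only accept fixed-width look-behind. Python's `re` module is one of them, and it rejects `(?<!\.dat|~)` because the two alternatives have different lengths. The sketch below assumes Python and splits the check into one look-behind per suffix, which should be equivalent. The function name `is_watched` is made up for the example.

```python
import re

# Same intent as ^.*(?<!\.dat|~)$, written with one fixed-width look-behind
# per suffix so that Python's `re` module accepts it.
ENDS_OK = re.compile(r'^.*(?<!\.dat)(?<!~)$')

def is_watched(path):
    """Hypothetical helper: True unless the path ends in '.dat' or '~'."""
    return ENDS_OK.match(path) is not None

print(is_watched(r'C:\ScanSnap\382893.pdf'))   # True
print(is_watched(r'C:\ScanSnap\382893.dat'))   # False
print(is_watched(r'C:\ScanSnap\382893.pd~'))   # False
print(is_watched(r'C:\data~1\382893.pdf'))     # True: '~' and 'dat' mid-path are fine
```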

Edit: the reason why your attempt failed is because a negated character class, [^...] will just negate a single character. A character class always matches a single character. So when you do [^.dat], you're not negating the string ".dat" but you're matching a single character other than '.', 'd', 'a' or 't'.
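
The single-character behaviour of a negated class is easy to confirm. A small sketch, assuming Python (the helper `matches_one` is only for illustration):

```python
import re

# [^.dat] is a set of excluded characters, not an excluded string.
NOT_IN_SET = re.compile(r'[^.dat]')

def matches_one(text):
    """Hypothetical helper: True if the entire text matches [^.dat]."""
    return NOT_IN_SET.fullmatch(text) is not None

print(matches_one('x'))     # True: one character outside the set
print(matches_one('d'))     # False: 'd' is in the excluded set
print(matches_one('.dat'))  # False: four characters, the class consumes only one
print(matches_one('.pdf'))  # False for the same reason, not because of its content
```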

Bart Kiers
  • 166,582
  • 36
  • 299
  • 288
  • Oh, you're right. I've learned more this way though. Can I add an arbitrary number of extensions to ignore in this negative look-behind group? – Radix Oct 08 '09 at 18:49
  • Yes, simply OR it. The regex `^.*(?<!\.dat|~|\.txt)$` would now also reject '.txt' files. – Bart Kiers Oct 08 '09 at 19:13
  • Great, that's what I meant to ask. That is, by using '|' (pipe, OR) it would work. Thank you. – Radix Oct 08 '09 at 19:23
  • Hey, that was clever to use a negative look-behind instead of a negative look-ahead. It's much clearer that way. – Jeremy Stein Oct 09 '09 at 13:40
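
If the list of ignored suffixes keeps growing, the pattern can be generated rather than typed by hand. This is only a sketch, assuming Python; `build_exclusion_regex` is a hypothetical helper, and it emits one look-behind per suffix instead of a single OR-ed group because Python's `re` requires fixed-width look-behind.

```python
import re

def build_exclusion_regex(suffixes):
    """Compile a regex matching strings that end in none of the given suffixes."""
    # re.escape makes '.' (and any other metacharacter) literal.
    lookbehinds = ''.join('(?<!{})'.format(re.escape(s)) for s in suffixes)
    return re.compile('^.*' + lookbehinds + '$')

pattern = build_exclusion_regex(['.dat', '~', '.txt'])

for name in ('scan.pdf', 'scan.dat', 'scan.txt', 'scan.pd~'):
    print(name, pattern.match(name) is not None)
```

With an engine that accepts variable-width look-behind, the single group `(?<!\.dat|~|\.txt)` from the comment above expresses the same thing in one assertion.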
2
^((?!\.dat$)[\w\s:\.\\])*$

This is just a comment on an earlier answer:

A period (.) within a character class, [], is a literal period and does not need escaping.

^((?!\.dat$)[\w\s:.\\])*$
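
A quick way to convince yourself that the two spellings are interchangeable, sketched in Python (the sample paths are made up for the check):

```python
import re

ESCAPED = re.compile(r'^((?!\.dat$)[\w\s:\.\\])*$')
UNESCAPED = re.compile(r'^((?!\.dat$)[\w\s:.\\])*$')

samples = [r'C:\scans\1.pdf', r'C:\scans\1.dat', r'C:\scans\1.pd~', 'no dots here']

# Each pair holds the result for the escaped and the unescaped character class.
results = [(bool(ESCAPED.match(s)), bool(UNESCAPED.match(s))) for s in samples]
print(results)

# Inside a class the dot is literal, so it does not match an arbitrary character.
print(re.fullmatch(r'[.]', 'a'))  # None
```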

I'm sorry to post this as a new solution, but I apparently don't have enough credibility to simply comment on an answer yet.

genio
  • 874
  • 6
  • 7
  • I don't either, and unfortunately I don't have enough credibility to give you more either. Thanks for making explicit what I had guessed from the other answers. – Radix Oct 08 '09 at 18:40
-2

I believe you are looking for this:

[\w\s:\.\\]*([^~]|[^\.dat])$

which finds, like before, all word chars, white space, periods (.), back slashes. Then matches for either tilde (~) or '.dat' at the end of the string. You may also want to add a caret (^) at the very beginning if you know that the string should be at the beginning of a new line.

^[\w\s:\.\\]*([^~]|[^\.dat])$
sanscore
  • 529
  • 1
  • 4
  • 11
  • This is not what is being asked for; [^...] matches any single character that is not in the list. – David Oct 08 '09 at 17:44
  • Thanks, that is what I was getting at, but I was wrong. That matches both '.dat' and '~' as correct. I don't understand why yet. – Radix Oct 08 '09 at 18:21
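
To see why both '.dat' and '~' endings are accepted, it helps to look at what the final group actually captures. A minimal sketch, assuming Python; `last_char` is a hypothetical helper:

```python
import re

# The attempted pattern: the final group is an OR of two single-character classes.
ATTEMPT = re.compile(r'^[\w\s:\.\\]*([^~]|[^\.dat])$')

def last_char(name):
    """Return the character captured by the final group, or None if no match."""
    m = ATTEMPT.match(name)
    return m.group(1) if m else None

print(last_char('scan.pdf'))  # 'f'
print(last_char('scan.dat'))  # 't'  -> [^~] is satisfied by the 't'
print(last_char('scan.pd~'))  # '~'  -> [^\.dat] is satisfied by the '~'
```

Every character is either not a tilde or not one of '.', 'd', 'a', 't', so the alternation accepts any final character and the pattern excludes nothing.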