5

I want to write a simple regex, in vim, that will find all strings lexicographically smaller than another string.

Specifically, I want to use this to compare dates formatted as 2014-02-17. These dates are lexicographically sortable, which is why I use them.

My specific use case: I'm trying to run through a script and find all the dates that are earlier than today's date.

I'm also OK with comparing these as numbers, or any other solution.

dee-see
Edan Maor
  • inside vim I don't know... but in awk... 'BEGIN {a=0}; /2014-02-17/ {a=1}; a ~ /1/ && /[0-9]{2}-[0-9]{2}/ {print}' if printing was sorted by the date it will print all lines before the occurrence of that date... using the same logic you can split of '-' and just print greater numbers than 20140217, sorry the english I'm learning – BrenoZan Feb 17 '14 at 20:09
  • What is your underlying OS that your VIM is on? – John S. Feb 24 '14 at 17:58
  • My knee jerk reaction is to use [conditional expressions](http://www.regular-expressions.info/conditional.html) in conjunction with [`strftime`](http://www.zinkwazi.com/unix/notes/vimtips.html), but sadly vim [does not support conditional expressions](http://stackoverflow.com/questions/2901240/conditional-regex-in-vim). – bishop Feb 24 '14 at 19:46
  • 1
    This question is unclear. What is the input to the problem, and what is the expected output? Okay, find dates earlier or later than a given date; then do what? Display where they are? Or just have it as a Vim search so they are highlighted and you can step through the matches with `n`? – Kaz Feb 24 '14 at 20:13
  • @Kaz the eventual purpose was to populate the Quickfix window, but also stepping through matches with n is a side benefit. I ran into this problem and thought it was interesting all on its own, though. – Edan Maor Feb 24 '14 at 22:18
  • I think I didn't understand the requirement well. You have a log file in a buffer, and you want to find lines containing (or starting with?) that date pattern, where the date is earlier than a given date? – Kent Feb 25 '14 at 11:14
  • @Kent The two first things you said are what I want: to be able to navigate, with 'n', the buffer by going date by date (one by one). Also, to get an overview of all the places where those dates appear, by putting them in the Quickfix window. Both of these things are easy to do once I have a regex for finding dates earlier than today's date. – Edan Maor Feb 25 '14 at 14:51

4 Answers

3

You do not say how you want to use this; are you sure that you really want a regular expression? Perhaps you could get away with

if DateCmp(date, '2014-02-24') < 0
  " ...
endif

In that case, try this function.

" Compare formatted date strings:
" @param String date1, date2
"   dates in YYYY-MM-DD format, e.g. '2014-02-24'
" @return Integer
"   negative, zero, or positive according to date1 < date2, date1 == date2, or
"   date1 > date2
function! DateCmp(date1, date2)
  let [year1, month1, day1] = split(a:date1, '-')
  let [year2, month2, day2] = split(a:date2, '-')
  if year1 != year2
    return year1 - year2
  elseif month1 != month2
    return month1 - month2
  else
    return day1 - day2
  endif
endfun
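
As a sanity check on the field-by-field logic, here is a minimal Python sketch (not Vimscript; the names `date_cmp`, `sign`, and `str_cmp` are made up for this illustration). It mirrors DateCmp and confirms, for a handful of sample dates, that the result has the same sign as a plain string comparison, which is what "lexicographically sortable" in the question amounts to.

```python
def date_cmp(date1: str, date2: str) -> int:
    """Negative, zero, or positive, mirroring DateCmp above."""
    year1, month1, day1 = (int(part) for part in date1.split('-'))
    year2, month2, day2 = (int(part) for part in date2.split('-'))
    if year1 != year2:
        return year1 - year2
    if month1 != month2:
        return month1 - month2
    return day1 - day2


def sign(n: int) -> int:
    """Reduce any integer to -1, 0, or 1."""
    return (n > 0) - (n < 0)


def str_cmp(a: str, b: str) -> int:
    """Sign of a plain lexicographic string comparison."""
    return (a > b) - (a < b)


# Sample dates chosen for illustration only.
samples = ['2013-12-31', '2014-02-17', '2014-02-24', '2014-03-12', '2014-10-01']
for a in samples:
    for b in samples:
        assert sign(date_cmp(a, b)) == str_cmp(a, b)
```

This only holds for zero-padded YYYY-MM-DD strings; unpadded dates such as 2014-2-7 would not sort correctly as strings.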

If you really want a regular expression, then try this:

" Construct a pattern that matches a formatted date string if and only if the
" date is less than the input date.  Usage:
" :echo '2014-02-24' =~ DateLessRE('2014-03-12')
function! DateLessRE(date)
  let init = ''
  let branches = []
  for c in split(a:date, '\zs')
    if c =~ '[1-9]'
      call add(branches, init . '[0-' . (c-1) . ']')
    endif
    let init .= c
  endfor
  return '\d\d\d\d-\d\d-\d\d\&\%(' . join(branches, '\|') . '\)'
endfun

Does that count as a "simple" regex? One way to use it would be to type :g/ and then CTRL-R and = and then DateLessRE('2014-02-24') and Enter, followed by the rest of your command. In other words,

:g/<C-R>=DateLessRE('2014-02-24')<CR>/s/foo/bar

EDIT: I added a concat (:help /\&) that matches a complete "formatted date string". Now, there is no need to anchor the pattern.
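
For readers who want to test the branch-building idea outside Vim, here is a rough Python port (an illustration, not part of the original answer; `date_less_re` and `iso_days` are hypothetical names). Python has no `\&`, so this sketch assumes a lookahead is an acceptable stand-in for the "must be a formatted date string" half of the concat. It then compares the pattern against plain string comparison for every day from 2013 through 2015.

```python
import re
from datetime import date, timedelta


def date_less_re(target: str) -> str:
    """Build a pattern matching YYYY-MM-DD strings that sort before target."""
    init = ''
    branches = []
    for c in target:
        if c in '123456789':
            # Same prefix as the target, then a strictly smaller digit.
            branches.append(init + '[0-%d]' % (int(c) - 1))
        init += c
    # The lookahead stands in for Vim's \& concat (an assumption of this port).
    return r'(?=\d{4}-\d{2}-\d{2})(?:' + '|'.join(branches) + ')'


def iso_days(first: date, last: date):
    """Yield every day from first to last, inclusive, as YYYY-MM-DD."""
    day = first
    while day <= last:
        yield day.isoformat()
        day += timedelta(days=1)


target = '2014-03-12'
pattern = re.compile(date_less_re(target))
mismatches = [d for d in iso_days(date(2013, 1, 1), date(2015, 12, 31))
              if bool(pattern.match(d)) != (d < target)]
print(mismatches)
```

An empty list means the pattern agreed with string comparison on every day in the range. An all-zero input such as 0000-00-00 produces no branches and is not handled.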

benjifisher
  • I don't think your regex matches `2013-03-24`. It also matches `1` which seems wrong. (So I don't think the regex works) – FDinoff Feb 24 '14 at 19:19
  • The `return` were indented oddly because they were tab indented. Edited to spaces. – bishop Feb 24 '14 at 19:41
  • @FDinoff: If I test using the usage note in the comments before the function, `echo '2013-03-24' =~ '^' . DateLessRE('2014-03-12')` returns 1 and `echo '2014-03-12' =~ '^' . DateLessRE('2013-03-24')` returns 0 as expected. Yes, '1' matches; the pattern only works for "formatted date strings" of the form YYYY-MM-DD. – benjifisher Feb 24 '14 at 20:53
  • @benjifisher Sorry, Yeah I missed that. – FDinoff Feb 25 '14 at 00:30
  • I like this. I've never found a need for concat, or even remembered that it exists. I might also put it at the end instead of the beginning, so that the highlighting covers the whole date instead of just the beginning. – FDinoff Feb 25 '14 at 19:19
3

I don't think there is any way to do this easily with a regex. To match any date earlier than a given date, you can use the functions below (some of the code was borrowed from benjifisher's answer).

function! Convert_to_char_class(cur) 
    if a:cur =~ '[2-9]'
        return '[0-' . (a:cur-1) . ']'
    endif
    return '0'
endfunction

function! Match_number_before(num)
    let branches = []
    let init = ''
    for i in range(len(a:num))
        if a:num[i] =~ '[1-9]'
            call add(branches, init . Convert_to_char_class(a:num[i]) . repeat('\d', len(a:num) - i - 1))
        endif 
        let init .= a:num[i]
    endfor
    return '\%(' . join(branches, '\|') .'\)'
endfunction

function! Match_date_before(date)
    if a:date !~ '\v\d{4}-\d{2}-\d{2}'
        echo "invalid date"
        return
    endif

    let branches =[]

    let parts = split(a:date, '-')
    call add(branches, Match_number_before(parts[0]) . '-\d\{2}-\d\{2}')
    call add(branches, parts[0] . '-' . Match_number_before(parts[1]) . '-\d\{2}')
    call add(branches, parts[0] . '-' . parts[1] . '-' .Match_number_before(parts[2]))

    return '\%(' . join(branches, '\|') .'\)'
endfunction

Use the following to search for all dates before 2014-02-24.

/<C-r>=Match_date_before('2014-02-24')

You might be able to wrap it in a function to set the search register if you wanted to.

The generated regex for dates before 2014-02-24 is the following.

\%(\%([0-1]\d\d\d\|200\d\|201[0-3]\)-\d\{2}-\d\{2}\|2014-\%(0[0-1]\)-\d\{2}\|2014-02-\%([0-1]\d\|2[0-3]\)\)

It does not validate dates: anything in the YYYY-MM-DD format is assumed to be a date, so a string like 2014-00-99 is treated as one too.
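
To cross-check the generated pattern without opening Vim, the three "before" functions can be ported to Python. This is only a sketch: the snake_case names are mine, the invalid-date check is omitted, and the code just builds the Vim-syntax string rather than running it through any regex engine.

```python
def convert_to_char_class(cur: str) -> str:
    """Character class for digits strictly below cur ('1' maps to a literal 0)."""
    if cur in '23456789':
        return '[0-%d]' % (int(cur) - 1)
    return '0'


def match_number_before(num: str) -> str:
    """Vim pattern for same-width numbers strictly below num."""
    branches = []
    init = ''
    for i, ch in enumerate(num):
        if ch in '123456789':
            # Keep the prefix, lower this digit, let the rest be any digits.
            branches.append(init + convert_to_char_class(ch)
                            + r'\d' * (len(num) - i - 1))
        init += ch
    return r'\%(' + r'\|'.join(branches) + r'\)'


def match_date_before(date: str) -> str:
    """Vim pattern for YYYY-MM-DD strings strictly before date."""
    year, month, day = date.split('-')
    any_field = r'-\d\{2}'
    branches = [
        match_number_before(year) + any_field + any_field,
        year + '-' + match_number_before(month) + any_field,
        year + '-' + month + '-' + match_number_before(day),
    ]
    return r'\%(' + r'\|'.join(branches) + r'\)'


print(match_date_before('2014-02-24'))
```

Printing the result for 2014-02-24 should reproduce the pattern shown above character for character, which makes it easy to diff against what Vim produces.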


Here is the equivalent set of functions for matching dates after the passed-in date.

function! Convert_to_char_class_after(cur) 
    if a:cur =~ '[0-7]'
        return '[' . (a:cur+1) . '-9]'
    endif
    return '9'
endfunction

function! Match_number_after(num)
    let branches = []
    let init = ''
    for i in range(len(a:num))
        if a:num[i] =~ '[0-8]'
            call add(branches, init . Convert_to_char_class_after(a:num[i]) . repeat('\d', len(a:num) - i - 1))
        endif 
        let init .= a:num[i]
    endfor
    return '\%(' . join(branches, '\|') .'\)'
endfunction

function! Match_date_after(date)
    if a:date !~ '\v\d{4}-\d{2}-\d{2}'
        echo "invalid date"
        return
    endif

    let branches =[]

    let parts = split(a:date, '-')
    call add(branches, Match_number_after(parts[0]) . '-\d\{2}-\d\{2}')
    call add(branches, parts[0] . '-' . Match_number_after(parts[1]) . '-\d\{2}')
    call add(branches, parts[0] . '-' . parts[1] . '-' .Match_number_after(parts[2]))

    return '\%(' . join(branches, '\|') .'\)'
endfunction

The regex generated for dates after 2014-02-24 is the following.

\%(\%([3-9]\d\d\d\|2[1-9]\d\d\|20[2-9]\d\|201[5-9]\)-\d\{2}-\d\{2}\|2014-\%([1-9]\d\|0[3-9]\)-\d\{2}\|2014-02-\%([3-9]\d\|2[5-9]\)\)
FDinoff
  • This seems like the right direction, but quick testing shows it failing for some dates... I tried for example to search on 2014-02-25, and it *didn't* match: 2014-01-17, although it should have. – Edan Maor Feb 25 '14 at 08:05
  • @EdanMaor It should match 2014-01-17 now. I forgot the dash in that case. – FDinoff Feb 25 '14 at 15:10
1

Use nested subpatterns. It starts simple, with the century:

[01]\d\d\d-\d\d-\d\d|20

As for each digit to follow, use one of the following patterns; you may want to replace .* by an appropriate sequence of \d and -.

for 0:   (0
for 1:   (0.*|1
for 2:   ([01].*|2
for 3:   ([0-2].*|3
for 4:   ([0-3].*|4
for 5:   ([0-4].*|5
for 6:   ([0-5].*|6
for 7:   ([0-6].*|7
for 8:   ([0-7].*|8
for 9:   ([0-8].*|9

For the last digit, you only need the digit range, e.g.:

[0-6]

Finally, all parentheses should be closed:

)))))

In the example of 2014-02-17, this becomes:

[01]\d\d\d-\d\d-\d\d|20
(0\d-\d\d-\d\d|1
([0-3]-\d\d-\d\d|4
-
(0
([01]-\d\d|2
-
(0\d|1
[0-6]
)))))

Now in one line:

[01]\d\d\d-\d\d-\d\d|20(0\d-\d\d-\d\d|1([0-3]-\d\d-\d\d|4-(0([01]-\d\d|2-(0\d|1[0-6])))))

For VIM, let's not forget to escape (, ) and |:

[01]\d\d\d-\d\d-\d\d\|20\(0\d-\d\d-\d\d\|1\([0-3]-\d\d-\d\d\|4-\(0\([01]-\d\d\|2-\(0\d\|1[0-6]\)\)\)\)\)

It would be best to generate this (much like in FDinoff's answer) rather than write it by hand.

Update: Here is a sample AWK script to generate the correct regex for any date yyyy-mm-dd.

#!/usr/bin/awk -f

# Note: switch is a gawk extension, so this script needs GNU awk.
BEGIN {                 # possible overrides for non-VIM users
    switch (digit) {
        case "ascii"     : digit = "[0-9]";       break;
        case "posix"     : digit = "[[:digit:]]"; break;
        default          : digit = "\\d";
    }
    switch (metachar) {
        case "unescaped" : escape = "";         break;
        default          : escape = "\\";
    }
}

/^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$/ {
    print BuildRegex($0);
}

function BuildRegex(s) {
    if (s ~ /^[1-9][^1-9]*$/) {
        regex = LessThanOnFirstDigit(s);
    }
    else {
        regex = substr(s, 1, 1) BuildRegex(substr(s, 2));    # recursive call
        if (s ~ /^[1-9]/) {
            regex = escape "(" LessThanOnFirstDigit(s) escape "|" regex escape ")";
        }
    }
    return regex;
}

function LessThanOnFirstDigit(s) {
    first = substr(s, 1, 1) - 1;
    rest = substr(s, 2);
    gsub(/[0-9]/, digit, rest);
    return (first ? "[0-" first "]" : "0") rest;
}

Call it like this:

echo 2014-02-17 | awk -f genregex.awk

Of course, you can write such a simple generator in any language you like. Would be nice to do it in Vimscript, but I have no experience with that, so I will leave that as a home assignment.
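
As one example of "any language you like", here is a rough Python sketch of the same recursive generator (not Vimscript, and not the author's code; `build_regex`, `less_than_on_first_digit`, and `iso_days` are names chosen for this illustration). It emits Python-syntax groups, `(?:` and `|`, instead of Vim's escaped ones, and checks the result against plain string comparison for every day from 2013 through 2015.

```python
import re
from datetime import date, timedelta


def less_than_on_first_digit(s: str) -> str:
    """Lower the first digit of s; every later digit becomes \\d."""
    first = int(s[0]) - 1
    # A function is used as the replacement so that \d is inserted literally.
    rest = re.sub(r'[0-9]', lambda m: r'\d', s[1:])
    return ('[0-%d]' % first if first else '0') + rest


def build_regex(s: str) -> str:
    """Nested pattern for strings of the same shape that sort before s."""
    if re.fullmatch(r'[1-9][^1-9]*', s):
        # Last non-zero digit: nothing further to nest.
        return less_than_on_first_digit(s)
    regex = s[0] + build_regex(s[1:])
    if s[0] in '123456789':
        regex = '(?:' + less_than_on_first_digit(s) + '|' + regex + ')'
    return regex


def iso_days(first: date, last: date):
    """Yield every day from first to last, inclusive, as YYYY-MM-DD."""
    day = first
    while day <= last:
        yield day.isoformat()
        day += timedelta(days=1)


for target in ('2014-02-17', '2014-10-01', '2015-01-10'):
    pattern = re.compile(build_regex(target))
    for d in iso_days(date(2013, 1, 1), date(2015, 12, 31)):
        assert bool(pattern.fullmatch(d)) == (d < target), (target, d)

print(build_regex('2014-02-17'))
```

Like the awk version, this assumes the input contains at least one non-zero digit; an all-zero string would recurse until it runs out of characters.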

Ruud Helderman
0

If you wanted to search for all dates that were less than 2014-11-23, inclusive, you would use the following regex.

2014-(?:[1-9]|1[0-1])-(?:[1-9]|1[0-9]|2[0-3])

For a more detailed explanation of the regex, paste it into regex101.com, where you can also test it.

The basics of the regex are to search all dates that:

start with 2014-
either contain a single character from 1 - 9 
    or a 1 and a single character from 0 - 1, i.e. numbers from 1 - 11
finished by - and numbers from 1 - 23 done in the same style as the second term
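
Because the pattern uses Perl-style `(?:...)` groups, it can be tried as-is in Python's `re` module. The following is a quick sketch with sample dates of my own choosing; it shows which inputs the pattern accepts when it has to cover the whole string.

```python
import re

# The pattern from this answer, unchanged.
PATTERN = re.compile(r'2014-(?:[1-9]|1[0-1])-(?:[1-9]|1[0-9]|2[0-3])')

# Sample inputs chosen for illustration.
samples = ['2014-11-23', '2014-2-17', '2014-02-17', '2014-10-31']
results = {s: PATTERN.fullmatch(s) is not None for s in samples}

for sample, matched in results.items():
    print(sample, 'matches' if matched else 'does not match')
```

In this sketch the unpadded 2014-2-17 matches, while the zero-padded 2014-02-17 and the earlier date 2014-10-31 do not, so the pattern does not fit the YYYY-MM-DD format from the question without changes.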
Rick
  • 2
    Granted I'm still half-asleep, but this regex doesn't appear to match Halloween, 2014-10-31. – bishop Feb 24 '14 at 09:20
  • 1
    I do not recognize the `(?:)` construction as part of vim's definition of a regular expression. – benjifisher Feb 24 '14 at 18:12
  • @bishop you are correct, this won't match dates whose day is above that of the specified date, i.e. if I match Nov 20, it won't match Oct 21 - 31. The (?:) allows you to specify a kind of or clause: (?:Hello|Good bye) would match either the string literal 'Hello' or the string literal 'Good bye' – Rick Feb 24 '14 at 18:16
  • @Rick: Maybe in some other regex engine that will work, but in vim `:echo 'Hello' =~ '(?:Hello|Good bye)'` returns 0. In vim, I would use `\(Hello\|Good bye\)` or `\v(Hello|Good bye)`. – benjifisher Feb 24 '14 at 23:25