1

I have a string that represents a date like so:

    "May 5 2014"

I'd like to know how to extract the "5" from it.

What I've tried so far:

    echo "May 5 2014" | sed 's/[^0-9]*\s//'

That returns "5 2014"

Sorry for the remedial question; I'm just new to bash.

dot
  • possible duplicate of [bash: shortest way to get n-th column of output](http://stackoverflow.com/questions/7315587/bash-shortest-way-to-get-n-th-column-of-output) – jman Apr 03 '14 at 18:38

6 Answers

5

Use cut:

echo "May 5 2014" | cut -d' ' -f2

or awk:

echo "May 5 2014" | awk '{print $2}'

In case you want to do it without external utilities, it'd be a two-step process:

s="May 5 2014"
t="${s#* }"
echo "${t% *}"
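
As a minimal sketch of how either approach might be stored in a variable (the names a, newvar, rest and day are placeholders chosen for illustration): an assignment takes no spaces around =, and the two expansions strip the first and then the last field.

```shell
a="May 5 2014"

# Command substitution: note there are no spaces around '='.
newvar=$(echo "$a" | cut -d' ' -f2)
echo "$newvar"

# The same result with parameter expansion only (no external process).
rest="${a#* }"     # drop the shortest prefix ending in a space
day="${rest% *}"   # drop the shortest suffix starting with a space
echo "$day"
```

Both echo commands should print 5 for this input.
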
devnull
  • devnull, my date string is stored in a variable called $a. when I do "newvar = $(echo $a|cut -d' ' -f2)" i get an error that says newvar not found – dot Apr 03 '14 at 18:41
  • @dot Eliminate spaces around `=`. – devnull Apr 03 '14 at 18:42
4

If you're writing a script that needs to parse date strings, you can surely do it using sed et al, and indeed there are already several answers here that do the trick nicely.

However, my advice would be to let the date program do the heavy lifting for you:

$ date -d "May 5 2014" +%-d
5

The maintainers of the date program have no doubt spent many hours and days getting their date-parsing code right. Why not leverage that work instead of rolling your own?

EDIT

Added a BSD solution (e.g. for Mac OS X):

date -j -f '%b %d %Y' 'May 5 2014' '+%d'

On BSD you need to tell date the format of the incoming date with -f format; it then prints the date in the format given by +format. The -j flag means "do not set the date". Note that +%d prints the day zero-padded (05).
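
To use one script on both kinds of systems, one option is to try the GNU syntax first and fall back to the BSD syntax. The following is only a sketch: day_of is a hypothetical helper name, and it assumes that a BSD date rejects the GNU-style -d argument with a non-zero exit status.

```shell
# day_of: hypothetical helper that prints the day of month of "Mon D YYYY".
day_of() {
    # GNU date parses the string itself; %-d prints the day without padding.
    date -d "$1" +%-d 2>/dev/null && return
    # BSD date needs the input format; %e pads with a space, removed by tr.
    LC_ALL=C date -j -f '%b %d %Y' "$1" +%e 2>/dev/null | tr -d ' '
}

day_of "May 5 2014"
```

Keeping the platform check inside one function means the rest of a script does not need to know which date variant is installed.
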

clt60
Mike Holt
  • Nice one! Unfortunately it does not work on BSD systems (OS X), which need a different syntax. – clt60 Apr 03 '14 at 19:15
  • Note that although `date` doesn't accept any and all date formats you can think of (e.g., it complains about "May 5th 2014"), it still is much more flexible than assuming a single format. For example, `date` *will* accept dates such as "5/5/2014", "May 5", "2014-05-05", "2014-5-5", and others. – Mike Holt Apr 03 '14 at 19:55
  • Doing anything other than this is a bit bizarre for date parsing (and a duplicate of so many other questions). No mention of BSD in the question. +1 – Reinstate Monica Please Apr 04 '14 at 02:09
  • @BroSlow No mention of Linux either... You can't assume that everybody uses GNU `date`; there are many Mac users here too. Anyway, I agree with the answer: date parsing with date is nice, but you need to take care of the different OS syntax. – clt60 Apr 04 '14 at 21:08
  • @jm666 Nothing against bsd (though I dislike bsd variants of some tools like find, stat, etc...), gnu is just more prevalent, and questions where OP is asking about bsd tend to get tagged with something like osx, solaris, bsd, etc... But obviously nice to provide multiple solutions as you have. – Reinstate Monica Please Apr 04 '14 at 21:31
4

Bash's builtin read command can split input into multiple variables. The '<<<' tells read to take input from the following string.

read first second remainder <<< "May 5 2014"

After which, "$first" will be "May", "$second" will be "5" and "$remainder" will be "2014"

It is common practice to use '_' as a placeholder for uninteresting fields, as the shell automatically overwrites $_.

read _ day _ <<< 'May 5 2014 utc'
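
A small sketch of two details that are easy to miss: the last variable receives everything that is left on the line, and a different separator can be set for a single read call by prefixing it with IFS. The variable names and the second, ISO-style date string are made up for illustration.

```shell
# The last name given to read collects the rest of the line.
read -r month day rest <<< "May 5 2014 utc"
echo "$day"     # second field
echo "$rest"    # everything after the second field: 2014 utc

# IFS set only for this one read call splits on '-' instead of whitespace.
IFS=- read -r iso_year iso_month iso_day <<< "2014-05-05"
echo "$iso_day"
```

The -r option is added here so that backslashes in the input are not treated as escape characters.
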
dave sines
  • This is really neat :) +1 – clt60 Apr 03 '14 at 19:21
  • @dave sines Not only neat, but `read month day year <<< "May 5 2014"` is also dramatically faster. I ran some tests and found it to be over 20 times faster than `day=$(echo "May 5 2014" | cut -d' ' -f2)`. If one were to do the same for month, day and year, it is over 60 times faster. Thank you! – Keith Reynolds Apr 03 '14 at 20:20
  • @KeithReynolds the @devnull's pure bash solution is 5 times faster and my pure-bash-regex solution is 3 times faster than this `read` solution. so, it is neat - but not the fastest :) – clt60 Apr 03 '14 at 21:48
  • @jm666 I also found that your pure-bash-regex solution is 3 times faster than this read solution if you're only looking for the day. On the other hand `read month day year <<< "May 5 2014"` is about the same speed as `re="(.*) (.*) (.*)"; [[ $aaa =~ $re ]]; month=${BASH_REMATCH[1]}; day=${BASH_REMATCH[2]};year=${BASH_REMATCH[3]}` – Keith Reynolds Apr 03 '14 at 22:29
  • @KeithReynolds in my system 100000 times, read solution: 27sec, regex with 3x assign 10sec, regex 1x assign 8 sec, and devnulls solution 5.4 sec. ;) anyway, it is really not very important - all pure bash solutions are good. ;) :) – clt60 Apr 03 '14 at 22:36
  • @jm666 Did you include `aaa="May 5 2014";[[ $aaa =~ $re ]]` inside your loop test along with 3x assignments, that is, one for month, one for day and one for year? That would be a fair comparison, because one could change the date on each iteration if one wanted to. – Keith Reynolds Apr 03 '14 at 23:10
  • @KeithReynolds see my answer for real numbers and the "benchmark" source code... try it yourself. :) – clt60 Apr 04 '14 at 20:44
3

with sed, one possibility is:

echo "May 5 2014" | sed 's/.* \([0-9]*\) .*/\1/'

another one

echo "May 5 2014" | sed 's/[^ ]* //;s/ [^ ]*//'

another

echo "May 5 2014" | sed 's/\(.*\) \(.*\) \(.*\)/\2/'

with grep

echo "May 5 2014" | grep -oP '\b\d{1,2}\b'

or perl

echo "May 5 2014" | perl -lanE 'say $F[1]'

as a curiosity

echo "May 5 2014" | xargs -n1 | head -2 | tail -1
echo "May 5 2014" | xargs -n1 | sed -n 2p
echo "May 5 2014" | xargs -n1 | egrep '^[0-9]{1,2}$'

and finally, a pure bash solution, without starting any external commands

aaa="May 5 2014"
[[ $aaa =~ (.*)[[:space:]](.*)[[:space:]](.*) ]] && echo ${BASH_REMATCH[2]}

or

aaa="May 5 2014"
re="(.*) (.*) (.*)"
[[ $aaa =~ $re ]] && echo ${BASH_REMATCH[2]}
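
Because (.*) matches anything, including spaces and the empty string, the patterns above also match strings that are not dates. If the input should be validated at the same time, a stricter pattern could be used. The sketch below assumes the format is always month name, one- or two-digit day, four-digit year; the variable names month, day and year are chosen for illustration.

```shell
aaa="May 5 2014"
# Anchored pattern: letters, 1-2 digits, 4 digits, separated by whitespace.
re='^([[:alpha:]]+)[[:space:]]+([0-9]{1,2})[[:space:]]+([0-9]{4})$'

if [[ $aaa =~ $re ]]; then
    month=${BASH_REMATCH[1]}
    day=${BASH_REMATCH[2]}
    year=${BASH_REMATCH[3]}
    echo "$day"
else
    echo "unexpected date format: $aaa" >&2
fi
```

Keeping the pattern in a variable, as in the second variant above, avoids quoting problems inside [[ ... ]].
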

EDIT

Because Keith Reynolds asked for some benchmarks, I tested the following script. Using time is not a perfect benchmarking tool, but it gives some insight.

  • each test prints the result N times (the lines are counted by wc)
  • NOTE: the external commands are executed only 10_000 times, while the pure bash solutions run 100_000 times

Here is the script:

xbench_with_read() {
    let i=$1; while ((i--)); do
        read _ day _ <<< 'May 5 2014'
        echo $day
    done
}

xbench_regex_3x_assign() {
    let i=$1; while ((i--)); do
        aaa="May 5 2014"
        re="(.*) (.*) (.*)"
        [[ $aaa =~ $re ]] && month="${BASH_REMATCH[1]}" && day="${BASH_REMATCH[2]}" && year="${BASH_REMATCH[3]}" && echo "$day"
    done
}

xbench_regex_1x_assign() {
    let i=$1; while ((i--)); do
        aaa="May 5 2014"
        re="(.*) (.*) (.*)"
        [[ $aaa =~ $re ]] && day=${BASH_REMATCH[2]} && echo "$day"
    done
}

xbench_var_expansion() {
    let i=$1; while ((i--)); do
        s="May 5 2014"
        t="${s#* }"
        echo "${t% *}"
    done
}

xbench_ext_cut() {
    let i=$1; while ((i--)); do
        echo "May 5 2014" | cut -d' ' -f2
    done
}

xbench_ext_grep() {
    let i=$1; while ((i--)); do
        echo "May 5 2014" | grep -oP '\b\d{1,2}\b'
    done
}

xbench_ext_sed() {
    let i=$1; while ((i--)); do
        echo "May 5 2014" | sed 's/\(.*\) \(.*\) \(.*\)/\2/'
    done
}

xbench_ext_xargs() {
    let i=$1; while ((i--)); do
        echo "May 5 2014" | xargs -n1 | sed -n 2p
    done
}

title() {
    echo '~ -'$___{1..20} '~' >&2
    echo "Timing $1 $2 times" >&2
}

for script in $(compgen -A function | grep xbench)
do
    cnt=100000
    # external programs run 10x fewer times
    [[ $script =~ _ext_ ]] && cnt=$(( $cnt / 10 ))
    title $script $cnt
    time $script $cnt | wc -l
done

and here are the raw results:

~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~
Timing xbench_ext_cut 10000 times
   10000

real    0m37.752s
user    0m14.587s
sys     0m25.723s
~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~
Timing xbench_ext_grep 10000 times
   10000

real    1m35.570s
user    0m21.778s
sys     0m34.524s
~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~
Timing xbench_ext_sed 10000 times
   10000

real    0m41.628s
user    0m15.310s
sys     0m26.422s
~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~
Timing xbench_ext_xargs 10000 times
   10000

real    1m42.235s
user    0m46.601s
sys     1m11.238s
~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~
Timing xbench_regex_1x_assign 100000 times
  100000

real    0m11.215s
user    0m8.784s
sys     0m0.907s
~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~
Timing xbench_regex_3x_assign 100000 times
  100000

real    0m14.669s
user    0m12.419s
sys     0m1.027s
~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~
Timing xbench_var_expansion 100000 times
  100000

real    0m5.148s
user    0m4.658s
sys     0m0.788s
~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~ - ~
Timing xbench_with_read 100000 times
  100000

real    0m27.700s
user    0m6.279s
sys     0m19.724s

So, sorted by real execution time:

pure bash solutions 100_000 times

  1. xbench_var_expansion - real 0m5.148s - 5.1 sec
  2. xbench_regex_1x_assign - real 0m11.215s - 11.2 sec
  3. xbench_regex_3x_assign - real 0m14.669s - 14.7 sec
  4. xbench_with_read - real 0m27.700s - 27.7 sec

No surprises here - the variable expansion is simply the fastest solution.

external programs only 10_000 times

  1. xbench_ext_cut - real 0m37.752s - 37.8 sec
  2. xbench_ext_sed - real 0m41.628s - 41.6 sec
  3. xbench_ext_grep - real 1m35.570s - 95.6 sec
  4. xbench_ext_xargs - real 1m42.235s - 102.2 sec

Two surprises here (at least for me):

  • the grep solution is about 2x slower than sed
  • the xargs (curiosity) solution is only slightly slower than grep
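
Since the two groups ran a different number of iterations, their totals are not directly comparable. Dividing each real time by its iteration count gives a rough per-call cost. The sketch below does that arithmetic with awk for the fastest and slowest entry of each group; per_call_us is a hypothetical helper name, and the inputs are the real times listed above.

```shell
# per_call_us <seconds> <iterations>: hypothetical helper that prints the
# average cost of one iteration in microseconds.
per_call_us() {
    awk -v s="$1" -v n="$2" 'BEGIN { printf "%.1f\n", s / n * 1000000 }'
}

per_call_us 5.148 100000     # xbench_var_expansion -> 51.5
per_call_us 27.700 100000    # xbench_with_read     -> 277.0
per_call_us 37.752 10000     # xbench_ext_cut       -> 3775.2
per_call_us 102.235 10000    # xbench_ext_xargs     -> 10223.5
```

On this machine, then, one call through cut costs roughly 3.8 ms, while the variable expansion costs roughly 0.05 ms.
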

Environment:

$ uname -a
Darwin marvin.local 13.1.0 Darwin Kernel Version 13.1.0: Thu Jan 16 19:40:37 PST 2014; root:xnu-2422.90.20~2/RELEASE_X86_64 x86_64

$ LC_ALL=C bash --version
GNU bash, version 4.2.45(2)-release (i386-apple-darwin13.0.0)
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
clt60
  • I like the variety here. I was actually just writing your first possibility myself but was forgetting the extra space before the [0-9]* – SS781 Apr 03 '14 at 18:49
  • @SS781 - the best method is with `cut` already answered by devnull – clt60 Apr 03 '14 at 18:51
  • The `cut` binary is small, so it starts fast and is short to type in a script ;). But in reality the best is a pure bash solution, which nobody has shown yet, because pure bash doesn't start any external programs... – clt60 Apr 03 '14 at 18:54
0

With awk :

echo "May 5 2014" | awk '{print $2}'
jrjc
0

You could use bash substring expansion and apply an offset (:4) and a length (:1) value. Just adjust the offset and length values if the format of the string changes.

Here is an example:

$ date_format="May 5 2014"
$ echo "${date_format:4:1}"
5

$ date_format="2014 May 5"
$ echo "${date_format: -1:1}"    # <- Watch that space before the negative value
5

$ date_format="5 May 2014"
$ echo "${date_format:0:1}"
5
Saucier