226

Is there any way to specify a field delimiter that matches multiple spaces with the cut command (like " "+)? For example, in the following string I want to extract the value '3744'. What field delimiter should I specify?

$ ps axu | grep jboss

jboss     2574  0.0  0.0   3744  1092 ?        S    Aug17   0:00 /bin/sh /usr/java/jboss/bin/run.sh -c example.com -b 0.0.0.0

cut -d' ' is not what I want, because it treats every single space as a delimiter. awk is not what I am looking for either. How can I do this with 'cut'?

thanks.

jww
leslie
  • 16
    best answer is using `tr` as shown here: http://stackoverflow.com/a/4483833/168143 – John Bachir Jan 18 '13 at 11:05
  • 1
    Not directly relevant to the actual question being asked but instead of `ps`+`grep` you could use `pgrep` which is available in most modern distros. It will return the result exactly in the form you need it. – ccpizza Apr 08 '13 at 14:03
  • 1
    Possible duplicate of [How to make the 'cut' command treat multiple characters as one delimiter?](https://stackoverflow.com/questions/4143252/how-to-make-the-cut-command-treat-multiple-characters-as-one-delimiter) –  Apr 16 '18 at 04:06
  • These days I just use `hck` as a drop in `cut` replacement. By default it splits on all whitespace, like awk. And the key feature is that you can specify a delimiter with `-d` like cut, but unlike cut that delimiter can be a regex! No more needing to pre-process with `tr -s` before passing to cut. You can find `hck` here: https://github.com/sstadick/hck – Chris Jan 19 '23 at 23:14
  • Does this answer your question? [Does CUT support multiple spaces as the delimiter?](https://stackoverflow.com/questions/21322968/does-cut-support-multiple-spaces-as-the-delimiter) – dsimic Aug 22 '23 at 02:33

12 Answers

355

Actually awk is exactly the tool you should be looking into:

ps axu | grep '[j]boss' | awk '{print $5}'

or you can ditch the grep altogether since awk knows about regular expressions:

ps axu | awk '/[j]boss/ {print $5}'

But if, for some bizarre reason, you really can't use awk, there are other simpler things you can do, like collapse all whitespace to a single space first:

ps axu | grep '[j]boss' | sed 's/\s\s*/ /g' | cut -d' ' -f5

That grep trick, by the way, is a neat way to only get the jboss processes and not the grep jboss one (ditto for the awk variant as well).

The grep process will have a literal grep [j]boss in its process command so will not be caught by the grep itself, which is looking for the character class [j] followed by boss.

This is a nifty way to avoid the | grep xyz | grep -v grep paradigm that some people use.
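
The bracket trick and awk's default field splitting can be checked against a fixed line rather than live `ps` output. This is a minimal sketch, assuming the column layout shown in the question; `sample_line`, `vsz_with_awk`, and `count_bracket_matches` are hypothetical names introduced only for illustration.

```shell
# Sample line copied from the question, standing in for live `ps axu` output.
sample_line='jboss     2574  0.0  0.0   3744  1092 ?        S    Aug17   0:00 /bin/sh /usr/java/jboss/bin/run.sh -c example.com -b 0.0.0.0'

# awk splits on runs of whitespace by default, so $5 is the fifth column.
vsz_with_awk() {
    printf '%s\n' "$1" | awk '{print $5}'
}

# Count how many of two command lines the pattern [j]boss matches.
# The first mimics the grep process itself and contains "[j]boss" literally.
count_bracket_matches() {
    printf '%s\n' 'grep [j]boss' '/bin/sh /usr/java/jboss/bin/run.sh' | grep -c '[j]boss'
}

vsz_with_awk "$sample_line"    # prints 3744
count_bracket_matches          # prints 1: only the real jboss line matches
```

The literal text `[j]boss` never contains the contiguous string `jboss`, which is why the grep process drops out of its own results.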

paxdiablo
  • 2
    Great answer. I'll be coming back to look this up again next time I need it. – funroll Mar 19 '13 at 14:55
  • The `grep` trick seems to not work in crontab files. Any reason? – Amir Ali Akbari Dec 12 '14 at 16:03
  • 3
    I keep learning and forgetting the grep trick. Thanks for my most recent reminder. Maybe this time it'll stick. But I wouldn't bet on it. – Michael Burr Jan 12 '17 at 22:30
  • @Michael, you should set up a cron job somewhere to mail that tip (and possibly others) to you once a month :-) – paxdiablo Jan 13 '17 at 02:31
  • For that last sed command, you should be able to do `\s+` for "one or more spaces", in place of `\s\s*` which says "space followed by zero or more spaces" – Eric Oct 23 '18 at 21:33
  • This is great answer but the OP asked how to do it with cut, so I think https://stackoverflow.com/a/29685565/869951 deserves more credit than it currently has. – Oliver Jan 16 '19 at 16:57
  • 4
    Oliver, sometimes the best answer to "how do I do X with Y?" is "Don't use Y, use Z instead". Since OP accepted this answer, it's likely I convinced them of that :-) – paxdiablo Jan 16 '19 at 19:32
133

The awk version is probably the best way to go, but you can also use cut if you first squeeze the repeated spaces with tr:

ps axu | grep 'jbos[s]' | tr -s ' ' | cut -d' ' -f5
#        ^^^^^^^^^^^^^^   ^^^^^^^^^   ^^^^^^^^^^^^^
#               |             |             |
#               |             |       get 5th field
#               |             |
#               |         squeeze spaces
#               |
#        keep grep itself out of the list
fedorqui
46

I like to use the tr -s command for this:

ps aux | tr -s '[:blank:]' | cut -d' ' -f3

This squeezes each run of repeated blanks down to a single character, so cut's single-space delimiter then behaves as expected. The character class is quoted so that the shell cannot expand it as a glob.
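
To see why the squeeze step matters, here is a small sketch that runs cut on the same line with and without `tr -s`. The `line` value is a shortened copy of the question's sample, and the variable names are made up for this illustration.

```shell
# Shortened sample from the question; columns are padded with several spaces.
line='jboss     2574  0.0  0.0   3744  1092 ?'

# Without squeezing, every extra space yields an empty field,
# so field 5 falls inside the padding after "jboss".
raw_field=$(printf '%s\n' "$line" | cut -d' ' -f5)

# tr -s ' ' collapses each run of spaces into one, so -f5 is the fifth column.
squeezed_field=$(printf '%s\n' "$line" | tr -s ' ' | cut -d' ' -f5)

printf 'raw=[%s] squeezed=[%s]\n' "$raw_field" "$squeezed_field"
# prints: raw=[] squeezed=[3744]
```

One caveat: a line that starts with blanks still begins with one space after squeezing, so cut reports an empty first field and every column shifts by one.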

RobertDeRose
  • 2
    I think this should be the answer, it is closer to the OP request (asked to use cut). This approach is 5-10% slower than the awk approach (because there is one more pipe to handle with tr), but in general this will be irrelevant. – Oliver Jan 16 '19 at 16:55
12

I am going to nominate tr -s '[:blank:]' as the best answer.

Why do we want to use cut? It has a convenient range syntax that says "we want the third field and every field after it, omitting the first two fields":

cat log | tr -s '[:blank:]' | cut -d' ' -f 3-

I am not aware of an equally short equivalent for awk or Perl's split when we do not know how many fields there will be, i.e. to output the 3rd field through the last field.
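
For comparison, the same open-ended range can be produced in awk with a loop over `NF`, at the cost of more typing. The sketch below is only an illustration: `log_line`, `from_third_cut`, and `from_third_awk` are hypothetical names, and the log line is invented.

```shell
# Invented log line with irregular spacing.
log_line='2024-01-01   12:00:00  ERROR   disk   almost  full'

# cut approach from the answer: squeeze blanks, then take field 3 onward.
from_third_cut() {
    printf '%s\n' "$1" | tr -s '[:blank:]' | cut -d' ' -f3-
}

# awk sketch: walk from field 3 to NF and join the fields with single spaces.
from_third_awk() {
    printf '%s\n' "$1" | awk '{
        out = ""
        for (i = 3; i <= NF; i++) out = out (i > 3 ? " " : "") $i
        print out
    }'
}

from_third_cut "$log_line"    # prints: ERROR disk almost full
from_third_awk "$log_line"    # prints: ERROR disk almost full
```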

kenorb
Wayne Mehl
9

Shorter/simpler solution: use cuts (a "cut on steroids" that I wrote):

ps axu | grep '[j]boss' | cuts 4

Note that cuts field indexes are zero-based, so the 5th field is specified as 4.

http://arielf.github.io/cuts/

And even shorter (not using cut at all) is:

pgrep jboss
arielf
8

One way around this is to go:

$ ps axu | grep jboss | sed 's/\s\+/ /g' | cut -d' ' -f5

to replace multiple consecutive spaces with a single one.

Jared Ng
  • Strange, this does not work on OS X. The sed command does not change multiple spaces to one space. – rjurney Feb 25 '16 at 23:53
  • 2
    `\s` is a GNU sed extension. On OS X you can pass the `-E` flag to sed to enable extended regular expressions, then use `[[:space:]]` in place of `\s`, like so: `sed -E 's/[[:space:]]+/ /g'` – Jared Ng Feb 26 '16 at 13:00
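
A portable variant along the lines of that comment might look like the sketch below. It assumes a sed that accepts `-E` (current GNU and BSD versions do); `mixed` and `collapse_ws` are hypothetical names used only here.

```shell
# Invented input mixing tabs and runs of spaces between three words.
mixed=$(printf 'a \t  b\t\tc')

# Collapse every run of whitespace into one space.
# -E enables extended regular expressions; [[:space:]] is the POSIX class.
collapse_ws() {
    sed -E 's/[[:space:]]+/ /g'
}

printf '%s\n' "$mixed" | collapse_ws | cut -d' ' -f2    # prints: b
```

Unlike `tr -s ' '`, this also turns tabs into spaces, so cut sees one consistent delimiter.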
5

Personally, I tend to use awk for jobs like this. For example:

ps axu | grep jboss | grep -v grep | awk '{print $5}'
paulsm4
2

As an alternative, there is always perl:

ps aux | perl -lane 'print $F[3]'

Or, if you want to get all fields starting at field #3 (as stated in one of the answers above), use an array slice. @F is zero-based, so field #3 is index 2, and $#F is the index of the last field:

ps aux | perl -lane 'print "@F[2 .. $#F]"'
flitz
  • This does not work with the output of `lsof` I tried `lsof|perl -lane 'print $F[5]'` this sometimes gets the 5th column, sometimes the 6th – rubo77 Dec 31 '18 at 13:41
  • I think the question just was how to use delimiters that might contain a varying number of spaces. For this purpose the answer was correct. – flitz Jan 01 '19 at 21:06
  • In lsof the problem is that the number of columns is not always consistent in each row. – flitz Jan 01 '19 at 21:12
  • You can use this answer: [Get a certain column of an output with content aligned right and some columns not always filled](https://unix.stackexchange.com/a/491770/20661) – rubo77 Jan 03 '19 at 17:38
2

If you want to pick columns from a ps output, any reason to not use -o?

e.g.

ps ax -o pid,vsz
ps ax -o pid,cmd

With the minimum column width, there is no padding and the fields are separated by a single space:

ps ax --no-headers -o pid:1,vsz:1,cmd

3443 24600 -bash
8419 0 [xfsalloc]
8420 0 [xfs_mru_cache]
8602 489316 /usr/sbin/apache2 -k start
12821 497240 /usr/sbin/apache2 -k start
12824 497132 /usr/sbin/apache2 -k start

With pid and vsz given a width of 10 characters and a single-space field separator:

ps ax --no-headers -o pid:10,vsz:10,cmd

  3443      24600 -bash
  8419          0 [xfsalloc]
  8420          0 [xfs_mru_cache]
  8602     489316 /usr/sbin/apache2 -k start
 12821     497240 /usr/sbin/apache2 -k start
 12824     497132 /usr/sbin/apache2 -k start

Used in a script:

oldpid=12824
echo "PID: ${oldpid}"
echo "Command: $(ps -ho cmd ${oldpid})"
Mike
0

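Another way, if you must use the cut command:

ps axu | grep '[j]boss' | awk '{$1=$1} 1' | cut -d' ' -f5

Assigning $1 to itself makes awk rebuild the line with single-space separators, and the trailing 1 prints every line regardless of the value of $1.

On Solaris, replace awk with nawk or /usr/xpg4/bin/awk.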

BMW
0

I still like the way Perl handles fields with white space.
First field is $F[0].

$ ps axu | grep dbus | perl -lane 'print $F[4]'
AAAfarmclub
0

My approach is to store the PID in a file in /tmp, and to find the right process using the -S option for ssh. That might be a misuse, but it works for me.

#!/bin/bash

TARGET_REDIS=${1:-redis.someserver.com}
PROXY="proxy.somewhere.com"

LOCAL_PORT=${2:-6379}

if [ "$1" == "stop" ] ; then
    kill `cat /tmp/sshTunel${LOCAL_PORT}-pid`
    exit
fi

set -x

ssh -f -i ~/.ssh/aws.pem centos@$PROXY -L $LOCAL_PORT:$TARGET_REDIS:6379 -N -S /tmp/sshTunel$LOCAL_PORT  ## AWS DocService dev, DNS alias
# SSH_PID=$! ## Only works with &
SSH_PID=`ps aux | grep sshTunel${LOCAL_PORT} | grep -v grep | awk '{print $2}'`
echo $SSH_PID > /tmp/sshTunel${LOCAL_PORT}-pid

A better approach might be to query for the SSH_PID right before killing it, since the file might be stale and the script would then kill the wrong process.
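
As a partial safeguard against a stale file, the stop branch could at least confirm that the recorded PID still exists before killing it. This is a rough sketch, not part of the script above: `pid_file_is_live` is a hypothetical helper, and it does not prove that the PID still belongs to the ssh tunnel, since PIDs can be reused.

```shell
# Succeeds only if the PID file is readable, holds a plain number,
# and a process with that PID currently exists.
pid_file_is_live() {
    pid_file=$1
    [ -r "$pid_file" ] || return 1
    pid=$(cat "$pid_file")
    case $pid in
        ''|*[!0-9]*) return 1 ;;    # empty or not purely digits
    esac
    # kill -0 sends no signal; it only tests whether the PID can be signalled.
    kill -0 "$pid" 2>/dev/null
}
```

It could be used as `pid_file_is_live "/tmp/sshTunel${LOCAL_PORT}-pid" && kill "$(cat "/tmp/sshTunel${LOCAL_PORT}-pid")"`.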

Ondra Žižka