How to specify more spaces for the delimiter using cut?

Question

Is there a way to make the cut command treat a run of several spaces as a single field delimiter (something like " "+)? For example, in the following output I want to extract the value '3744'. What field delimiter should I specify?

$ ps axu | grep jboss

jboss     2574  0.0  0.0   3744  1092 ?        S    Aug17   0:00 /bin/sh /usr/java/jboss/bin/run.sh -c example.com -b 0.0.0.0

cut -d' ' is not what I want, because it treats every single space as a delimiter. I am not looking for awk either; how can I do this with cut?

thanks.

best answer is using `tr` as shown here: http://stackoverflow.com/a/4483833/168143 — John Bachir, Jan 18 '13 at 11:05
Not directly relevant to the actual question being asked but instead of `ps`+`grep` you could use `pgrep` which is available in most modern distros. It will return the result exactly in the form you need it. — ccpizza, Apr 08 '13 at 14:03
Possible duplicate of [How to make the 'cut' command treat multiple characters as one delimiter?](https://stackoverflow.com/questions/4143252/how-to-make-the-cut-command-treat-multiple-characters-as-one-delimiter) — , Apr 16 '18 at 04:06
These days I just use `hck` as a drop in `cut` replacement. By default it splits on all whitespace, like awk. And the key feature is that you can specify a delimiter with `-d` like cut, but unlike cut that delimiter can be a regex! No more needing to pre-process with `tr -s` before passing to cut. You can find `hck` here: https://github.com/sstadick/hck — Chris, Jan 19 '23 at 23:14
Does this answer your question? [Does CUT support multiple spaces as the delimiter?](https://stackoverflow.com/questions/21322968/does-cut-support-multiple-spaces-as-the-delimiter) — dsimic, Aug 22 '23 at 02:33

paxdiablo · Accepted Answer · 2015-07-16T12:21:39.097

355

Actually awk is exactly the tool you should be looking into:

ps axu | grep '[j]boss' | awk '{print $5}'

or you can ditch the grep altogether since awk knows about regular expressions:

ps axu | awk '/[j]boss/ {print $5}'

But if, for some bizarre reason, you really can't use awk, there are other simpler things you can do, like collapse all whitespace to a single space first:

ps axu | grep '[j]boss' | sed 's/\s\s*/ /g' | cut -d' ' -f5

That grep trick, by the way, is a neat way to only get the jboss processes and not the grep jboss one (ditto for the awk variant as well).

The grep process will have a literal grep [j]boss in its process command so will not be caught by the grep itself, which is looking for the character class [j] followed by boss.

This is a nifty way to avoid the | grep xyz | grep -v grep paradigm that some people use.
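The bracket trick can be checked without a live process list. This is a minimal sketch, assuming two made-up lines that stand in for `ps` output: one for the real process and one for the `grep` command itself.

```shell
# Two hypothetical command lines standing in for `ps axu` output:
# the first is the process we want, the second is the grep process itself.
fake_ps='/bin/sh /usr/java/jboss/bin/run.sh
grep [j]boss'

# The regex [j]boss matches the text "jboss". The grep command line holds
# the literal text "[j]boss" (j followed by "]"), which the regex does not match.
matches=$(printf '%s\n' "$fake_ps" | grep '[j]boss')
echo "$matches"
```

Only the line for the real process should survive the filter, which is why no extra `grep -v grep` stage is needed.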

edited Jul 16 '15 at 12:21

answered Aug 22 '11 at 03:00

paxdiablo


Great answer. I'll be coming back to look this up again next time I need it. – funroll Mar 19 '13 at 14:55
The `grep` trick seems to not work in crontab files. Any reason? – Amir Ali Akbari Dec 12 '14 at 16:03

I keep learning and forgetting the grep trick. Thanks for my most recent reminder. Maybe this time it'll stick. But I wouldn't bet on it. – Michael Burr Jan 12 '17 at 22:30
@Michael, you should set up a cron job somewhere to mail that tip (and possibly others) to you once a month :-) – paxdiablo Jan 13 '17 at 02:31
For that last sed command, you should be able to do `\s+` for "one or more spaces", in place of `\s\s*` which says "space followed by zero or more spaces" – Eric Oct 23 '18 at 21:33
This is great answer but the OP asked how to do it with cut, so I think https://stackoverflow.com/a/29685565/869951 deserves more credit than it currently has. – Oliver Jan 16 '19 at 16:57

Oliver, sometimes the best answer to "how do I do X with Y?" is "Don't use Y, use Z instead". Since OP accepted this answer, it's likely I convinced them of that :-) – paxdiablo Jan 16 '19 at 19:32

fedorqui · Answer 2 · 2017-02-15T11:39:06.300

133

The awk version is probably the best way to go, but you can also use cut if you first squeeze the repeated spaces with tr:

ps axu | grep jbos[s] | tr -s ' ' | cut -d' ' -f5
#        ^^^^^^^^^^^^   ^^^^^^^^^   ^^^^^^^^^^^^^
#              |            |             |
#              |            |       get 5th field
#              |            |
#              |        squeeze spaces
#              |
#        avoid grep itself to appear in the list
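To see what the squeeze step changes, here is a small sketch that runs `cut` with and without `tr -s` on the start of the sample line from the question. The variable names are only for illustration.

```shell
# Start of the sample line from the question (stands in for live `ps axu` output).
line='jboss     2574  0.0  0.0   3744  1092 ?'

# Without squeezing, every single space is a delimiter, so field 5 is one of
# the empty fields between "jboss" and "2574".
raw=$(printf '%s\n' "$line" | cut -d' ' -f5)

# After tr -s ' ', each run of spaces is a single space and field 5 is the VSZ column.
squeezed=$(printf '%s\n' "$line" | tr -s ' ' | cut -d' ' -f5)

echo "raw=[$raw] squeezed=[$squeezed]"
```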

edited Feb 15 '17 at 11:39

answered Jan 31 '14 at 09:40

fedorqui


Fancy illustration. – Haggra Oct 12 '18 at 06:42

`tr -s ' '` is mighty nice! I hope I can remember that better than `awk` – Chris Oct 04 '19 at 13:09
@Chris I have to object :D Awk is way better for these things!! – fedorqui Oct 04 '19 at 13:10
@fedorqui When it comes to print nth field to the end, the [`cut -f5-` grammar, "-fN-"](https://www.computerhope.com/unix/ucut.htm#Specifying-LIST) is much simpler than [`awk`](https://stackoverflow.com/questions/1602035/how-to-print-third-column-to-last-column). – Weekend Jun 23 '22 at 12:56
@Weekend agreed. – fedorqui Jun 23 '22 at 13:25

score 46 · Answer 3 · answered Apr 16 '15 at 20:42


I like to use the tr -s command for this:

 ps aux | tr -s '[:blank:]' | cut -d' ' -f3

This squeezes each run of repeated blanks down to a single character, so telling cut to use a space as the delimiter works as expected. The character class is quoted so the shell cannot expand it as a filename pattern.
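One caveat worth checking: `tr -s '[:blank:]'` with a single set squeezes repeats but leaves tabs as tabs. The sketch below uses made-up input with a tab in it and assumes a `tr` that pads the second set to the length of the first, as GNU and BSD `tr` do.

```shell
# Made-up input: fields separated by a mix of tabs and spaces.
input=$(printf 'a\t\tb   c')

# Squeeze only: the doubled tab becomes one tab, but it is still a tab,
# so cut -d' ' sees "a<TAB>b" as a single field.
squeeze_only=$(printf '%s\n' "$input" | tr -s '[:blank:]' | cut -d' ' -f2)

# Translate every blank to a space and squeeze: now each field is separate.
to_space=$(printf '%s\n' "$input" | tr -s '[:blank:]' ' ' | cut -d' ' -f2)

echo "squeeze_only=$squeeze_only to_space=$to_space"
```

For `ps` output, which is padded with spaces only, the single-set form is enough.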

answered Apr 16 '15 at 20:42

RobertDeRose


I think this should be the answer, it is closer to the OP request (asked to use cut). This approach is 5-10% slower than the awk approach (because there is one more pipe to handle with tr), but in general this will be irrelevant. – Oliver Jan 16 '19 at 16:55

score 12 · Answer 4 · edited Aug 11 '15 at 21:56

I am going to nominate tr -s '[:blank:]' as the best answer.

Why do we want to use cut? It has a handy syntax that says "we want the third field and every field after it, omitting the first two fields":

cat log | tr -s '[:blank:]' | cut -d' ' -f 3-

I do not believe awk or perl split has an equally simple equivalent when we do not know how many fields there will be, i.e. output the 3rd field through field X.
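A minimal sketch of the open-ended range, run on a made-up log line instead of a real file (the line and the variable names are assumptions for illustration):

```shell
# Made-up log line with irregular spacing between fields.
log_line='2015-08-11   21:56  WARN   disk usage   at 91%'

# Squeeze blanks, then keep field 3 and everything after it.
rest=$(printf '%s\n' "$log_line" | tr -s '[:blank:]' | cut -d' ' -f3-)
echo "$rest"
```

The original spacing inside the kept part is lost, because the squeeze runs before cut.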

arielf · Answer 5 · 2019-01-12T08:06:27.667

9

Shorter/simpler solution: use `cuts` (cut on steroids I wrote)

ps axu | grep '[j]boss' | cuts 4

Note that cuts field indexes are zero-based, so the 5th field is specified as 4.

http://arielf.github.io/cuts/

And even shorter (not using cut at all) is:

pgrep jboss

edited Jan 12 '19 at 08:06

answered Jul 03 '14 at 02:35

arielf


score 8 · Answer 6 · answered Aug 22 '11 at 03:01


One way around this is to go:

$ ps axu | grep jboss | sed 's/\s\+/ /g' | cut -d' ' -f5

to replace multiple consecutive spaces with a single one.

answered Aug 22 '11 at 03:01

Jared Ng


Strange, this does not work on OS X. The sed command does not change multiple spaces to one space. – rjurney Feb 25 '16 at 23:53

`\s` is a GNU sed extension. On OS X you can pass the `-E` flag to sed to enable extended regular expressions, then use `[[:space:]]` in place of `\s`, like so: `sed -E 's/[[:space:]]+/ /g'` – Jared Ng Feb 26 '16 at 13:00
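A sketch of the portable form from the comment above, applied to the start of the sample line from the question. It uses only `-E` and the POSIX class `[[:space:]]`, so it does not depend on `\s`.

```shell
# Start of the sample line from the question (stands in for live `ps axu` output).
line='jboss     2574  0.0  0.0   3744  1092 ?'

# Collapse every run of whitespace to one space, then take the 5th field.
vsz=$(printf '%s\n' "$line" | sed -E 's/[[:space:]]+/ /g' | cut -d' ' -f5)
echo "$vsz"
```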

score 5 · Answer 7 · answered Aug 22 '11 at 03:00


Personally, I tend to use awk for jobs like this. For example:

ps axu | grep jboss | grep -v grep | awk '{print $5}'

answered Aug 22 '11 at 03:00

paulsm4


That can be compressed down to `ps axu | awk '/[j]boss/ {print $5}'`. – zwol Aug 22 '11 at 03:23

Isn't awk slower (especially when there are some superfluous other processes) than sed / grep / cut? – pihentagy Sep 28 '12 at 15:47

score 2 · Answer 8 · answered Feb 26 '16 at 07:09


As an alternative, there is always perl:

ps aux | perl -lane 'print $F[3]'

Or, if you want to get all fields starting at field #3 (as stated in one of the answers above), keeping in mind that @F is zero-based, so field #3 is $F[2] and the last index is $#F:

ps aux | perl -lane 'print "@F[2 .. $#F]"'

answered Feb 26 '16 at 07:09

flitz


This does not work with the output of `lsof` I tried `lsof|perl -lane 'print $F[5]'` this sometimes gets the 5th column, sometimes the 6th – rubo77 Dec 31 '18 at 13:41
I think the question just was how to use delimiters that might contain a varying number of spaces. For this purpose the answer was correct. – flitz Jan 01 '19 at 21:06
In lsof the problem is that the number of columns is not always consistent in each row. – flitz Jan 01 '19 at 21:12
You can use this answer: [Get a certain column of an output with content aligned right and some columns not always filled](https://unix.stackexchange.com/a/491770/20661) – rubo77 Jan 03 '19 at 17:38

Mike · Answer 9 · 2018-09-12T15:58:59.717

If you want to pick columns from a ps output, any reason to not use -o?

e.g.

ps ax -o pid,vsz
ps ax -o pid,cmd

Minimum column width allocated, no padding, only single space field separator.

ps ax --no-headers -o pid:1,vsz:1,cmd

3443 24600 -bash
8419 0 [xfsalloc]
8420 0 [xfs_mru_cache]
8602 489316 /usr/sbin/apache2 -k start
12821 497240 /usr/sbin/apache2 -k start
12824 497132 /usr/sbin/apache2 -k start

Pid and vsz given 10 char width, 1 space field separator.

ps ax --no-headers -o pid:10,vsz:10,cmd

  3443      24600 -bash
  8419          0 [xfsalloc]
  8420          0 [xfs_mru_cache]
  8602     489316 /usr/sbin/apache2 -k start
 12821     497240 /usr/sbin/apache2 -k start
 12824     497132 /usr/sbin/apache2 -k start

Used in a script:-

oldpid=12824
echo "PID: ${oldpid}"
echo "Command: $(ps -ho cmd ${oldpid})"

score 0 · Answer 10 · answered Feb 03 '14 at 06:11


Another way, if you must use the cut command:

ps axu | grep '[j]boss' | awk '$1=$1' | cut -d' ' -f5

In Solaris, replace awk with nawk or /usr/xpg4/bin/awk
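Assigning `$1` to itself makes awk rebuild the record from its fields, joined by single spaces. The sketch below uses a made-up line with leading blanks and the variant `{$1=$1; print}`, since a bare `$1=$1` pattern skips lines whose first field is empty or 0.

```shell
# Made-up line with leading and repeated spaces.
line='   jboss     2574  0.0  0.0   3744'

# tr -s keeps one leading space, so cut sees an empty first field.
with_tr=$(printf '%s\n' "$line" | tr -s ' ' | cut -d' ' -f1)

# awk rebuilds the record from its fields, dropping the leading blanks
# and joining the fields with single spaces.
with_awk=$(printf '%s\n' "$line" | awk '{$1=$1; print}' | cut -d' ' -f1)

echo "with_tr=[$with_tr] with_awk=[$with_awk]"
```

This matters for input that may be indented; for `ps axu`, where the first column starts at the left edge, both approaches give the same fields.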

answered Feb 03 '14 at 06:11

BMW


score 0 · Answer 11 · answered Aug 26 '15 at 03:41


I still like the way Perl handles fields with white space.
First field is $F[0].

$ ps axu | grep dbus | perl -lane 'print $F[4]'

answered Aug 26 '15 at 03:41

AAAfarmclub


score 0 · Answer 12 · answered Feb 07 '18 at 15:18

My approach is to store the PID to a file in /tmp, and to find the right process using the -S option for ssh. That might be a misuse but works for me.

#!/bin/bash

TARGET_REDIS=${1:-redis.someserver.com}
PROXY="proxy.somewhere.com"

LOCAL_PORT=${2:-6379}

if [ "$1" == "stop" ] ; then
    kill `cat /tmp/sshTunel${LOCAL_PORT}-pid`
    exit
fi

set -x

ssh -f -i ~/.ssh/aws.pem centos@$PROXY -L $LOCAL_PORT:$TARGET_REDIS:6379 -N -S /tmp/sshTunel$LOCAL_PORT  ## AWS DocService dev, DNS alias
# SSH_PID=$! ## Only works with &
SSH_PID=`ps aux | grep sshTunel${LOCAL_PORT} | grep -v grep | awk '{print $2}'`
echo $SSH_PID > /tmp/sshTunel${LOCAL_PORT}-pid

A better approach might be to query for the SSH_PID right before killing it, since the file might be stale and the script would then kill the wrong process.
