412

I want to use space as a delimiter with the cut command.

What syntax can I use for this?

Peter Mortensen
Jaelebi
  • 53
    untrue, the man page for cut doesn't explain this and is, in general, not informative – UncleZeiv Oct 05 '10 at 16:11
  • 2
    Also, "info cut" is no improvement in this case. – cardiff space man Apr 04 '13 at 00:26
  • @UncleZeiv: The `man` page doesn't explain this, because it has nothing to do with `cut` _specifically_ and everything to do with [how the _shell_ parses string literals](http://wiki.bash-hackers.org/syntax/quoting) and with [how POSIX-compatible utilities parse option-arguments](http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap12.html) _in general_. – mklement0 May 02 '15 at 06:39
  • 4
    @mklement0 if I recall, I was replying to a comment that has since been deleted, which was dismissing this question as being answered to in the man page, which was in my opinion "untrue", regardless of there being a good reason for it or not - now, while I concede that there might be a good reason for this lack of information, I still think that documentation without common usage examples is often at least irritating, when not outright useless – UncleZeiv May 05 '15 at 10:00
  • 3
    @UncleZeiv Got it; thanks for clarifying; given the interest in this question, it's fair to assume that the `man` page isn't enough. Let's take a look: "`-d delim` Use `delim` as the field delimiter character instead of the tab character." (BSD `cut`, but the GNU version and the POSIX spec pretty much state the same). Using a _shell_ to invoke `cut` - the typical case - therefore requires you to know how to _generally_ pass a space as an argument using _shell syntax_, which is arguably not the `cut` man page's job. Real-world examples always help, however, and the _GNU_ man page lacks them. – mklement0 May 05 '15 at 20:27
  • 4
    although the [selected answer](http://stackoverflow.com/a/816824/199217) is technically correct, consider selecting the more [recent and comprehensive answer](http://stackoverflow.com/a/29998195/199217) by @mklement0 as the canonical answer so that it filters to the top. – David LeBauer Sep 16 '15 at 18:34

8 Answers

458
cut -d ' ' -f 2

Where 2 is the field number of the space-delimited field you want.
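As a quick illustration, here is a minimal sketch that pulls a single field and two ranges out of a space-delimited line. The sample line and the variable names are made up for the example:

```shell
# Hypothetical sample line; fields are separated by exactly one space.
line='alpha beta gamma delta'

# Field 2 only.
second=$(printf '%s\n' "$line" | cut -d ' ' -f 2)

# Fields 2 through 3; cut joins the selected fields with the same delimiter.
middle=$(printf '%s\n' "$line" | cut -d ' ' -f 2-3)

# Field 3 to the end of the line.
rest=$(printf '%s\n' "$line" | cut -d ' ' -f 3-)

printf '%s\n' "$second" "$middle" "$rest"
```

This prints beta, then beta gamma, then gamma delta, each on its own line.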

RichieHindle
  • 6
    can you tell cut to use any number of a certain character as the delimiter, like in RegEx? e.g. any number of spaces, e.g. \s+ – amphibient Nov 01 '12 at 15:42
  • 4
    @foampile No, I don't believe you can. – Jonathan Hartley Nov 05 '12 at 10:51
  • 7
    You can't use regexes with `cut`, but you can with `cuts` which tries to "fix" all of `cut` limitations: https://github.com/arielf/cuts – arielf Jul 03 '14 at 04:00
  • Can you get every third space-delimited field, like `cut -d ' ' -f 3,6,9,12,15,18`, without having to specify every number? – Monocito Apr 17 '20 at 08:00
  • It's such a common use case to split on variable number of contiguous space it's a bit funny it's not dealt with - but following up on `cuts` sounds like it might be good. – NeilG May 26 '23 at 03:17
  • I'll just point out, because commonly people viewing this answer are really looking for this instead, that `tr -s ' '` can be used in the pipeline to squash all occurrences of multiple spaces into one, making `cut -d ' '` suddenly really great for scenarios where you're getting variable numbers of spaces on the line because of justification. Actually available in this answer: https://stackoverflow.com/a/19069428/134044 – NeilG May 26 '23 at 04:18
245

Usually, if you use a space as the delimiter, you want to treat multiple spaces as one, because you are parsing the output of a command that aligns its columns with spaces. (A Google search for exactly that led me here.)

In this case a single cut command is not sufficient, and you need to use:

tr -s ' ' | cut -d ' ' -f 2

Or

awk '{print $2}'
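To make the difference concrete, here is a small sketch comparing the three approaches on a line whose columns are separated by three spaces. The sample input and variable names are hypothetical:

```shell
# Hypothetical column-aligned input: three spaces between the columns.
line='foo   bar   baz'

# Plain cut treats every single space as a delimiter, so field 2 is empty.
plain=$(printf '%s\n' "$line" | cut -d ' ' -f 2)

# Squeezing runs of spaces first makes field 2 the second word.
squeezed=$(printf '%s\n' "$line" | tr -s ' ' | cut -d ' ' -f 2)

# awk splits on runs of blanks by default and ignores leading blanks.
with_awk=$(printf '%s\n' "$line" | awk '{print $2}')

printf '[%s] [%s] [%s]\n' "$plain" "$squeezed" "$with_awk"
```

This prints [] [bar] [bar]. One caveat: if a line starts with spaces, tr -s still leaves a single leading space, so cut sees an empty first field and the numbering shifts by one, whereas awk ignores leading blanks.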
BeniBela
  • Yes! This should be the accepted answer, or at least included in the accepted answer. I can't remember ever trying to use cut on space separated data when I didn't have to normalize the spaces. – Jeremy Brooks Jul 31 '20 at 22:50
  • This is money. `tr` translates or deletes characters. The `-s` option replaces repeats with a single occurrence. – young_souvlaki Mar 11 '21 at 16:28
  • i'm on a mac and i couldn't get the `cut` or `tr` examples to work. the `awk` example worked perfectly for what i was doing (splitting output with multiple spaces between columns) – Alf47 Mar 28 '22 at 14:09
58

To complement the existing, helpful answers; tip of the hat to QZ Support for encouraging me to post a separate answer:

Two distinct mechanisms come into play here:

  • (a) whether cut itself requires the delimiter (space, in this case) passed to the -d option to be a separate argument or whether it's acceptable to append it directly to -d.

  • (b) how the shell generally parses arguments before passing them to the command being invoked.

(a) is answered by a quote from the POSIX guidelines for utilities (emphasis mine):

If the SYNOPSIS of a standard utility shows an option with a mandatory option-argument [...] a conforming application shall use separate arguments for that option and its option-argument. However, a conforming implementation shall also permit applications to specify the option and option-argument in the same argument string without intervening characters.

In other words: In this case, because -d's option-argument is mandatory, you can choose whether to specify the delimiter as:

  • (s) EITHER: a separate argument
  • (d) OR: as a value directly attached to -d.

Once you've chosen (s) or (d), it is the shell's string-literal parsing - (b) - that matters:

  • With approach (s), all of the following forms are EQUIVALENT:

    • -d ' '
    • -d " "
    • -d \<space> # <space> used to represent an actual space for technical reasons
  • With approach (d), all of the following forms are EQUIVALENT:

    • -d' '
    • -d" "
    • "-d "
    • '-d '
    • -d\<space>

The equivalence is explained by the shell's string-literal processing:

All solutions above result in the exact same string (in each group) by the time cut sees them:

  • (s): cut sees -d, as its own argument, followed by a separate argument that contains a space char - without quotes or \ prefix!

  • (d): cut sees -d plus a space char - without quotes or \ prefix! - as part of the same argument.

The reason the forms in the respective groups are ultimately identical is twofold, based on how the shell parses string literals:

  • The shell allows literals to be specified as is through a mechanism called quoting, which can take several forms:
    • single-quoted strings: the content inside '...' is taken literally and forms a single argument
    • double-quoted strings: the content inside "..." also forms a single argument, but is subject to interpolation (expansion of variable references such as $var, command substitutions ($(...) or `...`), and arithmetic expansions ($(( ... )))).
    • \-quoting of individual characters: a \ preceding a single character causes that character to be interpreted as a literal.
  • Quoting is complemented by quote removal, which means that once the shell has parsed a command line, it removes the quote characters from the arguments (enclosing '...' or "..." or \ instances) - thus, the command being invoked never sees the quote characters.
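The equivalence within each group can be checked with a small sketch. show_args is a hypothetical helper, not a standard utility, that prints every argument it receives in brackets, so you can see what a command such as cut would receive after quote removal:

```shell
# show_args is a hypothetical helper: it prints each argument it receives
# in brackets, one per line, so embedded spaces become visible.
show_args() { printf '[%s]\n' "$@"; }

# Approach (s): -d and the space arrive as two separate arguments.
s1=$(show_args -d ' ' -f 2)
s2=$(show_args -d " " -f 2)
s3=$(show_args -d \  -f 2)   # backslash followed by two spaces

# Approach (d): -d and the space arrive as one single argument.
d1=$(show_args -d' ' -f 2)
d2=$(show_args -d" " -f 2)
d3=$(show_args "-d " -f 2)
d4=$(show_args '-d ' -f 2)
d5=$(show_args -d\  -f 2)    # backslash followed by two spaces

printf '%s\n' "$s1"
printf '%s\n' "$d1"
```

The first printf shows four arguments ([-d], [ ], [-f], [2]); the second shows three ([-d ], [-f], [2]). Within each group, all variables hold the same string, and no quote or backslash survives in any of them.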
Community
mklement0
  • 1
    With cut from Gow, only the forms with double quotes work: -d" ", -d " ", "-d ". The forms with single quotes do not work. – Frank Jan 20 '21 at 20:58
  • @Frank, yes, it is the _shell_ that matters with respect to the quoting styles supported, and given that [Gow](https://github.com/bmatzelle/gow) runs on Windows, you either need to use `cmd.exe`'s syntax (`"`-quoting only, `^` as the escape char.) or PowerShell's syntax (`\`` as the escape char.) – mklement0 Jan 20 '21 at 21:35
48

You can also say:

cut -d\  -f 2

Note that there are two spaces after the backslash.

Peter Mortensen
Chas. Owens
  • 32
    The person who knows that '\' escapes the next character would be very careful to note what came next. Using '\' to escape space characters like this is a very common idiom. – Jonathan Hartley Mar 21 '12 at 09:24
  • 5
    @Jonathan Hartley commonly most of the codes are unreadable indeed :) – Luca Borrione Nov 02 '12 at 13:24
  • 1
    From a linux/unix perspective, `\ ` was my first attempt and it worked. I agree it is less obvious when compared to `' '`, but I'm sure many are glad to read it here as reassurance of behavior. For a better understanding, please see @mklement0's comment below. – tresf May 01 '15 at 22:14
  • @JonathanHartley correction: "the *selfish* person who knows that '\' escapes the next character and *assumes* everybody else knows that also". For personal projects this does not apply, but in a team-setting, that assumption is a very dangerous (and potentially costly) one. – Eduard Nicodei Sep 13 '17 at 12:14
  • 1
    @EduardNicodei Oh I agree. We were talking about readers of the code ("who notices...?"), not authors. But also, on some teams it's fine to assume a certain level of proficiency. Depends on the environment. – Jonathan Hartley Sep 13 '17 at 19:30
  • And the person who knows that `-d` is followed by the delimiter could assume that \ is the delimiter :) – Michel Jul 04 '19 at 10:15
8

I just discovered that you can also use "-d ":

cut "-d "

Test

$ cat a
hello how are you
I am fine
$ cut "-d " -f2 a
how
am
Community
fedorqui
  • 5
    Note that _from `cut`'s perspective_ all of the following are identical: `"-d "`, `'-d '`, `-d" "`, `-d' '`, and `-d\<space>`: all forms directly append the option argument (a space) to the option (`-d`) and result in the _exact same string_ by the time `cut` sees them: a single argument containing `-d` followed by a space, after the _shell_ has performed [quote removal](http://www.gnu.org/software/bash/manual/html_node/Quote-Removal.html#Quote-Removal) – mklement0 Apr 22 '15 at 13:28
  • 1
    @mklement0's answer should be **the** answer. It is the most comprehensive on this page (even though it is a comment). – tresf May 01 '15 at 22:16
  • @QZSupport: I appreciate the sentiment and the encouragement - it has inspired me to post my own answer with additional background information. – mklement0 May 02 '15 at 03:53
5

You can't do it easily with cut if the data contains, for example, runs of multiple spaces. I have found it useful to normalize the input for easier processing. One trick is to use sed for normalization, as below.

echo -e "foor\t \t bar" | sed 's:\s\+:\t:g' | cut -f2  #bar
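The sed command above relies on \s, \+, and \t in the replacement, none of which POSIX specifies for basic regular expressions, so it may not behave the same with every sed implementation. A possible alternative, sketched here with tr rather than sed and with a made-up sample input, performs the same normalization:

```shell
# Build the sample input with printf, since echo -e is not portable.
# tr maps both spaces and tabs to tabs, and -s squeezes each run of tabs
# into one, so cut can rely on its default tab delimiter.
second=$(printf 'foo\t \t bar\n' | tr -s ' \t' '\t\t' | cut -f 2)

printf '%s\n' "$second"
```

This prints bar, just like the sed version.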
Anssi
3

scut is a cut-like utility I wrote (smarter but slower than cut) that can use any Perl regex as a breaking token. Breaking on whitespace is the default, but you can also break on multi-char regexes, alternative regexes, etc.

scut -f='6 2 8 7' < input.file  > output.file

The above command breaks columns on whitespace and extracts the (0-based) columns 6, 2, 8, and 7, in that order.

maazza
0

I have an answer (admittedly a somewhat confusing one) that involves sed, regular expressions, and capture groups:

  • \S* - first word
  • \s* - delimiter
  • (\S*) - second word - captured
  • .* - rest of the line

As a sed expression, the capture group needs to be escaped, i.e. \( and \).

The \1 returns a copy of the captured group, i.e. the second word.

$ echo "alpha beta gamma delta" | sed 's/\S*\s*\(\S*\).*/\1/'
beta

When you look at this answer, it's somewhat confusing, and you may think: why bother? Well, I'm hoping that some readers will go "Aha!" and use this pattern to solve complex text-extraction problems with a single sed expression.
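Note that \S and \s are extensions that not every sed supports. A sketch of the same pattern using POSIX bracket expressions, which should be more portable, could look like this (the variable name is just for illustration):

```shell
# Same idea with POSIX bracket expressions in place of \S and \s:
#   [^[:space:]]*       first word
#   [[:space:]]*        delimiter
#   \([^[:space:]]*\)   second word, captured
#   .*                  rest of the line
second=$(printf '%s\n' 'alpha beta gamma delta' |
  sed 's/[^[:space:]]*[[:space:]]*\([^[:space:]]*\).*/\1/')

printf '%s\n' "$second"
```

As with the original expression, this prints beta.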

Stephen Quan