
I have a set of data as input and need the second-to-last field based on a delimiter. The lines may have different numbers of delimiters. How can I get the second-to-last field?

Example input:

text,blah,blaah,foo
this,is,another,text,line

Expected output:

blaah
text

6 Answers


Got a hint from Unix cut except last two tokens and was able to figure out the answer:

cat datafile | rev | cut -d ',' -f 2 | rev
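
rev reverses each line, which turns the second-to-last field into the second field; cut -f 2 then selects it, and the second rev restores its original character order. With the question's sample data in datafile, this should print:

$ cat datafile | rev | cut -d ',' -f 2 | rev
blaah
text
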
  • +1 for not using awk while still being concise... although I also use awk most of the time :P – icasimpan Aug 13 '14 at 09:25
  • `rev` actually can take a file as an argument, so this is a UUoC case – Sergiy Kolodyazhnyy Aug 20 '16 at 00:59
  • Preserving a linear order is not nor will ever be "UUoC". @SergiyKolodyazhnyy – Jan Kyu Peblik Sep 05 '19 at 19:35
  • @JanKyuPeblik Please explain. What is the benefit of cat and having two processes with additional buffering via a pipeline, instead of just having one rev process that achieves the same result? – Sergiy Kolodyazhnyy Sep 05 '19 at 21:41
  • @JanKyuPeblik Sorry, but it still is unclear. "Preserving a linear order" doesn't seem to be necessary here, especially since the answer suggests they are in fact reversing the lines first for processing and then reversing again. `cat datafile | rev` has no visible benefit over `rev datafile` – Sergiy Kolodyazhnyy Sep 05 '19 at 23:05

Awk is well suited for this:

awk -F, '{print $(NF-1)}' file

NF is a built-in awk variable that holds the number of fields in the current record, so $(NF-1) is always the second-to-last field, no matter how many fields a line contains.
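
For example, with the sample input from the question saved as file, this should print:

$ awk -F, '{print $(NF-1)}' file
blaah
text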


There's no need to use cut, rev, or any other tools external to bash here at all. Just read each line into an array, and pick out the piece you want:

while IFS=, read -r -a entries; do                 # split each line on commas into an array
  printf '%s\n' "${entries[${#entries[@]} - 2]}"   # element (length - 2) is the second-to-last field
done <file

Doing this in pure bash is far faster than starting up a pipeline, at least for reasonably small inputs. For large inputs, the better tool is awk.
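
As an aside, here is a minimal variant (assuming bash 4.2 or newer, where negative array subscripts count back from the end) that avoids the explicit length arithmetic:

while IFS=, read -r -a entries; do
  printf '%s\n' "${entries[-2]}"   # [-2] is the second-to-last element
done <file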

  • I wouldn't say no reason... That's a lot of nasty syntax for a simple task and I'd take a few extra nanoseconds personally. Anyway, +1 for giving a robust bash solution. – Chris Seymour Jul 14 '13 at 21:59
  • Saying there's no reason to use an external tool when you can use bash constructs is like saying there's no reason to use a lawnmower when you can use scissors. A shell is just an environment from which to call tools and manipulate files and processes, along with some constructs to sequence all of that. Like with any other form of construction, when constructing software just use the right tool for each job. – Ed Morton Jul 15 '13 at 00:57
  • @EdMorton That may be a nice sound bite, but it doesn't actually line up with the world as it is. bash is a fairly complete programming environment, and provides the tools necessary to do most common operations in-process. You wouldn't write Python code that calls external tools for operations Python has built in; why do so in bash? – Charles Duffy Jul 15 '13 at 01:41
  • @EdMorton ...to go a little deeper: this isn't your grandpa's Bourne shell. bash has proper arrays (of C strings), map/hash datatypes, and indirect variable references. 40 years ago, a shell might have been a tool that did nothing but set up pipelines, but now ain't then. – Charles Duffy Jul 15 '13 at 01:44
  • Just look at how much more code, and how much more complicated and detailed code, you had to write compared to @sudo_O's trivial awk print statement. That's because awk was designed following the UNIX principle of doing one thing (in this case text processing) and doing it well. All of the modern shell constructs are very useful for accomplishing certain shell-appropriate tasks, but that does not mean you should use them for tasks that clearly have much better-suited tools already. – Ed Morton Jul 15 '13 at 02:58
  • @EdMorton I can't agree with that -- awk is a full programming language the same way bash is (with some nice features bash doesn't have, such as automated file descriptor management); people who treat it as a tool that does just one thing are wrong about awk in the same way you're wrong about bash. – Charles Duffy Jul 15 '13 at 03:25
  • Remind me again what the awk command is to move a file? The awk command to wait for a process to end before proceeding? How about the shell variable you set to read multi-line records? The shell builtin variable that tells you how many lines you've read across all input files? By considering awk to be "a full programming language the same way bash is" you are missing the point of both awk and shell, and therefore the benefits of using both appropriately. – Ed Morton Jul 15 '13 at 03:36
  • @EdMorton Awk has support for renaming files to the exact same extent shell does -- both call `/bin/mv`. Modern (GNU) awk _does_ directly support `wait()` and `waitpid()` calls. Shell implicitly supports multi-line records via multiple read invocations. Shell certainly lacks the built-in text-processing helpers specific to awk -- but that lack makes it a better general-purpose language by being free of needless special cases. – Charles Duffy Jul 15 '13 at 03:50
  • @EdMorton I actually _do_ believe in using the right tool for the right job -- in my day job, I bounce between about 5 languages, 4 editors/IDEs, etc. That said, I also see far, far too many people who aren't willing to learn enough shell to recognize the places where it _is_ the right tool for a niche... to the point where it's become something of a peeve. I'd rather teach someone the shell way to do something and have them choose something else in practice than let them think shell couldn't do that thing at all. – Charles Duffy Jul 15 '13 at 03:52
  • Revised "no reason" to "no need". Folks happier now? – Charles Duffy Jul 15 '13 at 03:54
  • As an academic exercise, it's fine. As you mentioned "Doing this in pure bash is far faster than starting up a pipeline, at least for reasonably small inputs. For large inputs, the better tool is awk.". I tested and the bash solution ran in 0.095s vs awk in 0.125s for a 100-line file but 0.157s vs 0.125s for a 500-line file and 1.344s vs 0.125s for a 10,000-line file. – Ed Morton Jul 15 '13 at 09:59
  • To paraphrase then – for inputs that it'd literally take the blink of an eye to process in awk, you can do it in a slightly briefer blink of an eye using bash, but then be prepared for a severe performance hit as your data grows. So the bash solution is more cumbersome to write than awk and runs far slower than awk in situations where performance is actually something you'd care about (i.e. on large data sets). Best I can tell, then, there's no reason to write it in bash other than as an academic exercise to show people how to use the bash constructs. – Ed Morton Jul 15 '13 at 10:07
  • @EdMorton Half the time I see people using awk, they're starting it once per line, or they're starting other subprocesses with its output, or they're reading awk's output into a bash loop. At that point, awk either becomes useless overhead (cases 1 and 3), or the performance gain gets lost in the noise (case 2). – Charles Duffy Jul 15 '13 at 12:09
  • ...moreover -- if the bash code in question is ugly to you, it's a signal that you don't know enough bash to use your shell effectively. Arrays are utterly necessary to be able to safely form and manipulate command-line argument lists (there's literally no other safe way to do it for arbitrary inputs other than `eval`ing `printf %q` output or overwriting the `$@` built-in array, and either of those are horrible practice), and `read` is fundamental to processing input; there's no black magic here. – Charles Duffy Jul 15 '13 at 12:13
  • @EdMorton Aside: `ksh` supports the features used here (though I can't vouch that syntax matches), but has performance more on-par with `awk`. If you're benchmarking, it might be fun to play with. – Charles Duffy Jul 15 '13 at 12:17
  • I didn't say the code was ugly, I said it was cumbersome in that you have to write, relatively, a lot of detailed code just to do this very trivial job. I understand bash just fine and I use arrays, etc. in it WHEN APPROPRIATE such as when creating/destroying files and/or processes. – Ed Morton Jul 15 '13 at 13:37
  • I already granted that point (that awk provides helpers for making line-oriented text-processing jobs easier... or, rather, more terse). I'm not convinced it's a good thing -- having to learn a bunch of little special-purpose helpers is more cognitive load than learning and leveraging a set of general-purpose tools, and the amount of benefit those special-purpose helpers need to provide to be worth learning is a point on which we appear to disagree -- but their presence and use is fully agreed by both parties, and thus rearguing it bears no purpose. – Charles Duffy Jul 15 '13 at 13:51
  • wrt my claim that I do use bash arrays, see for example the answer I posted a couple of days ago: http://stackoverflow.com/questions/17628152/trying-to-assign-output-of-awk-command-to-an-array/17640291#17640291. In that case IMHO it's probably not an appropriate use of them either, as I say in my answer, but it answers the OP's question of how to read awk output into an array. – Ed Morton Jul 15 '13 at 14:14
  • @EdMorton *nod*. Apologies if I came off as leveraging personal attacks on yourself or your skillset; that wasn't the intent. – Charles Duffy Jul 15 '13 at 14:15
  • @EdMorton ...though actually, reading through that answer, the bash side of it is pretty substantially buggy. I'll follow up there. – Charles Duffy Jul 15 '13 at 14:16
  • No problem, I didn't take it personally, I just didn't want anyone reading this to think I was advocating one approach because I didn't understand the alternative. Please do follow up there, I suspect we've about exhausted everyone else's patience with us in this thread anyway! – Ed Morton Jul 15 '13 at 14:18

A Perl solution, similar to the awk solution from @iiSeymour:

perl -F, -lane 'print $F[-2]' file

These command-line options are used:

  • n loop around every line of the input file, do not automatically print every line

  • l removes newlines before processing, and adds them back in afterwards

  • a autosplit mode – split input lines into the @F array. Defaults to splitting on whitespace

  • F, sets the autosplit field separator to a comma

  • e execute the perl code

The @F autosplit array starts at index [0], while awk fields start with $1:
[-1] is the last element
[-2] is the second-to-last element
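
With the question's sample input saved as file, this should print:

$ perl -F, -lane 'print $F[-2]' file
blaah
text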


The most minimalist answer to this problem is to use my cuts utility:

$ cat file.txt
text,blah,blaah,foo
this,is,another,text,line

$ cuts -2 file.txt
blaah
text

cuts, which stands for "cut on steroids":

- automatically figures out the input field separators
- supports multi-char (and regexp) separators
- automatically pastes (side-by-side) multiple columns from multiple files
- supports negative offsets (from end of line)
- has good defaults to save typing + allows the user to override them

and much more.

I wrote cuts after being frustrated with the many limitations of cut on Unix. It is designed to replace various cut/paste combos, slicing and dicing columns from multiple files, with multiple separator variations, while requiring minimal typing from the user.

You can get cuts (free software, Artistic Licence) from github: https://github.com/arielf/cuts/

Calling cuts without arguments will print a detailed Usage message.

  • Hey thanks for sharing your script! Perhaps "minimalist" isn't the best way to describe it as one needs to install a perl script, but it is definitely useful to have something intelligent like this that embraces the UNIX philosophy. I'm going to stash this in my utils... – Steven Lu Feb 25 '15 at 13:24

Code for GNU sed:

$ echo text,blah,blaah,foo|sed -r 's/^(\S+,){2}(\S+),.*/\2/'
blaah

$ echo this,is,another,text,line|sed -r 's/^(\S+,){2}(\S+),.*/\2/'
text

Code example similar to sudo_O's awk code:

$ sed -r 's/.*,(\w+),\w+$/\1/' file
blaah
text
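
Here the greedy .*, consumes everything up to the second-to-last comma, (\w+) captures the penultimate field, and \w+$ anchors on the last field, so the number of fields doesn't matter. A quick check on a longer line should give:

$ echo a,b,c,d,e,f | sed -r 's/.*,(\w+),\w+$/\1/'
e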

It might be better to use more specialised programs for real CSV files.

  • That doesn't get the penultimate field and it's limited to a fixed number of fields per line. I wouldn't use regular expressions for this. – Chris Seymour Jul 14 '13 at 23:03