0

I need to process a couple of thousand PDF files sorted alphabetically by filename, ideally from bash. So, from my simple perspective, I need to walk a tree of files, stripping off the path as I go, and then do various grepping, sorting etc.

Having seen an answer to a similar question I've tried doing a

tim@MERLIN:~/Documents/Scanned$ basename `find ./ -print`

but that gets messed up by some directory names which have spaces in them - e.g. there is one called General Letters which acts like a chicken-bone in the works and results in

basename: extra operand ‘Letters’
Try 'basename --help' for more information.

I can't see a way to get find to strip out the pathname and I would prefer to use find given its plethora of options to filter on age, size etc. Nor can I see any way to get basename to cope gracefully with spaces in this context.
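
For reference, the error above comes from the shell splitting the backtick output on whitespace before basename ever sees it. A minimal sketch of one way around that, assuming a POSIX find and basename, is to let find invoke basename once per path. The temporary directory and file names are made up for illustration.

```shell
# Throwaway tree with a space in a directory name (all names are hypothetical).
tmp=$(mktemp -d)
mkdir -p "$tmp/General Letters"
touch "$tmp/General Letters/letter 1.pdf" "$tmp/memo.pdf"

# find passes each path to basename as a single argument,
# so the shell never gets a chance to split it on spaces.
names=$(find "$tmp" -type f -exec basename {} \; | sort)
printf '%s\n' "$names"

rm -rf "$tmp"
```

This starts one basename process per file, which is slow but acceptable for a few thousand files.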

I considered using cut, but I can't work out how to get cut to give me the last field by doing something like cut -d/ <whatever>. I'm sure there must be an easy way to do it: some sort of in-line sed or awk script?

I don't particularly want the buggeration of writing a perl/Python script to do it for me as I know I should be able to do it from the command line.

So any simple tips or suggestions?

Updated/Solved

Many thanks to Cyrus. The solution is:

tim@MERLIN:~/Documents/Scanned$ find . -name '*.pdf' -printf '%f\n' | sort
TimGJ
  • Please update your question to provide a link to "answer #10124314". I think you mean [this question](http://stackoverflow.com/q/10124314/827263). – Keith Thompson Jul 19 '14 at 18:56
  • Proper directory recursion along with `pushd` and `popd` is what you probably need here. Just my 2 cents. – konsolebox Jul 19 '14 at 19:29

4 Answers

4

Try this:

find ./ -printf '%f\n'


%f: File's name with any leading directories removed (only the last element).
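
As a small sketch of how this behaves when a directory name contains a space, the following builds a throwaway tree and lists only the PDF names. It assumes GNU find, since `-printf` is a GNU extension, and the directory and file names are invented for the example.

```shell
# Build a throwaway tree; every name here is hypothetical.
tmp=$(mktemp -d)
mkdir -p "$tmp/General Letters" "$tmp/Invoices"
touch "$tmp/General Letters/b.pdf" "$tmp/Invoices/a.pdf" "$tmp/Invoices/notes.txt"

# Quote the pattern so the shell passes it to find unexpanded.
# %f prints only the last path component, so spaces in directory names are harmless.
pdf_names=$(find "$tmp" -type f -name '*.pdf' -printf '%f\n' | sort)
printf '%s\n' "$pdf_names"

rm -rf "$tmp"
```

Quoting `'*.pdf'` matters: left unquoted, the shell may expand the pattern itself if the current directory happens to contain matching files.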

Cyrus
  • Perfect. That's exactly what I was after. `find . -name *.pdf -printf '%f\n' | sort` gets me pretty close to where I need to be. – TimGJ Jul 19 '14 at 21:28
1

Here is a working solution using awk:

find ./ | awk -F'/' '{ print $NF }';

It simply uses / as delimiter and prints the last value of the line.

Or with grep:

find ./ | grep -oE "[^/]+$"
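
The same "keep everything after the last slash" idea can also be sketched with shell parameter expansion alone, with no awk or grep. This is only an illustration: the test tree is made up, and like the commands above it assumes no newlines in file names.

```shell
# Throwaway tree for the demonstration (names are hypothetical).
tmp=$(mktemp -d)
mkdir -p "$tmp/General Letters"
touch "$tmp/General Letters/letter.pdf" "$tmp/memo.pdf"

# ${path##*/} removes the longest prefix ending in "/", leaving the file name.
# IFS= and -r keep spaces and backslashes in each line intact.
last_parts=$(find "$tmp" -type f | while IFS= read -r path; do
  printf '%s\n' "${path##*/}"
done | sort)
printf '%s\n' "$last_parts"

rm -rf "$tmp"
```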
julienc
0

Through sed,

find ./ | sed 's/.*\/\(.*\)$/\1/g'
Avinash Raj
0

If you want to get a list of pathnames (recursively) but sort them by file name rather than by path name, you can use:

find . -printf '%f|%p\n' | sort -k 1 -t'|' | cut -d'|' -f2-

This requires GNU find (standard on Linux; not the default on OS X).

Without the GNU find, you can do the above with:

find . -print | sed 's:\(.*\)/\(.*\)$:\2\|\1/\2:' | sort -k 1 -t'|' | cut -d'|' -f2-

(Assuming there is no \n in the filenames)
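
If newlines or `|` characters in file names are a concern, a possible variant is to use NUL-terminated records and `/` as the separator, since `/` cannot occur inside a file name. This is a sketch assuming GNU find, GNU sort and a GNU cut recent enough to support `-z`. The test tree is invented, and the final `tr` only makes the result printable.

```shell
# Throwaway tree (names are hypothetical).
tmp=$(mktemp -d)
mkdir -p "$tmp/General Letters" "$tmp/Invoices"
touch "$tmp/General Letters/z.pdf" "$tmp/Invoices/a.pdf"

# Emit "name/path" records terminated by NUL, sort on the name field only,
# then drop the name field again so only the path remains.
sorted_paths=$(cd "$tmp" && find . -type f -printf '%f/%p\0' \
  | sort -z -t/ -k1,1 \
  | cut -z -d/ -f2- \
  | tr '\0' '\n')
printf '%s\n' "$sorted_paths"

rm -rf "$tmp"
```

Here `./Invoices/a.pdf` is listed before `./General Letters/z.pdf`, because the ordering follows the file name rather than the directory.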

clt60
  • In this instance the path name doesn't actually matter: I am looking for "holes" in the numbering scheme of the documents (as I suspect some have been deleted or stored somewhere bizarre or never saved). But thanks for the steer. – TimGJ Jul 19 '14 at 21:34