95

What command, or collection of commands, can I use to return all file extensions in a directory (including sub-directories)?

Right now, I'm using different combinations of ls and grep, but I can't find any scalable solution.

jww
Matthew

9 Answers

141

How about this:

find . -type f -name '*.*' | sed 's|.*\.||' | sort -u
Martin Tournoij
thkala
  • find [this directory] (files) (matching any name with an extension) | use sed to substitute anything preceding a period with nothing | sort with unique flag – Matthew Apr 13 '18 at 12:44
  • But this doesn't go into sub-directories. – 25b3nk Mar 28 '19 at 13:08
  • 1
    @BhaskarChakradhar Yes it does. What makes you think it doesn't? – Michael May 02 '19 at 16:19
  • Thank you, this is very useful. I'm using this on the Chromium source code directory and got thousands of file extensions, many of which are actually files without a file extension. Is there any way to ignore all files without a file extension? – jerry Mar 06 '21 at 08:32
  • 1
    This also gets all "dot files", like e.g. `.ctags`, which is usually not what you want. – DevSolar Mar 29 '22 at 12:26
  • Can I suggest changing "-name '*.*'" to "-name '[^.]*.*'", as the former also picks up lots of "invisible" files such as temporary edits produced by xemacs? – Simon F Aug 16 '22 at 09:31
  • @DevSolar to exclude dot files you can indicate you want at least one character before the dot using the question mark, ``-name '?*.*'``. Glob matching is less powerful than regex but much faster. – Meitham Oct 10 '22 at 07:16
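
The suggestions in the comments above can be folded into a small function. This is only a sketch: `list_extensions` is a hypothetical name chosen for illustration, and it assumes a `find` that supports `-name` with shell globs (true of both GNU and BSD find).

```shell
#!/usr/bin/env bash

# list_extensions is a hypothetical helper name, not part of the original answer.
# It prints each distinct extension found under the given directory (default: .).
list_extensions() {
  local dir="${1:-.}"
  # '?*.*' requires at least one character before a dot, so plain
  # dotfiles such as ".ctags" and files with no dot at all are skipped.
  find "$dir" -type f -name '?*.*' |
    sed 's|.*\.||' |   # drop everything up to and including the last dot
    sort -u            # keep each extension once
}
```

Calling `list_extensions /path/to/project` lists one extension per line; with no argument it scans the current directory.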
10

List all extensions and their counts in the current directory and all sub-directories:

ls -1R | sed 's/[^\.]*//' | sed 's/.*\.//' | sort | uniq -c
mindon
8
find . -type f | sed 's|.*\.||' | sort -u

Also works on mac.

marcosdsanchez
  • This solution doesn't ensure all files listed _have_ extensions, so files without them aren't fixed by sed and are treated _as_ extensions. – Matthew Apr 13 '18 at 12:47
3

Another one, similar to the others, but using only two programs (find and awk):

find ./ -type f -name "*\.*" -printf "%f\n" | awk -F . '!seen[$NF]++ {print $NF}'

-type f restricts it to just files, not directories

-name "*\.*" ensures the filename has a . in it.

-printf "%f\n" prints just the filename, not the path to the filename.

-F . makes awk utilize a period as the field separator.

$NF is the last field, separated by periods.

!seen[$NF]++ evaluates to true the first time an extension is encountered, and false every other time it is encountered.

print $NF prints the extension.
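
To see the `!seen[$NF]++` idiom in isolation, it can be fed a fixed list of names instead of `find` output. This is a minimal sketch; `first_seen_extensions` is a hypothetical wrapper name used only so the filter can be reused.

```shell
#!/usr/bin/env bash

# first_seen_extensions is a hypothetical name for illustration.
# It reads filenames on stdin and prints each extension the first time it appears.
first_seen_extensions() {
  # seen[$NF] is 0 (false) on first sight, so !seen[$NF] is true and the
  # extension is printed; the ++ then marks it as seen for later lines.
  awk -F . '!seen[$NF]++ { print $NF }'
}

printf '%s\n' a.txt b.png c.txt archive.tar.gz | first_seen_extensions
```

Unlike `sort -u`, this keeps the extensions in the order they were first encountered rather than sorting them.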

Rusty Lemur
1

If you are using Bash 4+:

shopt -s globstar
for file in **/*.*
do
  echo "${file##*.}"
done

Ruby(1.9+)

ruby -e 'Dir["**/*.*"].each{|x|puts x.split(".")[-1]}' | sort -u
kurumi
  • For me using `MSYS2`, the pattern `"${file##*.}"` will only print the final part of extensions with two dots (for example it only prints `.gz` when the extension is `.tar.gz`). The pattern `"${file#*.}"` prints every part of the extension. – Alex Hall Aug 30 '20 at 01:39
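
The difference between the two expansions can be checked on a single name. This is a small sketch with a made-up path; stripping the directory first is an added precaution, since `#*.` would otherwise cut at a dot in a directory name.

```shell
#!/usr/bin/env bash

# Example path chosen for illustration only.
file="backup/archive.tar.gz"

# Strip the directory part so dots in directory names cannot interfere.
name="${file##*/}"

# '##' removes the longest prefix matching '*.', leaving the last component.
echo "${name##*.}"

# '#' removes the shortest prefix matching '*.', leaving everything after the first dot.
echo "${name#*.}"
```

The first `echo` prints `gz` and the second prints `tar.gz`.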
0

Boooom another:

find * | awk -F . {'print $2'} | sort -u
ackuser
  • 1
    `echo 'gniourf.tar.gz' | awk -F . {'print $2'}` gives `tar` and `echo 'one.two.three.pdf' | awk -F . {'print $2'}` gives `two`. Are you sure your approach is the good one? – gniourf_gniourf Jun 27 '14 at 13:22
  • I think the above is a simple solution; here is another: `find . -type f -name "*.*" | awk -F. '!a[$NF]++{print $NF}'`. I don't think simple commands can catch every type of file. As you said, there are problems parsing every row, so in this case it is probably better to use a script in Python, Perl or similar, which won't have this problem. Anyway, this is a simple solution; if you know the extensions of the files, you can filter with a grep like `| grep 'txt\|png\|pdf'`. Thanks – ackuser Jun 30 '14 at 09:20
0
ls -1 | sed 's/.*\.//' | sort -u

Update: You are correct Matthew. Based on your comment, here is an updated version:

ls -R1 | egrep -C 0 "[^\.]+\.[^\./:]+$" | sed 's/.*\.//' | sort -u

TimeDelta
  • 1
    This has two problems. First it only works for a flat directory, but misses subdirectories. Secondly, it includes all files without extensions in the output. – Matthew Oct 09 '14 at 13:11
  • [Don't parse the output of `ls`](http://mywiki.wooledge.org/ParsingLs), especially when it's useless. – gniourf_gniourf Jul 03 '15 at 17:14
  • You really should use ripgrep instead of egrep if you have time to install it first: https://github.com/BurntSushi/ripgrep and the updated command would be: `ls -R1 | rg -C 0 "[^\.]+\.[^\./:]+$" | sed 's/.*\.//' | sort -u` I get at least a 10x improvement for huge folders. – james-see Dec 29 '18 at 22:24
0

I was just quickly trying this as I was searching Google for a good answer. I am more inclined to regex than Bash, but this also works for subdirectories. I don't think it includes files without extensions either:

ls -R | egrep '(\.\w+)$' -o | sort | uniq -c | sort -r

Mehcs85
0

Yet another solution using find (that should even sort file extensions with embedded newlines correctly):

# [^.]: exclude dotfiles
find . -type f -name "[^.]*.*" -exec bash -c '
  printf "%s\000" "${@##*.}"
' argv0 '{}' + |
sort -uz |
tr '\0' '\n'
tooly