bash shell script to find the closest parent directory of several files

Question

Suppose the input arguments are the FULL paths of several files. Say,

/abc/def/file1
/abc/def/ghi/file2
/abc/def/ghi/file3

How can I obtain the directory name /abc/def in a bash shell script?
How can I obtain only file1, /ghi/file2, and /ghi/file3?

Jonathan Leffler · Accepted Answer · 2015-01-24T19:14:53.723

Given the answer for part 1 (the common prefix), the answer for part 2 is straightforward: you slice the prefix off each name, which could be done with sed, among other options.
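
As a minimal sketch of the sed option, assuming the prefix is already known and contains neither | nor regular-expression metacharacters (the helper name strip_prefix_with_sed is hypothetical, not part of the answer):

```shell
# Hypothetical helper: print each name with "<prefix>/" removed from the front.
# Assumption: the prefix contains no '|' and no regex metacharacters.
strip_prefix_with_sed() {
    local prefix="$1"
    shift
    # One name per line, so names containing newlines are not supported.
    printf '%s\n' "$@" | sed "s|^$prefix/||"
}

strip_prefix_with_sed /abc/def /abc/def/file1 /abc/def/ghi/file2 /abc/def/ghi/file3
# prints:
# file1
# ghi/file2
# ghi/file3
```

Names that do not start with the prefix pass through unchanged.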

The interesting part, then, is finding the common prefix. The minimum common prefix is / (for /etc/passwd and /bin/sh, for example). The maximum common prefix is (by definition) present in all the strings, so we simply need to split one of the strings into segments, and compare possible prefixes against the other strings. In outline:

split name A into components
known_prefix="/"
for each extra component from A
do
    possible_prefix="$known_prefix/$extra/"
    for each name
    do
        if $possible_prefix is not a prefix of $name
        then ...all done...break outer loop...
        fi
    done
    ...got here...possible prefix is a prefix!
    known_prefix=$possible_prefix
done
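
The outline can be turned into a runnable sketch along these lines. The function name growing_common_prefix is hypothetical, and the sketch assumes absolute names without repeated slashes; it grows the prefix one component at a time, as the outline does.

```shell
# Hypothetical helper following the outline: extend the known prefix by one
# component of the first name until some name no longer lies below it.
# Assumptions: all names are absolute and contain no repeated slashes.
growing_common_prefix() {
    local rest="${1#/}" known="" candidate name
    while [ -n "$rest" ]; do
        candidate="$known/${rest%%/*}"      # known prefix plus next component
        for name in "$@"; do
            case "$name" in
                "$candidate"/*) ;;          # name lies below the candidate
                *) break 2 ;;               # disagreement: stop growing
            esac
        done
        known=$candidate
        case "$rest" in
            */*) rest=${rest#*/} ;;         # drop the component just used
            *)   rest="" ;;                 # no components left
        esac
    done
    printf '%s\n' "${known:-/}"             # fall back to / when nothing matched
}

growing_common_prefix /abc/def/file1 /abc/def/ghi/file2 /abc/def/ghi/file3
# prints /abc/def
```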

There are some administrivial details to deal with, such as spaces in names. Also, what is the permitted weaponry? The question is tagged bash, but which external commands are allowed (Perl, for example)?

One undefined issue — suppose the list of names was:

/abc/def/ghi
/abc/def/ghi/jkl
/abc/def/ghi/mno

Is the longest common prefix /abc/def or /abc/def/ghi? I'm going to assume that the longest common prefix here is /abc/def. (If you really wanted it to be /abc/def/ghi, then use /abc/def/ghi/. for the first of the names.)
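
The trailing /. trick relies on how dirname treats that name. A quick check, assuming a standard dirname:

```shell
# A trailing "/." makes the directory itself the parent of the name.
dirname /abc/def/ghi      # prints /abc/def
dirname /abc/def/ghi/.    # prints /abc/def/ghi
```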

Also, there are invocation details:

How is this function or command invoked?
How are the values returned?
Is this one or two functions or commands (longest_common_prefix and path_without_prefix)?

Two commands are easier:

prefix=$(longest_common_prefix name1 [name2 ...])
suffix=$(path_without_prefix /pre/fix /pre/fix/to/file [...])

The path_without_prefix command removes the prefix if it is present, leaving the argument unchanged if the prefix does not start the name.

longest_common_prefix

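longest_common_prefix()
{
    declare -a names
    declare -a parts
    declare i=0
    declare name prefix x

    names=("$@")
    name="$1"
    # Collect every ancestor directory of the first name, longest first.
    while x=$(dirname "$name"); [ "$x" != "/" ]
    do
        parts[$i]="$x"
        i=$(($i + 1))
        name="$x"
    done

    # Try each candidate, longest first, with / as the last resort.
    for prefix in "${parts[@]}" /
    do
        for name in "${names[@]}"
        do
            # Quoting keeps glob characters in the prefix literal;
            # ${prefix%/} lets the root candidate / match absolute names.
            if [ "${name#"${prefix%/}"/}" = "${name}" ]
            then continue 2
            fi
        done
        echo "$prefix"
        break
    done
}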

Test:

set -- "/abc/def/file 0" /abc/def/file1 /abc/def/ghi/file2 /abc/def/ghi/file3 "/abc/def/ghi/file 4"
echo "Test: $@"
longest_common_prefix "$@"
echo "Test: $@" abc/def
longest_common_prefix "$@" abc/def
set --  /abc/def/ghi/jkl /abc/def/ghi /abc/def/ghi/mno
echo "Test: $@"
longest_common_prefix "$@"
set -- /abc/def/file1 /abc/def/ghi/file2 /abc/def/ghi/file3
echo "Test: $@"
longest_common_prefix "$@"
set -- "/a c/d f/file1" "/a c/d f/ghi/file2" "/a c/d f/ghi/file3"
echo "Test: $@"
longest_common_prefix "$@"

Output:

Test: /abc/def/file 0 /abc/def/file1 /abc/def/ghi/file2 /abc/def/ghi/file3 /abc/def/ghi/file 4
/abc/def
Test: /abc/def/file 0 /abc/def/file1 /abc/def/ghi/file2 /abc/def/ghi/file3 /abc/def/ghi/file 4 abc/def
Test: /abc/def/ghi/jkl /abc/def/ghi /abc/def/ghi/mno
/abc/def
Test: /abc/def/file1 /abc/def/ghi/file2 /abc/def/ghi/file3
/abc/def
Test: /a c/d f/file1 /a c/d f/ghi/file2 /a c/d f/ghi/file3
/a c/d f

path_without_prefix

path_without_prefix()
{
    local prefix="$1/"
    shift
    local arg
    for arg in "$@"
    do
        echo "${arg#$prefix}"
    done
}

Test:

for name in /pre/fix/abc /pre/fix/def/ghi /usr/bin/sh
do
    path_without_prefix /pre/fix "$name"
done

Output:

abc
def/ghi
/usr/bin/sh

Note that this solution assumes the path names are absolute. If the path names are relative (i.e. `./abc/def/file1` and `./abc/def/file2/`), longest_common_prefix will fail with an infinite loop. This can be easily fixed to accommodate relative paths by changing the condition `[ "$x" != "/" ]` to `[ "$x" != "/" -a "$x" != "." ]`. — Edward, Apr 17 '16 at 21:39
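
The condition suggested in this comment can be tried in isolation. The sketch below uses a hypothetical helper, list_ancestors, that contains only the ancestor-collecting loop and writes the test as two [ ... ] commands joined by &&, which is equivalent to the -a form:

```shell
# Hypothetical helper: list the ancestor directories of a name, longest first.
# Stops at "/" for absolute names and at "." for relative ones.
list_ancestors() {
    local name="$1" x
    while x=$(dirname "$name"); [ "$x" != "/" ] && [ "$x" != "." ]
    do
        printf '%s\n' "$x"
        name=$x
    done
}

list_ancestors ./abc/def/file1    # prints ./abc/def, then ./abc
list_ancestors /abc/def/file1     # prints /abc/def, then /abc
```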

Idelic · Answer 2 · 2012-09-09T22:04:24.063

A more "portable" solution, in the sense that it doesn't use bash-specific features: First define a function to compute the longest common prefix of two paths:

common_path()
{
  lhs=$1
  rhs=$2
  path=
  OLD_IFS=$IFS; IFS=/
  for w in $rhs; do
    test "$path" = / && try="/$w" || try="$path/$w"
    # Match whole components only, so /abc/de is not taken as a prefix of /abc/def.
    case $lhs/ in
      "${try%/}"/*) ;;
      *) break ;;
    esac
    path=$try
  done
  IFS=$OLD_IFS
  echo "$path"
}

Then use it for a long list of words:

common_path_all()
{
  local sofar=$1
  shift
  for arg
  do
    sofar=$(common_path "$sofar" "$arg")
  done
  echo "${sofar:-/}"
}

With your input, it gives

$ common_path_all /abc/def/file1 /abc/def/ghi/file2 /abc/def/ghi/file3
/abc/def

As Jonathan Leffler pointed out, once you have that, the second question is trivial.

score 2 · Answer 3 · answered Sep 11 '12 at 11:27

Here's one that's been shown to work with arbitrarily complex file names (containing newlines, backspaces and the like):

path_common() {
    if [ $# -ne 2 ]
    then
        return 2
    fi

    # Remove repeated slashes
    for param
    do
        param="$(printf %s. "$1" | tr -s "/")"
        set -- "$@" "${param%.}"
        shift
    done

    common_path="$1"
    shift

    for param
    do
        while case "${param%/}/" in "${common_path%/}/"*) false;; esac; do
            new_common_path="${common_path%/*}"
            if [ "$new_common_path" = "$common_path" ]
            then
                return 1 # Dead end
            fi
            common_path="$new_common_path"
        done
    done
    printf %s "$common_path"
}
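
The printf %s. and ${param%.} pair in the function is there because command substitution strips trailing newlines, which would corrupt a name that ends in one. A minimal sketch of that step on its own (the helper name squeeze_slashes is hypothetical):

```shell
# Hypothetical helper isolating the slash-squeezing step.
squeeze_slashes() {
    local squeezed
    # The trailing "." shields any trailing newlines in the name from
    # being stripped by the command substitution.
    squeezed=$(printf %s. "$1" | tr -s /)
    # Remove the sentinel again before printing.
    printf %s "${squeezed%.}"
}

squeeze_slashes /abc//def///file1    # prints /abc/def/file1 (no trailing newline)
```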

score 1 · Answer 4 · edited May 23 '17 at 10:30

It seems to me that the solution below is much simpler.

As mentioned previously, only part 1 is tricky. Part 2 is straightforward with sed.

Part 1 can be split into two subparts:

Finding the longest common prefix of all strings
Making sure this prefix is a directory, and if not trimming it to get the corresponding directory

It can be done with the following code. For the sake of clarity, this example uses only 2 strings, but a while loop gives you what you want with n strings.

LONGEST_PREFIX=$(printf "%s\n%s\n" "$file_1" "$file_2" | sed -e 'N;s/^\(.*\).*\n\1.*$/\1/')
CLOSEST_PARENT=$(echo "$LONGEST_PREFIX" | sed 's/\(.*\)\/.*/\1/')

This can of course be rewritten on a single line:

CLOSEST_PARENT=$(printf "%s\n%s\n" "$file_1" "$file_2" | sed -e 'N;s/^\(.*\).*\n\1.*$/\1/'  | sed 's/\(.*\)\/.*/\1/')
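
The loop over n strings might look like the following sketch. The function name closest_parent_n is hypothetical, and the sketch assumes that no name contains a newline, since sed works line by line.

```shell
# Hypothetical helper: fold the two-string sed step over all arguments.
# Assumption: no name contains a newline.
closest_parent_n() {
    local prefix="$1" name parent
    shift
    for name in "$@"; do
        # Longest common string prefix of the running prefix and the next name.
        prefix=$(printf '%s\n%s\n' "$prefix" "$name" | sed -e 'N;s/^\(.*\).*\n\1.*$/\1/')
    done
    # Trim everything after the last slash to get a directory.
    parent=$(printf '%s\n' "$prefix" | sed 's/\(.*\)\/.*/\1/')
    # An empty result means the only common directory is the root.
    printf '%s\n' "${parent:-/}"
}

closest_parent_n /abc/def/file1 /abc/def/ghi/file2 /abc/def/ghi/file3
# prints /abc/def
```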

score -1 · Answer 5 · answered Sep 09 '12 at 16:44

To get the parent directory:

  dirname /abc/def/file1

will give /abc/def

And to get the file name

   basename /abc/def/file1

will give file1

And, according to your question, to get only the name of the closest parent directory, use

basename "$(dirname /abc/def/file1)"

which will give def

djadmin


The approach should be generic. To work for the given example, the answer can even be simpler=hardcoded: `echo /abc/def`, `echo file1`, ... – uvsmtid Aug 28 '23 at 14:52
