
I have generated a set of filepaths as strings in a bash script, all of this form:

./foo/bar/filename.proto

There can be any number of subfolders/slashes, but they all have the .proto extension.

I want to trim the leading ./ and trailing filename.proto to transform them to look like this:

foo/bar

I have had a surprising amount of difficulty adapting this from other solutions and debugging it. I have tried:

grep -Po "\.\/(.*)\/[^\/]+\.proto"

and

sed -n 's/\.\/\(.*\)\/[^\/]+\.proto/\1/p'

I have tried sed with both escaped and unescaped parentheses. For reference, I am currently working on a Mac, and would like the most cross-platform-compatible solution.

I could do this fairly easily in Python, but I want to avoid the complexity of calling another script to do this.

To give you an idea of how this is working, my full script looks like this (so far):

#!/bin/bash
consume_single_folder () {
  do_stuff $1
}

find . -name \*.proto|while read fname; do
  echo "$fname" |sed -n 's/\.\/\(.*\)\/[^\/]+\.proto/\1/p' | consume_single_folder
done

Any help is appreciated. Thanks!

EDIT:

To be clear, I have tested my regex on regex101.com and it seems to look alright:

\.\/(.*)\/[^\/]+\.proto

It should be greedy, capturing everything between the first and last slash.

A. Davidson

2 Answers

Looks like dirname could help you:

$ dirname "./foo/bar/filename.proto"
./foo/bar

With leading ./ removal:

$ dirname "./foo/bar/filename.proto" | sed 's|^\./||'
foo/bar

You could also add sort | uniq to avoid duplicates. Applied to your loop:

find . -name \*.proto | while IFS= read -r fname; do
  consume_single_folder "$(dirname "$fname" | sed 's|^\./||')"
done

This works on macOS and Linux.
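
The de-duplication mentioned above can be sketched roughly as follows. The function name list_proto_dirs is made up for illustration, and the sketch assumes it is run from the directory you want to search, so that every path find prints starts with ./

```shell
#!/bin/bash
# Hypothetical helper: print each folder that contains a .proto file once.
list_proto_dirs () {
  # dirname is run once per file; sed strips only a leading "./";
  # sort -u collapses folders that hold several .proto files.
  find . -name '*.proto' -exec dirname {} \; | sed 's|^\./||' | sort -u
}
```

A .proto file sitting directly in the current directory would show up as a single dot, since dirname returns . for it and there is no leading ./ to strip.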

shuvalov

Please do not use sites like regex101 to test sed regular expressions: syntax and features vary a lot between tools, as well as between implementations of the same tool. See "Why does my regular expression work in X but not in Y?" and "differences between various sed implementations".

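For your given example, changing + to * will work. By default sed uses basic regular expressions (BRE), where an unescaped + is a literal plus sign rather than a quantifier; GNU sed accepts \+ as an extension, but that is not portable (look up the differences between BRE and ERE).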

$ fname='./foo/bar/filename.proto'
$ echo "$fname" | sed -n 's/\.\/\(.*\)\/[^\/]*\.proto/\1/p'
foo/bar
$ # or use a different delimiter
$ echo "$fname" | sed 's|\./\(.*\)/[^/]*\.proto|\1|'
foo/bar
$ # further simplification as find already filters by extension
$ echo "$fname" | sed 's|\./\(.*\)/.*|\1|'
foo/bar
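
If you would rather keep the + quantifier, a sketch using extended regular expressions (ERE) might look like this. It assumes your sed accepts the -E option, which current GNU and BSD/macOS versions do, though older ones may not. The variable name result is only for illustration.

```shell
#!/bin/bash
fname='./foo/bar/filename.proto'

# With -E, parentheses group without backslashes and + means "one or more".
# The ^ and $ anchors make sure the whole path is matched.
result=$(printf '%s\n' "$fname" | sed -E 's|^\./(.*)/[^/]+\.proto$|\1|')

echo "$result"   # prints: foo/bar
```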

I would also suggest reading "Why is looping over find's output bad practice?" and changing your find syntax accordingly.
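
As one possible shape for that, here is a minimal sketch that reads NUL-delimited names and trims the path with parameter expansion instead of sed. It assumes bash (read -d '' is not POSIX sh) and a find that supports -print0, which both GNU and BSD/macOS find do. The helper proto_dir is hypothetical, and consume_single_folder here is just a stand-in for your own function.

```shell
#!/bin/bash
# Stand-in for the question's function; replace the body with real work.
consume_single_folder () {
  printf 'folder: %s\n' "$1"
}

# Hypothetical helper: ./foo/bar/filename.proto -> foo/bar
proto_dir () {
  local dir=${1%/*}           # drop the final /filename component
  printf '%s\n' "${dir#./}"   # drop a leading ./ if there is one
}

# -print0 plus read -d '' keeps names with spaces or newlines intact.
find . -name '*.proto' -print0 | while IFS= read -r -d '' fname; do
  consume_single_folder "$(proto_dir "$fname")"
done
```

Passing the folder as an argument, rather than piping it in, matches how consume_single_folder reads $1 in your script.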

Sundeep
    Thanks for your solution and for your links! I especially liked the idea to use a different delimiter. I decided to go with the approach that you mentioned in the other answer. – A. Davidson Aug 01 '18 at 21:56