I'm writing a bash script to generate a .spec file (for building an RPM) automatically. I read all the files in the directory that I want to convert into an RPM package and write the paths of all the files to install into the .spec file. I realize that I need to shorten them. An example:

/tmp/a/1.jpg
/tmp/a/2.conf
/tmp/a/b/srf.cfg
/tmp/a/b/ssp.cfg
/tmp/a/conf_web_16.2/c/.htaccess
/tmp/a/conf_web_16.2/c/.htaccess.WebProv
/tmp/a/conf_web_16.2/c/.htprofiles

=> What I want to get:

/tmp/a/*.jpg
/tmp/a/*.conf
/tmp/a/b/*.cfg
/tmp/a/conf_web_16.2/c/*
/tmp/a/conf_web_16.2/c/*.WebProv

Please give me some advice about my problem. Suggestions in bash, Python, or C are all welcome. Thank you in advance.

illus
  • `/tmp/a/conf_web_16.2/c/*` already covers all files in that directory; why do you want a separate entry for `/tmp/a/conf_web_16.2/c/*.WebProv`? – tripleee Nov 22 '17 at 07:22
  • @tripleee: Ah, I will filter these again and keep only the extensions I need. And I think I will need some bash code like this in the future. – illus Nov 22 '17 at 08:24

1 Answer

To convert any file name which contains a dot in a position other than the first into a wildcard covering the part up to just before the last dot, and any remaining file name to just a wildcard:

sed -e 's%/[^/][^/]*\(\.[^./]*\)$%/*\1%' -e t -e 's%/[^/]*$%/*%'

The behavior of sed is to read its input one line at a time, and execute the script of commands on each in turn. The s%foo%bar% substitution command replaces a regex match with a string, and the t command causes the script to skip further substitutions if one was already performed on the current line. (I'm simplifying somewhat.) The first regex matches file names which contain a dot in a position other than the first, and captures the text from the last dot through the end of the line in a group, which the replacement reuses through the back reference \1. The second substitution is applied only to the remaining file names, because the t command in between skips it for any line the first one already changed.
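As a minimal sketch of the two substitutions in action, the command can be wrapped in a shell function and fed a few of the paths from the question. The function name shorten_paths is made up for this illustration and is not part of the answer.

```shell
#!/bin/sh
# shorten_paths is a hypothetical helper name used only for this sketch.
shorten_paths() {
    # 1st s: a name with a dot after its first character becomes "*" plus
    #        the text from the last dot onward (captured as \1)
    # t:     if the 1st substitution changed the line, skip the rest
    # 2nd s: any other name (for example a dotfile with no extension)
    #        becomes a bare "*"
    sed -e 's%/[^/][^/]*\(\.[^./]*\)$%/*\1%' -e t -e 's%/[^/]*$%/*%' "$@"
}

printf '%s\n' \
    /tmp/a/1.jpg \
    /tmp/a/b/srf.cfg \
    /tmp/a/b/ssp.cfg \
    /tmp/a/conf_web_16.2/c/.htaccess \
    /tmp/a/conf_web_16.2/c/.htaccess.WebProv |
    shorten_paths
```

This prints /tmp/a/*.jpg, then /tmp/a/b/*.cfg twice (once for each .cfg input), then /tmp/a/conf_web_16.2/c/* and /tmp/a/conf_web_16.2/c/*.WebProv. The dot in the directory name conf_web_16.2 is left alone because both patterns only match after the last slash.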

The result will probably need to be piped to sort -u to remove any duplicates.

If you don't have a list of the file names, you can use find to pipe in a listing.

find . -type f | sed ... | sort -u
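For illustration, here is the whole pipeline run against a throwaway directory tree that mirrors the example in the question. The function name list_patterns and the scratch directory are assumptions made for this sketch, and it assumes no file name contains a newline. LC_ALL=C is set only to make the sort order predictable.

```shell
#!/bin/sh
# list_patterns is a hypothetical helper name used only for this sketch.
list_patterns() {
    # $1: directory to scan
    find "$1" -type f |
        sed -e 's%/[^/][^/]*\(\.[^./]*\)$%/*\1%' -e t -e 's%/[^/]*$%/*%' |
        LC_ALL=C sort -u    # drop the duplicate patterns
}

# Build a scratch tree with the same layout as the question's example.
scratch=$(mktemp -d) || exit 1
mkdir -p "$scratch/a/b" "$scratch/a/conf_web_16.2/c"
touch "$scratch/a/1.jpg" "$scratch/a/2.conf" \
      "$scratch/a/b/srf.cfg" "$scratch/a/b/ssp.cfg" \
      "$scratch/a/conf_web_16.2/c/.htaccess" \
      "$scratch/a/conf_web_16.2/c/.htaccess.WebProv" \
      "$scratch/a/conf_web_16.2/c/.htprofiles"

# Run from inside the tree so the printed paths start with "./".
(cd "$scratch" && list_patterns .)
rm -rf "$scratch"
```

The seven files collapse to five patterns: ./a/*.conf, ./a/*.jpg, ./a/b/*.cfg, ./a/conf_web_16.2/c/* and ./a/conf_web_16.2/c/*.WebProv.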
tripleee
  • I have some folders which have "." in their names, so your solution shortens their names instead of the file name. For example: /tmp/a/./c/.htaccess but not /tmp/a/conf_web_16.2/c/* – illus Nov 22 '17 at 08:35
  • I get `/tmp/a/./c/*` for the example you provided, is that not correct? – tripleee Nov 22 '17 at 08:44
  • No, I need to get /tmp/a/conf_web_16.2/c/* as I posted before. I gave an example in my post; can you give me another suggestion? – illus Nov 22 '17 at 08:52
  • I get that output for the input `/tmp/a/conf_web_16.2/c/.htaccess` and related examples. What input do you get the incorrect output for? – tripleee Nov 22 '17 at 08:56
  • I used the same input as in my question. I have a file which records all of these paths, so I use the cat command to read the contents of this text file and then run your sed command on it. Did I do something wrong? – illus Nov 22 '17 at 09:04
  • The `cat` is [useless](https://stackoverflow.com/questions/11710552/useless-use-of-cat) as such, `sed` can read a file name just fine. I cannot repro what you are reporting -- see transcript here: https://pastebin.com/iESYjXGT – tripleee Nov 22 '17 at 09:18
  • I made a minor edit after creating the paste, but it does not change the functionality. – tripleee Nov 22 '17 at 09:20
  • I tested your code directly on the command line and got the result I hoped for. I will try it again when reading my file; maybe I made a mistake when reading the file. Thanks for your time. – illus Nov 22 '17 at 09:29
  • I'm posting tripleee's new answer here in case pastebin someday deletes tripleee's code: sed -e 's%/[^/][^/]*\(\.\([^./]*\)\)$%/*\1%' -e t -e 's%/[^/]*$%/*%' – illus Nov 22 '17 at 09:42
  • That's the old answer. I removed the redundant parentheses because they were unnecessary. The edit history of my answer is available by clicking on the "edited xx ago" note next to my name. There is no reason to post that as a comment (and posting code in comments is often not a good idea, even if you use `code` formatting to protect special characters etc). – tripleee Nov 22 '17 at 09:44