I've been trying to get this working for a bit now... this question has helped, but I'm still struggling to get this to work. I'd prefer not to install homebrew, because this is a rare task that I'm performing right now.

I have a couple thousand files whose names consist of a string of text, an underscore, more text, another underscore, and finally the important file name that I want to preserve. I'm trying to drop the leading *_*_ and keep the last part, including the file extension, with \(*\)/\1/.

I've tried a couple of different things, and all I get is the original filenames spit out again. I'm not sure if it's a regex issue, a sed issue, or probably a little of both. Any help is appreciated.

ls | sed 's/^*_*_\(*\)/\1/' > ouput.txt;
ls | sed 's/^*_*_\(*\$\)/\1/' > out.txt
ls | sed 's/\(^*_*_\)\(*\$\)/\2/' > out.txt
ls | sed 's/\(^.*_+.*_+\)\(.*\$\)/mv & \2/' > out.txt
Volvox
  • Could you show us an example of what the text looks like that you want to remove? – David Sep 12 '12 at 23:43
  • It is some pretty random junk... I just want to preserve the md5 hash instead of all the random appended data on the front. An example file would be: cyo (+ 1)_unkn_ac61eb3b4cc8c08a32625443cff9545e.txt – Volvox Sep 12 '12 at 23:45
  • @ Prince Wesley: I don't think that should ever be in the filename, if so it's such a low occurrence rate I'd be happy to search that out separately from the main batch. I had gotten a couple errors when I was running one of these attempts with the "+" and parenthesis, but I haven't seen that recently... maybe I just broke it really good. – Volvox Sep 12 '12 at 23:49
  • The fundamental problem is that you need to understand the difference between shell wildcards (glob patterns) and proper regular expressions. The asterisk means "any string" in glob, but in regex, it's a repetition operator which means "zero or more repetitions of the previous expression" and a lone dot is "any character". So the way to say "anything" in regex is `.*`; a lone `*` with nothing before it is either taken as a literal asterisk (as sed does at the start of a pattern) or rejected as a syntax error, depending on the regex dialect. – tripleee Sep 13 '12 at 04:57
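
To make the glob-versus-regex point concrete, here is a small sketch that runs the first attempt from the question and a `.*` version against the example name from the comments. It assumes a POSIX-style sed (GNU or BSD), where a `*` at the start of a basic regex or group is treated as a literal asterisk; the variable names are only for illustration.

```shell
# Example name taken from the question's comments
name='cyo (+ 1)_unkn_ac61eb3b4cc8c08a32625443cff9545e.txt'

# Glob-style pattern used as a regex: with nothing before it, "*" is a
# literal asterisk here, so nothing matches and the name comes back unchanged.
glob_style=$(printf '%s\n' "$name" | sed 's/^*_*_\(*\)/\1/')

# Regex version: ".*" means "any run of characters".
regex_style=$(printf '%s\n' "$name" | sed 's/^.*_.*_\(.*\)/\1/')

echo "glob-style:  $glob_style"
echo "regex-style: $regex_style"
```

The first line of output should still show the full original name, which matches the "original filenames being spit out again" symptom, while the second shows only the hash and extension.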

2 Answers

Does this regex do the trick? If not, please report how its output is off (details) and I'll help you tune it.

ls -1 | sed -e 's/^[^_]*_[^_]*_//'

Note 1: You may want to use ls -1 to format the files into a single column.

Note 2: The approach above simply removes the unwanted part of your file names, rather than trying to store in a regex buffer the part that you do want.
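
As a rough illustration of Note 2, the sketch below applies both styles to the example name from the question's comments: deleting the unwanted prefix, and capturing the wanted tail in a group. The capture-group variant is my own rewording of the asker's intent, not something from the answer itself.

```shell
# Example name taken from the question's comments
name='cyo (+ 1)_unkn_ac61eb3b4cc8c08a32625443cff9545e.txt'

# Style 1: delete everything up to and including the second underscore
removed=$(printf '%s\n' "$name" | sed -e 's/^[^_]*_[^_]*_//')

# Style 2: capture what follows the second underscore and keep only that
captured=$(printf '%s\n' "$name" | sed -e 's/^[^_]*_[^_]*_\(.*\)$/\1/')

echo "$removed"
echo "$captured"
```

Both commands should print the same thing, so the shorter deletion form is enough here.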


EDIT

And here's a bash script that performs the rename.

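for f in *_*_*    # glob instead of `ls`, so names containing spaces survive
do
    new_name=$(printf '%s\n' "$f" | sed 's/^[^_]*_[^_]*_//')
    # -n refuses to overwrite a file that already has the new name
    mv -n -- "$f" "$new_name"
done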

Can be written as a one-liner, but I went for clarity over brevity.
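
With a couple thousand files, it may be worth previewing the renames before running them. The following is a minimal dry-run sketch: `preview_renames` is a hypothetical helper name, and 'notes.txt' is a made-up file included only to show that names without two underscores are skipped.

```shell
# Print the mv command for each name instead of running it (dry run).
# preview_renames is a hypothetical helper, not part of the script above.
preview_renames() {
    for f in "$@"; do
        new_name=$(printf '%s\n' "$f" | sed 's/^[^_]*_[^_]*_//')
        # Skip names the pattern does not change (fewer than two underscores)
        if [ "$f" = "$new_name" ]; then
            continue
        fi
        printf 'mv -- "%s" "%s"\n' "$f" "$new_name"
    done
}

# Example name from the question's comments, plus a made-up one that is skipped
preview_renames 'cyo (+ 1)_unkn_ac61eb3b4cc8c08a32625443cff9545e.txt' 'notes.txt'
```

In a real directory the call would be something like `preview_renames *_*_*`, and the printed lines can be inspected before doing the actual rename.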

ron rothman
  • (Oops, initial answer had a copy/paste typo. Just fixed it.) – ron rothman Sep 12 '12 at 23:50
  • I should have been clearer... I was trying to rename files, similar to the linked question, in order to output it in the rename format seen in my last attempt. It's been a while since I've done *nix commands though, so if you have a different approach with this that'd be great too. #2 works great, #1 gives the same full filename as output. Ty! – Volvox Sep 13 '12 at 01:17
  • Ah, got it. Hang on a sec, let me see what I can come up with. (I'll also remove the first sed command in my answer since it's not working for you.) – ron rothman Sep 13 '12 at 01:18
  • I actually just found an app that will take the regex to find and select the text, then replace it with whatever. Of course, the command line is probably much faster and more scalable for this. Thanks for your help here! – Volvox Sep 13 '12 at 01:29
  • Ah, cool, happy to hear you've got it working. I did add a rename script to my answer, just in case the app you found doesn't pan out. Cheers. – ron rothman Sep 13 '12 at 01:32
  • `ls` automatically switches to single column mode when attached to a pipe, try: `ls | cat`. – Thor Sep 13 '12 at 08:38

command | sed 's;^.*_;;' will do the trick: because .* is greedy, it removes everything up to and including the last underscore. Use the find command instead of ls.

For example,

 find . -type f | sed 's;^.*_;;' 
Prince John Wesley
  • This assumes there are no underscores which should be preserved. The command `s;^[^_]*_[^_]*_;;` uses a more constrained regex which only replaces up through the second underscore. – tripleee Sep 13 '12 at 04:52
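
The difference between the two patterns only shows up when the part to keep contains an underscore itself. A small sketch, using a made-up name ('aaa_bbb_my_file.txt') and two hypothetical helper functions:

```shell
# Greedy version: strips through the LAST underscore
strip_greedy() {
    printf '%s\n' "$1" | sed 's;^.*_;;'
}

# Constrained version: strips only through the SECOND underscore
strip_two() {
    printf '%s\n' "$1" | sed 's;^[^_]*_[^_]*_;;'
}

# Made-up name whose wanted part contains an underscore
strip_greedy 'aaa_bbb_my_file.txt'
strip_two 'aaa_bbb_my_file.txt'
```

The greedy pattern leaves `file.txt`, while the constrained one leaves `my_file.txt`. For names with exactly two underscores, like the md5 example in the question, both give the same result.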