I have a hash file, example.md5, full of hashes and file names similar to the following.

e5dbb7657f770fad038220f5c69d806c  backup/example/test.txt

How could I batch-edit that file so that it looks like this instead?

e5dbb7657f770fad038220f5c69d806c  example/test.txt

I just want to remove the first part of each file path mentioned in the hash file.

EDIT: includes some numbers in file paths, i.e. e5dbb7657f770fad038220f5c69d806c 750g/example/test.txt

  • You can use `awk` as well: `awk '{split($2,a,"/"); $2=a[2]"/"a[3]}1' file`. Here `$2` is the second column, and the default field separator (FS) is " ". – Awin Jan 11 '19 at 07:01
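
A note on the awk suggestion: assigning to `$2` makes awk rebuild the line with its output field separator, so the two spaces between hash and path become one, and `a[2]"/"a[3]` keeps exactly two path components. The following is only a sketch of one possible variation, not the commenter's code. The function name `strip_first_dir_awk` is made up for illustration, and it assumes paths contain no spaces.

```shell
# Hypothetical helper: reads md5-style lines on stdin, writes them to stdout.
strip_first_dir_awk() {
  # sub() removes everything up to and including the first "/" in field 2.
  # OFS is set to two spaces so the rebuilt line keeps the original separator.
  awk -v OFS='  ' '{ sub("^[^/]*/", "", $2) } 1'
}

# Usage with a sample line taken from the question:
printf '%s\n' 'e5dbb7657f770fad038220f5c69d806c  backup/example/test.txt' |
  strip_first_dir_awk
```

Lines whose second field has no `/` are passed through unchanged, because `sub()` only rebuilds the line when it actually replaces something.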

2 Answers

You can use the following sed command:

[root@967dd7743677 test]# sed 's:[a-z]*/::' hashfile
e5dbb7657f770fad038220f5c69d806c  example/test.txt
[root@967dd7743677 test]#
Raja G
  • `s:[^/]*/::` instead of `s:[a-z]*/::` would work even if the directory names contained digits, uppercase letters or other characters... – Dario Jan 11 '19 at 07:36
  • @Dario, but that will drop column 1, and it seems the OP wants to keep column 1. Running `sed 's:[^/]*/::' hashfile` prints only `example/test.txt`. – Raja G Jan 11 '19 at 08:18
  • `sed 's:[0-9][a-z]*/::' hashfile` Seems to have worked here. Some lines had numbers in the first part of the path, i.e. 3tb/backup/test.txt – RansuDoragon Jan 11 '19 at 08:52
  • No actually that didn't work, sorry. It seems to have tripped over lines like `750gb/backup/test.txt` turning it into `75/backup/test.txt` – RansuDoragon Jan 11 '19 at 08:59
  • You want to combine that into a single class - `sed 's:[-0-9a-z_]*/::' hashfile` (I added dash and underscore for good measure). – tripleee Jan 11 '19 at 11:10
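
The comments above can be combined into one possible approach: anchor the pattern at the hash, capture it together with the separating spaces, and then remove everything up to the first `/`. This is a sketch under assumptions, not something posted in the thread. The function name `strip_first_dir` is hypothetical, and it assumes each line starts with a lowercase hex hash followed by spaces, as in the question.

```shell
# Hypothetical helper: reads md5-style lines on stdin, writes them to stdout.
strip_first_dir() {
  # \1 puts the hash and the spaces back; [^/]*/ is the first path component,
  # whatever characters it contains (digits, uppercase, dashes, ...).
  sed -E 's:^([0-9a-f]+ +)[^/]*/:\1:'
}

# Usage with the line that tripped up the earlier attempts:
printf '%s\n' '365846d4a15ab600d77f46b2f37f5764  750gb/test.log.txt' |
  strip_first_dir
```

To edit the file itself, one option is to write the result to a new file and then replace the original, for example `strip_first_dir < example.md5 > example.md5.new`.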

With GNU sed:

sed -E -n 's/([a-z0-9]+)( *)[A-Za-z0-9]+\/(.*)/\1\2\3/p' file_name

Output:

e5dbb7657f770fad038220f5c69d806c  example/test.txt

Explanation:

-E : --regexp-extended (use extended regular expressions)

-n : --quiet, --silent (suppress automatic printing of the pattern space)

([a-z0-9]+) : first capture group, matching the hash: one or more lowercase letters and digits

( *) : second capture group, matching the whitespace between the columns

[A-Za-z0-9]+\/ : matches the first part of the second column, i.e. one or more letters (either case) or digits followed by a / (escaped as \/ because / is the delimiter of the s command)

(.*) : third capture group, matching the rest of the line

\1\2\3 : backreferences to the first, second and third capture groups

p : flag of the s command; prints the line when a substitution was made
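
One consequence of combining -n with the p flag may be worth seeing in action: lines on which the substitution does not match are not printed at all. The sketch below runs the command from this answer with and without -n/p on two sample lines. The second line and its hash are made up for illustration, and the variable names are hypothetical.

```shell
# Two sample lines: one with a leading directory, one without any "/".
sample='e5dbb7657f770fad038220f5c69d806c  backup/example/test.txt
d41d8cd98f00b204e9800998ecf8427e  test.txt'

# With -n and p: only lines where the substitution succeeded are printed.
with_n=$(printf '%s\n' "$sample" |
  sed -E -n 's/([a-z0-9]+)( *)[A-Za-z0-9]+\/(.*)/\1\2\3/p')

# Without -n and p: every line is printed, whether it was changed or not.
without_n=$(printf '%s\n' "$sample" |
  sed -E 's/([a-z0-9]+)( *)[A-Za-z0-9]+\/(.*)/\1\2\3/')

printf 'with -n/p:\n%s\n\nwithout:\n%s\n' "$with_n" "$without_n"
```

If the hash file contains entries without a directory part, dropping -n and p is probably the safer choice, since those entries would otherwise disappear from the output.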
User123
  • Is there a missing single quote? That command as is will not run on my system – RansuDoragon Jan 11 '19 at 08:54
  • this trips over lines with numbers. This line `365846d4a15ab600d77f46b2f37f5764 750gb/test.log.txt` became `365846d4a15ab600d77f46b2f37f5764 750gtest.log.txt` – RansuDoragon Jan 11 '19 at 09:12
  • @RansuDoragon: As per your sample input in the question , i thought the first part in second column would contain only small alphabets, have updated my answer now. – User123 Jan 11 '19 at 09:18
  • I'll update that. I should have been clearer, sorry. Thank you for the help; the sed command above works for me. I should probably also learn what I can about sed and similar tools, since I am getting more into command-line work. – RansuDoragon Jan 11 '19 at 10:31
  • This is just a too-complex variation of rɑːdʒɑ's answer. – tripleee Jan 11 '19 at 11:11
  • @tripleee: yeah, I also found rɑːdʒɑ's answer easier :) – User123 Jan 11 '19 at 11:17