
I have a file containing the output of an rsync file listing:

drwxrwxrwx          4,096 2018/12/10 15:27:39 test/dir/one
drwxrwxrwx          4,096 2018/12/10 15:27:39 best/folder/two

How can I use sed to get rid of everything except the paths?

Wanted result:

test/dir/one
best/folder/two

I tried the regex below. In a regex tester it correctly matches everything preceding the paths, but it had no effect when used with sed:

cat listing.txt | sed 's/.*[0-9]+:[0-9]+:[0-9]+ //' | less

What am I missing?

Jimmix

2 Answers


Your sed probably doesn't support the + repetition operator in this form. Try

sed 's/.*[0-9]\+:[0-9]\+:[0-9]\+ //' listing.txt

(which also does away with that pesky useless cat).

Recall that sed predates many of the frills of modern regex. Your sed might support an -r or -E flag to enable extended regex support (which is still far from the modern regex dialect many newcomers are most familiar with) but this is not portable.
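As a rough sketch of the alternatives, the snippet below runs the same substitution in two spellings: the POSIX basic-regex interval `\{1,\}` (the portable way to say "one or more") and the extended-regex `+` enabled by `-E`. The sample lines are copied from the question and their exact spacing is an assumption; the variable names are only for illustration.

```shell
# Sample lines from the question (the exact column spacing is assumed).
listing='drwxrwxrwx          4,096 2018/12/10 15:27:39 test/dir/one
drwxrwxrwx          4,096 2018/12/10 15:27:39 best/folder/two'

# POSIX basic regex: \{1,\} means "one or more" and needs no GNU extension.
paths_bre=$(printf '%s\n' "$listing" |
  sed 's/.*[0-9]\{1,\}:[0-9]\{1,\}:[0-9]\{1,\} //')

# Extended regex: with -E (accepted by GNU and BSD sed) a bare + works.
paths_ere=$(printf '%s\n' "$listing" |
  sed -E 's/.*[0-9]+:[0-9]+:[0-9]+ //')

printf '%s\n' "$paths_bre"
```

Both variables should end up holding the two bare paths. If portability matters more than brevity, the `\{1,\}` form is probably the safer choice.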

Of course, if the listing uses a fixed field width, maybe simply try

cut -c47- listing.txt

(Not in a place where I can verify the precise number - play around with different values.)
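Rather than guessing the number, one possible way to find it is to ask awk where the fifth whitespace-separated field starts on a sample line. This is only a sketch: it assumes the path is the fifth field, that its first word does not also occur earlier in the line, and that the sample line below (copied from the question, spacing assumed) is representative of the whole file.

```shell
# One sample line from the listing (the exact column spacing is assumed).
line='drwxrwxrwx          4,096 2018/12/10 15:27:39 test/dir/one'

# index() returns the 1-based position of the first occurrence of field 5.
col=$(printf '%s\n' "$line" | awk '{ print index($0, $5) }')

# Feed that position to cut to keep everything from the path onwards.
path=$(printf '%s\n' "$line" | cut -c"$col"-)

printf 'column %s -> %s\n' "$col" "$path"
```

With a real file, the sample line could be taken from `head -n 1 listing.txt` and the resulting column then reused for `cut -cN- listing.txt`.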

tripleee
  • `cut -c47-` there you go – Kavyesh Shah Apr 07 '19 at 18:33
  • Assuming that the block size, date and time widths don't change, I guess the number `47` depends on the permissions, which should be of fixed width. – sjsam Apr 07 '19 at 18:49
  • @tripleee Both examples worked well. Not escaping `+` was the issue for `sed`. Now I recall I had the same issue when using regex in `VIM`. Don't know why I missed `cut`. It is so obvious now :). On 1000k lines `cut` had a 2sec execution time vs `sed`'s 11sec. – Jimmix Apr 07 '19 at 19:42
  • The sed command assumes no file names contain spaces preceded by N:N:N where N is 1 or more digits. Since `\+` is a GNU extension and GNU sed also supports `-E` you could just use that. – Ed Morton Apr 07 '19 at 21:43
  • Personally, my first suggestion would be [col](http://man7.org/linux/man-pages/man1/col.1.html). Followed by [cut](http://man7.org/linux/man-pages/man1/cut.1.html). Then [sed](http://man7.org/linux/man-pages/man1/sed.1.html) or [awk](https://linux.die.net/man/1/awk). – paulsm4 Apr 07 '19 at 22:03
  • @EdMorton `\+` is a POSIX extension I believe. – tripleee Apr 08 '19 at 04:15
  • @triplee no, it's a GNU extension. See https://www.gnu.org/software/sed/manual/html_node/Regular-Expressions.html: `\+ As *, but matches one or more. It is a GNU extension.` – Ed Morton Apr 08 '19 at 14:29
  • @EdMorton Thanks! I had imagined this extension was defined by [POSIX](https://pubs.opengroup.org/onlinepubs/9699919799/utilities/sed.html#tag_20_116_13_02) just like `\{m,n\}` etc but ... TIL. – tripleee Apr 08 '19 at 14:44
  • Yeah looks like POSIX just defined `\{m,n\}` so then they didn't need to define `\+` and `\?` (as GNU has) since you can do them both with `\{1,\}` and `\{0,1\}` respectively. – Ed Morton Apr 08 '19 at 14:58
  • @paulsm4 can you please post a `col` answer? I just read the man page and can't imagine how it'd apply to this problem. – Ed Morton Apr 08 '19 at 15:03

This will work using any POSIX sed even if your file names contain blanks:

$ sed 's/\([^ ]* *\)\{4\}//' file
test/dir/one
best/folder/two

or any POSIX awk:

$ awk '{sub(/([^ ]* *){4}/,"")}1' file
test/dir/one
best/folder/two
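To illustrate why counting fields can be more robust than anchoring on the timestamp, here is a small sketch using a made-up listing line whose path contains blanks and a time-like string. The line and the variable names are hypothetical, not taken from the question.

```shell
# Hypothetical listing line: the path contains blanks and a time-like string.
line='drwxrwxrwx          4,096 2018/12/10 15:27:39 my dir/12:30:45 notes'

# Field-count approach: drop exactly the first four blank-separated fields.
by_fields=$(printf '%s\n' "$line" | sed 's/\([^ ]* *\)\{4\}//')

# Timestamp approach: the greedy .* runs up to the LAST N:N:N plus a space,
# so part of the path is swallowed as well.
by_time=$(printf '%s\n' "$line" |
  sed 's/.*[0-9]\{1,\}:[0-9]\{1,\}:[0-9]\{1,\} //')

printf 'by fields: %s\nby time:   %s\n' "$by_fields" "$by_time"
```

Here the field-count version keeps the whole path, while the timestamp version leaves only the part after the last time-like string. One caveat with the field-count pattern: blanks at the very start of a path would be consumed along with the fourth field's separator.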

If your file names can contain newlines then we should talk....

Ed Morton