The goal is to use sed to extract only the hostname from each line of the filter list used by the Firefox extension Mining Blocker, which stores its rules in this format:

{"baseurl":"*://002.0x1f4b0.com/*", "suburl":"*://*/002.0x1f4b0.com/*"},
{"baseurl":"*://003.0x1f4b0.com/*", "suburl":"*://*/003.0x1f4b0.com/*"},

The result should be:

002.0x1f4b0.com
003.0x1f4b0.com

One way would be to keep everything after suburl":"*://*/ then remove each occurrence of /*"},

I found https://unix.stackexchange.com/questions/24140/return-only-the-portion-of-a-line-after-a-matching-pattern but the special characters are a problem.

This won't work:

sed -n -e s@^.*suburl":"*://*/@@g hosts

Would someone please show me how to mark the 2 asterisks in the string so they are seen by regex as literal characters, not wildcards?

Edit:

sed -n 's#.*://\*/\([^/]\+\)/.*#\1#p' hosts

doesn't work, unfortunately.

Regarding character substitution, thanks for pointing me to the references.

I reduced the searched-for string to //*/ and used ASCII character codes like this:

sed -n -e s@^.*\d047\d047\d042\d047@@g hosts

Unfortunately, that didn't output any changes to the lines.

My assumptions are:

^.*something matches everything up to and including the last occurrence of "something" in a line.

sed -n -e s@search@@g deletes "search" (replaces it with nothing) wherever it occurs in a line.

So, this line:

sed -n -e s@^.*\d047\d047\d042\d047@@g hosts

Should output everything after //*/ in each line...except it doesn't.

What is incorrect with that line?

Regarding deleting everything including and after the first / AFTER that first operation, yes, that's wanted too.
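For reference, the two-step idea described above (first delete everything through //*/, then delete everything from the next / onward) could be sketched roughly like this. This is only a sketch under assumptions: the input is the two sample lines from this question, `hosts_sample` and `two_step` are made-up variable names, and the asterisk is written as \* inside a single-quoted script so that neither the shell nor sed treats it specially.

```shell
# Sample input copied from the question; hosts_sample is a hypothetical name.
hosts_sample='{"baseurl":"*://002.0x1f4b0.com/*", "suburl":"*://*/002.0x1f4b0.com/*"},
{"baseurl":"*://003.0x1f4b0.com/*", "suburl":"*://*/003.0x1f4b0.com/*"},'

# Step 1: delete everything up to and including the last "//*/" (* escaped as \*).
# Step 2: delete everything from the first remaining "/" to the end of the line.
# Single quotes keep the shell from touching the backslashes and asterisks.
two_step=$(printf '%s\n' "$hosts_sample" | sed -e 's@^.*//\*/@@' -e 's@/.*$@@')

printf '%s\n' "$two_step"
```

Because -n is not used here, every line is printed after both substitutions, including any line that contains no //*/ at all.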

  • Have a look at the manual: https://www.gnu.org/software/sed/manual/sed.html#Escapes - you have to escape the special characters, `\*`. – Benjamin W. Jul 24 '18 at 05:06
  • Possible duplicate of [What special characters must be escaped in regular expressions?](https://stackoverflow.com/questions/399078/what-special-characters-must-be-escaped-in-regular-expressions) – Benjamin W. Jul 24 '18 at 05:09

1 Answer

This might work for you (GNU sed):

sed -n 's#.*://\*/\([^/]\+\)/.*#\1#p' file

Match greedily (the longest string that matches) all characters up to and including ://*/, then capture a group of one or more characters that are not / (referred to later as \1), then match the following / and the rest of the line. The whole match is replaced by the group \1.
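As a quick check, here is that command run against the two sample lines from the question instead of a file. The variable names `sample`, `extracted` and `extracted_ere` are made up for this illustration. The second command is an assumed alternative, not part of the original answer: it uses -E (extended regular expressions, accepted by both GNU and BSD sed), which avoids the GNU-specific \+ in case the sed in use is not GNU sed.

```shell
# Sample input copied from the question; "sample" is a hypothetical name.
sample='{"baseurl":"*://002.0x1f4b0.com/*", "suburl":"*://*/002.0x1f4b0.com/*"},
{"baseurl":"*://003.0x1f4b0.com/*", "suburl":"*://*/003.0x1f4b0.com/*"},'

# The command from the answer: \* is a literal asterisk, \( \) captures the host.
extracted=$(printf '%s\n' "$sample" | sed -n 's#.*://\*/\([^/]\+\)/.*#\1#p')

# Assumed alternative using extended regex syntax: ( ) and + need no backslashes.
extracted_ere=$(printf '%s\n' "$sample" | sed -nE 's#.*://\*/([^/]+)/.*#\1#p')

printf '%s\n' "$extracted"
printf '%s\n' "$extracted_ere"
```

On the sample data both commands print 002.0x1f4b0.com and 003.0x1f4b0.com, one per line.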

N.B. the sed substitution delimiters are arbitrary, in this case chosen to be # so as to make matching / in the pattern easier. Also, the character * on the left-hand side of the substitution command is normally interpreted as a metacharacter meaning zero or more of the previous character/group, so it is quoted as \* to make it match a literal asterisk. Finally, the option -n turns off the usual printing of everything in the pattern space after all the sed commands have been executed. The p flag on the substitution command prints the pattern space following a successful substitution, so the output contains only the extracted URLs, or nothing if no line matches.
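The effect of -n, the p flag and the escaped asterisk can be seen in a small experiment. This is only an illustrative sketch: `line`, `without_p`, `with_p` and `unescaped` are made-up names, and the input is one sample line from the question.

```shell
# One sample line from the question; "line" is a hypothetical name.
line='{"baseurl":"*://002.0x1f4b0.com/*", "suburl":"*://*/002.0x1f4b0.com/*"},'

# -n without the p flag: the substitution is performed, but nothing is printed.
without_p=$(printf '%s\n' "$line" | sed -n 's#.*://\*/##')

# -n with the p flag: the line is printed after a successful substitution.
with_p=$(printf '%s\n' "$line" | sed -n 's#.*://\*/##p')

# Unescaped *: "/*" now means "zero or more slashes", so the match stops
# right after "://" and the literal "*/" is left at the start of the result.
unescaped=$(printf '%s\n' "$line" | sed -n 's#.*://*/##p')

printf 'without p: [%s]\n' "$without_p"
printf 'with p:    [%s]\n' "$with_p"
printf 'unescaped: [%s]\n' "$unescaped"
```

The first result is empty, which is what any s command without the p flag produces under -n. The second is 002.0x1f4b0.com/*"}, and the third is */002.0x1f4b0.com/*"}, with the asterisk left behind.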

potong