I've got a file full of URLs, one URL per line. I want to keep only the protocol and domain part of each.

Example:

https://example0.com/example.php?id=example0
https://example1.com/example.php?id=example1
https://example2.com/example.php?id=example2

Should be formatted to:

https://example0.com/
https://example1.com/
https://example2.com/

I'm using a Linux terminal, so Bash would be best, I think. I've heard of sed, but I don't know how to use it or how to write regular expressions.

jww
Daniel

3 Answers

With GNU sed:

sed -r 's|([^/]*//[^/]*/).*|\1|' file

Output:

    https://example0.com/
    https://example1.com/
    https://example2.com/

If you want to edit your file "in place", use sed's -i option.
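
As a rough sketch of the in-place variant, assuming GNU sed: the temporary file and the `tmpfile` variable below are made up for illustration, and `-i.bak` is used so that a backup copy is kept next to the edited file.

```shell
# Throwaway sample file (tmpfile is a hypothetical name, not from the question)
tmpfile=$(mktemp)
printf '%s\n' \
  'https://example0.com/example.php?id=example0' \
  'https://example1.com/example.php?id=example1' > "$tmpfile"

# -r: extended regex, -i.bak: edit in place and keep the original as "$tmpfile.bak"
sed -r -i.bak 's|([^/]*//[^/]*/).*|\1|' "$tmpfile"

# Read the edited file back to check the result
result=$(cat "$tmpfile")
printf '%s\n' "$result"
```

Note that a line with no slash after the host (for example `https://example.com`) does not match this pattern and is left unchanged.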


See: The Stack Overflow Regular Expressions FAQ

Cyrus

Try the following

https?:\/\/[^\/]+

https://regex101.com/r/8MdA6I/1
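
One possible way to apply this pattern from the terminal is grep with `-o`, which prints only the matched part of each line. This is a sketch, not part of the original answer: it assumes a grep that supports `-o` and `-E` (GNU and BSD grep both do), and the `urls` variable is a hypothetical stand-in for the file. Inside a grep pattern the slashes do not need escaping.

```shell
# Sample input (urls is a hypothetical variable standing in for the file)
urls='https://example0.com/example.php?id=example0
https://example1.com/example.php?id=example1'

# -o: print only the match, -E: extended regex (so "?" and "+" work unescaped)
result=$(printf '%s\n' "$urls" | grep -oE 'https?://[^/]+')
printf '%s\n' "$result"
```

The match stops before the first slash after the host, so the output has no trailing slash.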

Maslo

You could use cut like this:

cut -d/ -f1-3 yourfile

It uses / as the delimiter and selects fields 1 to 3 (field 2 is the empty string between the two slashes of //).

And if you really need the trailing slash, you could pipe everything to sed to add a / by adding this to the command:

| sed 's+$+/+'
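
Put together, the whole pipeline might look like the following sketch. The `urls` variable is a hypothetical stand-in for `yourfile`, and the sed expression uses `+` as its delimiter so the `/` in the replacement needs no escaping.

```shell
# Sample input (urls is a hypothetical variable standing in for yourfile)
urls='https://example0.com/example.php?id=example0
https://example1.com/example.php?id=example1'

# Splitting on "/" gives: field 1 = "https:", field 2 = "" (empty), field 3 = "example0.com"
# cut keeps fields 1-3, then sed appends "/" at the end of every line
result=$(printf '%s\n' "$urls" | cut -d/ -f1-3 | sed 's+$+/+')
printf '%s\n' "$result"
```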
Lars Fischer