How can I remove the beginning of a word using grep? For example, I have a file that contains this:
www.abc.com
I only need this part:
abc.com
Sorry for the basic question, but I have no experience with Linux.
You don't edit strings with grep. In the Unix shell, grep is usually used to find lines in text or to filter them out. Use sed instead:
$ echo www.example.com | sed 's/^[^\.]\+\.//'
example.com
You'll need to learn regular expressions to use it effectively.
sed can also edit a file in place (modifying the file itself) if you pass the -i option, but be careful: a wrong sed command combined with -i can easily destroy data.
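One way to reduce that risk, sketched here under a few assumptions, is to give -i a backup suffix so that sed keeps a copy of the original file. The scratch directory, the file name hosts.txt and the .bak suffix are made up for illustration; the attached form -i.bak is accepted by both GNU sed and BSD sed, but check the manual of the sed you actually have.

```shell
# Work in a scratch directory so nothing real is touched (illustrative setup).
workdir=$(mktemp -d)
printf 'www.example.com\n' > "$workdir/hosts.txt"

# -i.bak edits hosts.txt in place and keeps the original as hosts.txt.bak.
sed -i.bak 's/^[^.]*\.//' "$workdir/hosts.txt"

cat "$workdir/hosts.txt"       # edited file: example.com
cat "$workdir/hosts.txt.bak"   # untouched copy: www.example.com
```

If the result is wrong, the .bak file can simply be moved back over the edited one.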
From your comments, I guess you have a TeX document and you want to remove the first part of all .com domain names. If this is your document test.tex:
\documentclass{article}
\begin{document}
www.example.com
example.com www.another.domain.com
\end{document}
then you can transform it with this sed command (redirect the output to a file or edit in place with -i):
$ sed 's/\([a-z0-9-]\+\.\)\(\([a-z0-9-]\+\.\)\+com\)/\2/gi' test.tex
\documentclass{article}
\begin{document}
example.com
example.com another.domain.com
\end{document}
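Please note that:
- a single domain label followed by a dot is matched by [a-z0-9-]\+\.
- I use groups (\( and \)) to indicate the first and the second part of the URL, and I replace the entire match with its second group (\2 in the substitution pattern)
- the second part must contain at least one more label before com (\+ repetition means at least one match)
- matching is case-insensitive (i flag at the end)
- every match on a line is replaced, not only the first one (g flag at the end)
As the others have noted, grep is not well suited for this task; sed is a good option, or, if the text is well ordered, a simple cut might be easier to type: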
echo www.abc.com | cut -d. -f2-
-d. tells cut to use . as the delimiter.
-f2- tells cut to return every field from field 2 onward.
You can do this using grep easily:
$ echo www.google.com | grep -o '[^.]*\.com'
google.com
Instead of using echo, give grep your file as input:
$ grep -o '[^.]*\.com$' < file
Here I used the regular expression '[^.]*\.com'. It means: find a run of characters that contains no . ([^.]*), followed by .com (\.com in the regex). The trailing $ in the second command anchors the match to the end of the line. The -o option tells grep to print only the part of each line that matched.
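To see how -o behaves on more than one line, here is a small sketch. The temporary file and its three sample lines are invented for the demonstration; only the grep command itself comes from the answer above.

```shell
# Sample input (hypothetical file contents): one entry per line.
sample=$(mktemp)
printf '%s\n' 'www.abc.com' 'mail.example.com' 'no-domain-here' > "$sample"

# -o prints only the matched text; lines without a match print nothing.
result=$(grep -o '[^.]*\.com$' "$sample")
printf '%s\n' "$result"
```

This prints abc.com and example.com; the third line contains no match and is silently skipped.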
--only-matching and \K
You can do this with grep's --only-matching option:
echo 'www.abc.com' | grep --perl-regexp --only-matching 'www\.\K.*'
which can be shortened to
echo 'www.abc.com' | grep -Po 'www\.\K.*'
Both commands produce
abc.com
with grep (GNU grep) 3.3.
Instead of echo, I'll use a here string to shorten the command further:
grep -Po 'www\.\K.*' <<< 'www.abc.com'
\K resets the starting point of the match, essentially forgetting the already matched "www.", so only the text after it is printed.
You can also do this with a positive lookbehind:
grep -Po '(?<=www\.).*' <<< 'www.abc.com'
-F
awk -F 'www\\.' '$2{print $2}' <<< 'www.abc.com'
This prints
abc.com
The $2{print $2} part prints the second field if it's defined. This is necessary for multi-line input, to avoid printing blank lines for input lines that don't contain the field separator.
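A short sketch of that multi-line case follows. The three input lines are made-up sample data, and printf with a pipe is used instead of a here string only so the snippet also runs in a plain POSIX shell.

```shell
# Three input lines; only the first and third contain the separator "www.".
result=$(printf '%s\n' 'www.abc.com' 'example.org' 'www.example.net' |
  awk -F 'www\\.' '$2{print $2}')
printf '%s\n' "$result"
```

This prints abc.com and example.net. With a plain {print $2} the middle line would have produced an empty line instead of being skipped.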
sed --regexp-extended --quiet 's/www\.(.*)/\1/p' <<< 'www.abc.com'
The parentheses form a group that captures everything after "www.". With \1 we reference that group, and /p prints the result.
The options --regexp-extended and --quiet have the shorter equivalents -E and -n:
sed -E -n 's/www\.(.*)/\1/p' <<< 'www.abc.com'
As noted by Vladimir Nesterenco in a deleted answer, it's advisable to escape the dot with a backslash in all these regexes, to avoid matching strings that start with "www" followed by an arbitrary character, not only a dot. Otherwise, you'd extract "abc.com" from "wwwXabc.com", for example.
Depending on your input text, you might want to change the regex to make sure to only match occurrences of "www." at the beginning of a line:
^www\.
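The difference the anchor makes can be sketched with the sed variant from above. The second input line, ftp.www.abc.com, is an invented example of a line that contains "www." without starting with it.

```shell
# "ftp.www.abc.com" contains "www." but does not start with it.
input='www.abc.com
ftp.www.abc.com'

# Unanchored: the substitution also fires in the middle of the second line.
unanchored=$(printf '%s\n' "$input" | sed -E -n 's/www\.(.*)/\1/p')

# Anchored with ^: only lines that begin with "www." are rewritten and printed.
anchored=$(printf '%s\n' "$input" | sed -E -n 's/^www\.(.*)/\1/p')

printf 'unanchored:\n%s\nanchored:\n%s\n' "$unanchored" "$anchored"
```

The unanchored command prints abc.com and ftp.abc.com, while the anchored one prints only abc.com.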
If your input consists of only a single line, Bash's built-in parameter expansion might be useful:
input="www.abc.com"; after=${input#"www."}; echo "$after"
If the input string doesn't start with "www.", this will print the whole string.
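Both cases can be checked with a small wrapper. The function name strip_www is hypothetical and exists only for this illustration; the expansion inside it is the one shown above.

```shell
# strip_www is a hypothetical helper name used only for this illustration.
strip_www() {
  # Remove a leading literal "www." if present; otherwise keep $1 unchanged.
  stripped=${1#"www."}
  printf '%s\n' "$stripped"
}

strip_www 'www.abc.com'   # prints abc.com
strip_www 'abc.com'       # prints abc.com (no prefix to remove)
strip_www 'wwwXabc.com'   # prints wwwXabc.com (the dot is matched literally)
```

Unlike in a regex, the dot in a shell pattern is an ordinary character, so no escaping is needed here.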
grep is not used to manipulate or change text, only to search for text or patterns within text. You should look into something like sed, awk or cut if you want a command-line tool to do it. Or write a script in Python/Perl/Ruby/whatever.
You can actually do this without invoking other programs, by using a builtin parameter expansion in bash:
while IFS= read -r line; do echo "${line#*.}"; done < file
Here #*. tells the shell to remove the shortest prefix that matches zero or more characters followed by a . (that is, everything up to and including the first dot).
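For comparison, a sketch of the single # next to the double ## form, which removes the longest matching prefix instead. The host name is sample data taken from the TeX example earlier in the thread, and the variable names are illustrative.

```shell
host='www.another.domain.com'

# One "#": remove the shortest prefix matching "*." (up to the first dot).
shortest=${host#*.}

# Two "##": remove the longest such prefix (up to the last dot).
longest=${host##*.}

printf '%s\n' "$shortest"   # another.domain.com
printf '%s\n' "$longest"    # com
```

For stripping only the leading "www" label, the single # form is the one you want.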