I am trying to run the following command

echo `grep -o "<\/div><div class\=\".*" $1` |
grep -o "title=\\"\(.*\?\)\\" aria-describedby" -> title.txt

from script test.sh.

However, every time I check my file title.txt, it is empty.

I tested the first part of the command,

echo `grep -o "<\/div><div class\=\".*" $1`

and it works fine.

The second part is the one with the problem:

grep -o "title=\\\"\(.*\?\)\\\" aria-describedby" -> title.txt

Note that this is not being run from the terminal itself, but from a bash script file called from the terminal.

I believe my problem lies in how I am quoting or escaping the quotes.

Benjamin W.
Andy
  • `->` ?? That's not proper syntax. Please use http://shellcheck.net before posting more Qs here on StackOverflow. Good luck. – shellter Feb 08 '17 at 21:11
  • Great tool thanks. – Andy Feb 08 '17 at 21:12
  • `->` is the same as `- >` and seems to be correctly (if redundantly) used to have grep search stdin and redirect to a file. – that other guy Feb 08 '17 at 21:14
  • @ThatOtherGuy . Yes, but not a good habit to form for a beginning user. Good luck to all. – shellter Feb 08 '17 at 21:19
  • @shellter It is confusing, but to be fair, shellcheck doesn't complain about it. – Benjamin W. Feb 08 '17 at 21:21
  • Guys I solved it. I had to back quote the entire command. '` cmd | cmd `' – Andy Feb 08 '17 at 21:22
  • Thanks for the help, and the new tool. – Andy Feb 08 '17 at 21:22
  • @Andy That's actually not the best solution. You should _drop_ the `echo` and all backticks instead. – Benjamin W. Feb 08 '17 at 21:23
  • Can you explain please? – Andy Feb 08 '17 at 21:23
  • @Andy, why does your command have an `echo` at all? `grep` already writes to stdout; you don't need an echo to do that. If you take out the `echo`, and also take out the backticks, then you let your `grep`s write directly to stdout without inserting extra work (and extra bugs). – Charles Duffy Feb 08 '17 at 21:24
  • I'm glad it's working for you! However, you should be aware that `grep` is [not a good way](http://stackoverflow.com/a/1732454/1899640) to extract information from HTML, and that there are tools like htmltidy+xmlstarlet that can do it easily and robustly. – that other guy Feb 08 '17 at 21:25
  • @BenjaminW. : OK, I'll take that as "I learned something extra today" :-) ! Good for shellcheck, as the code is "legal", that gives me an even higher respect for its creators. I do think using `->` in shell just muddies the waters about what is happening, especially for people that have been exposed to other languages that use `->` as a true syntax element. Good luck to all! – shellter Feb 08 '17 at 21:25
  • @Andy, ...to give an example of what I mean by "extra bugs" -- if what grep finds has a `*` in it, then ```echo `grep ...` ``` will emit a list of filenames in your local directory, because you didn't quote correctly. By contrast, if you don't have the command substitution (the backtick syntax), then you don't have the side effects of using that syntax unquoted (string-splitting and glob expansion). – Charles Duffy Feb 08 '17 at 21:26
  • @Charles Duffy, I will try your suggestion. Thanks! – Andy Feb 08 '17 at 21:32
  • @the other guy, I did not know that. I will try and use the tools you have provided. Thank you. – Andy Feb 08 '17 at 21:33
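
The glob-expansion problem described in the comments above can be reproduced with a small sketch. The scratch directory, the file names, and the sample line are all invented for illustration; only the `echo` plus unquoted backticks pattern comes from the question.

```shell
#!/usr/bin/env bash
# Work in a scratch directory so the glob below has known files to match.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch a.txt b.txt

# Hypothetical input: one line containing a bare asterisk.
printf 'star * here\n' > page.txt

# Unquoted command substitution: the output is word-split and the *
# is expanded to the file names in the current directory.
with_echo=$(echo `grep -o 'star.*' page.txt`)

# Plain grep: the matched text reaches stdout unchanged.
direct=$(grep -o 'star.*' page.txt)

printf 'with echo: %s\n' "$with_echo"
printf 'direct:    %s\n' "$direct"

# Clean up the scratch directory.
cd / && rm -rf "$workdir"
```

Applied to the script in the question, this suggests running the first `grep` directly on `"$1"`, piping it into the second `grep`, and redirecting with a plain `>` rather than `->`.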

1 Answer

I do not know if your expressions do what you want them to do, but there is an issue with this one:

"title=\\"\(.*\?\)\\"

When the shell sees two consecutive backslashes inside double quotes (basically an escaped backslash), it reads them as one literal backslash. The first pair of backslashes in your expression is read this way, so the double quote that follows ends the string. In other words, the following is a complete string:

"title=\\"

And the rest of the line:

\(.*\?\)\\"

ends with a double quote (again not escaped, because the twin backslashes become one literal backslash), but has no opening double quote.
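
A minimal sketch of the parsing described above, followed by one possible way to avoid it. The single-quoted pattern and the `[^"]*` character class are my substitution for the original `.*\?`, and the sample line is invented, so treat this as an illustration rather than a confirmed fix for the asker's data.

```shell
#!/usr/bin/env bash
# Inside double quotes, \\ is one literal backslash, so the next " closes the string.
printf '[%s]\n' "title=\\"
# prints [title=\]

# Single quotes pass every character to grep untouched, so the
# double quotes in the pattern need no shell escaping at all.
pattern='title="[^"]*" aria-describedby'

# Hypothetical sample input, not taken from the question.
sample='<a title="First video" aria-describedby="d1">x</a>'

printf '%s\n' "$sample" | grep -o "$pattern"
# prints title="First video" aria-describedby
```

Using `[^"]*` instead of `.*` keeps the match from running past the closing quote of the attribute when a line holds several attributes.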

Fred
  • Correct yes, I think that is the original problem, in that case how can I fix it Fred? – Andy Feb 08 '17 at 21:26
  • @Andy Since I do not know what you are trying to achieve, and what your input data is, it is difficult to provide guidance. Your expression is not valid, but even if I recommend a valid one, it does not mean it finds what you are searching for. – Fred Feb 08 '17 at 21:30
  • Got it, thank you Fred. I have solved the issue. I will still try to improve the script. My objective is to extract the title of all the videos given in the following url "www.youtube.com/index". – Andy Feb 08 '17 at 21:36