0

I have a curl request that returns the following output:

<a href="spike10-st-d43d7eff66aa.ovpn">pike10-st-d43d7eff66aa.ovpn</a>                 25-Sep-2018 13:49                4947
<a href="spike11-First-d43d7eff66aa.ovpn">spike11-First-d43d7eff66aa.ovpn</a>                 25-Sep-2018 14:04                4951
<a href="spike12-rst-d43d7eff66aa.ovpn">spike12-rst-d43d7eff66aa.ovpn</a>                 25-Sep-2018 14:27                4947
<a href="spike13-irst-d43d7eff66aa.ovpn">spike13-irst-d43d7eff66aa.ovpn</a>                 25-Sep-2018 15:00                4947

Can anyone give me a hint on how to remove everything outside the quotation marks, so that I get only the names of the *.ovpn files, like this:

spike10-st-d43d7eff66aa.ovpn
spike11-First-d43d7eff66aa.ovpn
spike12-rst-d43d7eff66aa.ovpn
spike13-irst-d43d7eff66aa.ovpn
Cyrus
Vasyl Stepulo
  • 4
    Use an HTML parser like `xmllint` or `xmlstarlet` to extract the value of the `href` attribute of each `a` element. – chepner Oct 03 '18 at 16:14
  • 1
    [Don't Parse XML/HTML With Regex.](https://stackoverflow.com/a/1732454/3776858) – Cyrus Oct 03 '18 at 18:50
  • You ask for the href fields, not the description. Your example is slightly confusing. – Walter A Oct 03 '18 at 20:59

5 Answers

4

If the input won't contain any extra quotation marks, you can just use `cut`:

cut -d\" -f2 filename

This splits each line on quotation marks and prints the second field. Simple.
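
As a rough sketch of how this behaves on the sample listing: the `listing` variable below is a stand-in for the curl output, and the `<h1>` line is an assumption about what a real directory index might also contain. Adding `-s` makes `cut` skip lines that contain no quotation mark instead of printing them unchanged:

```shell
# Stand-in for the curl output; the <h1> line is a hypothetical non-link line.
listing='<h1>Index of /</h1>
<a href="spike10-st-d43d7eff66aa.ovpn">spike10-st-d43d7eff66aa.ovpn</a>   25-Sep-2018 13:49   4947
<a href="spike11-First-d43d7eff66aa.ovpn">spike11-First-d43d7eff66aa.ovpn</a>   25-Sep-2018 14:04   4951'

# -d'"' splits on double quotes, -f2 keeps the text between the first pair,
# -s drops lines that have no delimiter at all (here, the <h1> line).
names=$(printf '%s\n' "$listing" | cut -s -d'"' -f2)
printf '%s\n' "$names"
```

In a real pipeline the `printf` would be replaced by the curl command itself.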

Sunny Patel
4

Get the value of the `href` attribute from a valid HTML file:

xmlstarlet select --text --template --value-of '//a/@href' -n file.html

Output:

spike10-st-d43d7eff66aa.ovpn
spike11-First-d43d7eff66aa.ovpn
spike12-rst-d43d7eff66aa.ovpn
spike13-irst-d43d7eff66aa.ovpn

See: xmlstarlet select --help

Cyrus
  • 1
    @RavinderSingh13: Thanks for the question, but unfortunately I can't accommodate that in time. Yes, the documentation could be much better. – Cyrus Oct 04 '18 at 04:38
  • 2
    @RavinderSingh13 while I agree that xmlstarlet's documentation is rather sparse, I do believe that 90% of its complexity and misunderstanding comes from XPath which is far from logical. If you understand the latter, it becomes much more interesting. But to my opinion, mastering Xpath comes close to trying to boil an ocean! – kvantour Oct 04 '18 at 14:09
2

You can use the following to remove anything outside quotation marks:

awk -F\" '{print $2}' file

spike10-st-d43d7eff66aa.ovpn
spike11-First-d43d7eff66aa.ovpn
spike12-rst-d43d7eff66aa.ovpn
spike13-irst-d43d7eff66aa.ovpn
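
A slightly more defensive variant, sketched under the assumption that the real listing may also contain links that are not `.ovpn` files (the `../` entry below is made up for illustration), prints the second field only when it ends in `.ovpn`:

```shell
# Stand-in for the curl output; the "../" link is a hypothetical extra entry.
listing='<a href="../">../</a>
<a href="spike12-rst-d43d7eff66aa.ovpn">spike12-rst-d43d7eff66aa.ovpn</a>   25-Sep-2018 14:27   4947
<a href="spike13-irst-d43d7eff66aa.ovpn">spike13-irst-d43d7eff66aa.ovpn</a>   25-Sep-2018 15:00   4947'

# Split on double quotes; keep field 2 only if it ends in ".ovpn".
names=$(printf '%s\n' "$listing" | awk -F'"' '$2 ~ /\.ovpn$/ { print $2 }')
printf '%s\n' "$names"
```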
Grant Miller
Claes Wikner
  • Please explain what your code is trying to do, not just lines of code. – Eriawan Kusumawardhono Oct 03 '18 at 17:03
  • 1
    @EriawanKusumawardhono nonsense. Explaining that code would be as useful as adding an `incrementing i` comment to `i++`. – Ed Morton Oct 03 '18 at 21:19
  • 1
    @EdMorton feel free to disagree. I understand what the code does, but I'm also following SO guidelines: https://stackoverflow.com/help/how-to-answer To me, adding the remark on what the code does will be helpful for others, especially newbie. – Eriawan Kusumawardhono Oct 04 '18 at 07:29
  • @EriawanKusumawardhono, while you're not wrong, the code in this answer is so blatantly obvious that I'd expect anyone who included an [tag:awk] tag in their question to be able to read and understand it. The code in this answer is *infinitely* longer and more complex than the awk code in the question .. which is a criticism of the question (which should perhaps be closed) rather than of the answer. – ghoti Oct 07 '18 at 04:26
1

Could you please try the following (assuming your actual Input_file looks the same as the samples shown):

awk 'match($0,/href="[^"]*/){print substr($0,RSTART+6,RLENGTH-6)}' Input_file
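
To spell out the arithmetic: `match()` sets `RSTART` to the position where `href="` begins and `RLENGTH` to the length of the whole match, so skipping the 6 characters of `href="` leaves just the file name. A minimal sketch that derives the 6 from the prefix instead of hard-coding it (the `prefix` variable is introduced here for illustration and is not part of the command above):

```shell
line='<a href="spike11-First-d43d7eff66aa.ovpn">spike11-First-d43d7eff66aa.ovpn</a>   25-Sep-2018 14:04   4951'

name=$(printf '%s\n' "$line" | awk -v prefix='href="' '
    # Match the prefix followed by everything up to (not including) the next quote.
    match($0, prefix "[^\"]*") {
        n = length(prefix)                          # 6 for href="
        print substr($0, RSTART + n, RLENGTH - n)   # drop the prefix, keep the rest
    }')
printf '%s\n' "$name"
```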
RavinderSingh13
1

This regular expression (JavaScript) will help you remove the unwanted parts of the string:

.replace(/(.*)(["])(.*)(["])(.*)/g, '$3')


'<a href="spike10-st-d43d7eff66aa.ovpn">pike10-st-d43d7eff66aa.ovpn</a>                 25-Sep-2018 13:49                4947'.replace(/(.*)(["])(.*)(["])(.*)/g, '$3')
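
Since the snippet above is JavaScript and the question is about a shell pipeline, the same idea can be sketched with `sed` (a swapped-in tool, not what this answer uses). Using `[^"]*` instead of the greedy `.*` keeps the capture on the first pair of quotes if a line ever contains more than two:

```shell
line='<a href="spike10-st-d43d7eff66aa.ovpn">spike10-st-d43d7eff66aa.ovpn</a>   25-Sep-2018 13:49   4947'

# Replace the whole line with the text between the first pair of double quotes;
# -n plus the p flag prints only lines where the substitution succeeded.
name=$(printf '%s\n' "$line" | sed -n 's/^[^"]*"\([^"]*\)".*$/\1/p')
printf '%s\n' "$name"
```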