-1

How do I get the value attribute based on a search of some other attribute?

For example:

<body>
<input name="dummy" value="foo">
<input name="alpha" value="bar">
</body>

How do I get the value of the input element with the name "dummy"?

alexanderbird
Dmitrii G.
    You can get it with this command: sed -n 's/.*input name="dummy" value="\([^"]*\)".*/\1/p' But for this job, an HTML/XML parser is the right tool. – ramana_k Oct 13 '15 at 17:01

3 Answers

4

Since you're looking for a solution using bash and sed, I'm assuming you're looking for a Linux command line option.

Use the hxselect HTML parsing tool to extract the element; use sed to extract the value from the element

I did a Google search for "linux bash parse html tool" and came across this: https://unix.stackexchange.com/questions/6389/how-to-parse-hundred-html-source-code-files-in-shell

The accepted answer suggests using the hxselect tool from the html-xml-utils package, which extracts elements based on a CSS selector. After installing it (download, unzip, ./configure, make, make install), you can run this command with the following CSS selector:

hxselect "input[name='dummy']" < example.html

(This assumes example.html contains the example HTML from the question.) This will return:

<input name="dummy" value="foo"/>

Almost there. We need to extract the value from that line:

hxselect "input[name='dummy']" < example.html | sed -n -e "s/^.*value=['\"]\([^'\"]*\)['\"].*/\1/p"

This returns foo.

why you would / would not want to use this approach

alexanderbird
  • I came back after the fact and I really don't think this is a good answer: it's awkward and too complicated, and it doesn't follow @Ramana's advice because it still parses the element attributes with sed. I did some more research and answered again with a different approach. – alexanderbird Oct 21 '15 at 16:55
2

Since you're asking for sed, I'll assume you want a command-line option. However, a tool built for HTML parsing may be more effective. The problem with my first answer is that I don't know of a way in CSS to select the value of an attribute (does anyone else?). With XML, however, you can select attributes just as you can select elements. Here is a command-line option that uses an XML parsing tool.

Treat it as XML; use XPath

  1. Install xmlstarlet with your package manager
  2. Run xmlstarlet sel -t -v "//input[@name='dummy']/@value" example.html (where example.html contains your HTML)
  3. If your HTML isn't valid XML, follow the warnings from xmlstarlet to make the necessary changes (in this case, <input> must be changed to <input/>)
  4. Run the command again. It returns: foo
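
For comparison, the same attribute lookup can be sketched with Python's standard-library xml.etree.ElementTree, which supports a limited XPath subset. This is only a sketch under assumptions: the markup is taken to be well-formed XML already (as in step 3), and the names DOC and input_value are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Well-formed version of the example from the question
# (note the self-closing <input/> tags).
DOC = """<body>
<input name="dummy" value="foo"/>
<input name="alpha" value="bar"/>
</body>"""


def input_value(xml_text, name):
    """Return the value attribute of the first <input> with this name, or None."""
    root = ET.fromstring(xml_text)
    # ElementTree's XPath subset can filter on an attribute but cannot
    # select the attribute itself, so it is read with .get() afterwards.
    element = root.find(f".//input[@name='{name}']")
    return None if element is None else element.get("value")


print(input_value(DOC, "dummy"))  # foo
```

Unlike xmlstarlet, ElementTree raises a ParseError on markup that is not well-formed instead of printing warnings, so the <input> to <input/> change still has to be made first.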

why you might/might not use this approach

alexanderbird
1

Parsing HTML with sed is generally a bad idea, since sed works in a line-based manner and HTML does not usually consider newlines syntactically important. It's not good if your HTML-handling tools break when the HTML is reformatted.

Instead, consider using Python, which has an HTML push parser in its standard library. For example:

#!/usr/bin/env python3

from html.parser import HTMLParser
from sys import argv

# Our parser. It inherits from the standard HTMLParser, which does most
# of the work.
class MyParser(HTMLParser):
    # We just hook into the handling of start tags to extract the
    # attribute
    def handle_starttag(self, tag, attrs):
        # Build a dictionary from the attribute list for easier
        # handling
        attrs_dict = dict(attrs)

        # Then, if the tag matches our criteria
        if tag == 'input' and attrs_dict.get('name') == 'dummy':
            # Print the value attribute (or an empty string if it
            # doesn't exist)
            print(attrs_dict.get('value', ''))

# After we defined the parser, all that's left is to use it. So,
# build one:
p = MyParser()

# And feed a file to it (here: the first command line argument).
# The file is opened in text mode because feed() expects a string.
with open(argv[1]) as f:
    p.feed(f.read())

Save this code as, say, foo.py, then run

python3 foo.py foo.html

where foo.html is your HTML file.
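
The reformatting problem described at the top of this answer can be seen in a small sketch. It applies a rough Python equivalent of the sed expression from the comment under the question to each line separately, then compares that with the parser approach on the same element written on one line and wrapped across two. The names ONE_LINE, WRAPPED, line_based, ValueCollector and parsed are invented for this illustration.

```python
import re
from html.parser import HTMLParser

# The same element, once on a single line and once wrapped across two lines.
ONE_LINE = '<body>\n<input name="dummy" value="foo">\n</body>\n'
WRAPPED = '<body>\n<input name="dummy"\n       value="foo">\n</body>\n'

# Rough equivalent of the line-based sed expression from the comment.
PATTERN = re.compile(r'.*input name="dummy" value="([^"]*)".*')


def line_based(html):
    """Apply the pattern to each line on its own, as sed would."""
    matches = (PATTERN.match(line) for line in html.splitlines())
    return [m.group(1) for m in matches if m]


class ValueCollector(HTMLParser):
    """Collect the value of every <input name="dummy"> start tag."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        attrs_dict = dict(attrs)
        if tag == "input" and attrs_dict.get("name") == "dummy":
            self.values.append(attrs_dict.get("value", ""))


def parsed(html):
    """Run the parser over the whole document and return the collected values."""
    collector = ValueCollector()
    collector.feed(html)
    return collector.values


print(line_based(ONE_LINE), parsed(ONE_LINE))  # ['foo'] ['foo']
print(line_based(WRAPPED), parsed(WRAPPED))    # [] ['foo']
```

The line-based pattern finds nothing once the attributes are split across lines, while the parser returns the same result for both layouts.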

Wintermute