If you have used Xidel, you will often have needed to locate nodes that have a certain class. To make this easier, I want to create a has-class("class") function that serves as an alias for the expression:
contains(concat(" ", normalize-space(@class), " "), " class ").

Example:

$ e-xidel.sh example.com '//article/p//img[has-class("wp-image")]'

e-xidel.sh contains this code:

#!/bin/bash

echo -e "$(tput setaf 2) Checking... $(tput sgr0)"

path=$1
expression=$2

# expression = '//article/p//img[has-class("wp-image")]'
# Regex to replace every * has-class("class") * by * contains(concat(" ", normalize-space(@class), " "), " class ") *
# ...
# ...
# expression = '//article/p//img[contains(concat(" ", normalize-space(@class), " "), " wp-image ")]'

xoutput=$(xidel "$path" --printed-node-format=html --output-declaration= -e "$expression")

echo -e "$(tput setaf 1) $xoutput $(tput sgr0)"
Rodrigo Vieira
  • Do you absolutely need regex replace? Would a [simple string replace](https://stackoverflow.com/a/13210909/1578604) not work? – Jerry Feb 18 '19 at 12:14

2 Answers

You can use sed to do this (GNU version; I cannot guarantee it will work with other implementations):

sed 's/has-class("\([^)]\+\)")/contains(concat(" ", normalize-space(@class), " "), " \1 ")/g'

Explanation:

  • s/pattern/substitution/g: replace the portion matching the pattern with the substitution string; the g flag replaces every match on the line (global substitution) instead of only the first.
  • has-class("\([^)]\+\)"): a portion starting with has-class(" followed by one or more characters other than the closing parenthesis ([^)]\+) and ending with "). The escaped parentheses around the inner part capture that sub-portion and make it available as \1, since it is the first capture group.
  • contains(concat(" ", normalize-space(@class), " "), " \1 "): replace the matched portion with this text; \1 is expanded to the content of the captured group.
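
As a quick check of the substitution, the sketch below runs it on the sample expression from the question. It is a variant rather than the exact command above: it uses [^"][^"]* instead of [^)]\+, on the assumption that a class name never contains a double quote, which also avoids the GNU-only \+ operator. The function name expand_has_class is hypothetical.

```shell
#!/bin/bash

# Hypothetical helper: expands every has-class("name") in an XPath
# expression. Assumes class names contain no double quotes.
expand_has_class() {
    printf '%s\n' "$1" |
    sed 's/has-class("\([^"][^"]*\)")/contains(concat(" ", normalize-space(@class), " "), " \1 ")/g'
}

expand_has_class '//article/p//img[has-class("wp-image")]'
# prints: //article/p//img[contains(concat(" ", normalize-space(@class), " "), " wp-image ")]
```

Because of the g flag, an expression with several has-class(...) calls has all of them expanded in one pass.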

Your script would be:

#!/bin/bash

function expand-has-class() {
    echo "$1" |
    sed 's/has-class("\([^)]\+\)")/contains(concat(" ", normalize-space(@class), " "), " \1 ")/g'
}

echo -e "$(tput setaf 2) Checking... $(tput sgr0)"

path=$1
expression="$(expand-has-class "$2")"

# Example: '//article/p//img[has-class("wp-image")]' has now been expanded to
# '//article/p//img[contains(concat(" ", normalize-space(@class), " "), " wp-image ")]'

xoutput=$(xidel "$path" --printed-node-format=html --output-declaration= -e "$expression")

echo -e "$(tput setaf 1) $xoutput $(tput sgr0)"
Amessihel

contains(concat(" ", normalize-space(@class), " "), " class ")

Example:

$ e-xidel.sh example.com '//article/p//img[has-class("wp-image")]'

This makes no sense.

contains(concat(" ",normalize-space("wp-image")," ")," wp-image ")

would just be the same as

contains("wp-image","wp-image")

If you really want a boolean as output while comparing the value of the class attribute to a literal string then this...

xidel -s example.com -e '//article/p//img/@class="wp-image"'

...would return true or false.

If wp-image is a substring of the class attribute's value:

xidel -s example.com -e '//article/p//img/contains(@class,"wp-image")'
Reino