I would like to write a script that finds some specific data in a web page and returns it in a pop-up box.

The code below works perfectly for a given string of characters. The issue is that each time a new product is checked, the string will change.

This is how it would look when checking the page source:

<randomcharacters<!---->evenmorerandomcharacters<!----> 9999 <!----></div>

There will always be two <!----> markers before the number I want to grab and <!----></div> after it. The number of random characters before the number is not consistent either.

tell application "Safari"

    set unitsgrab to do JavaScript "document.getElementsByClassName('theclassIwant')[0].innerHTML;" in current tab of window 1
end tell

set units to ""
set theSource to unitsgrab
property leftEdge : "randomcharacters<!---->evenmorerandomcharacters<!---->"
property rightEdge : "<!----></div>"
try
    set saveTID to text item delimiters
    set text item delimiters to leftEdge
    set classValue to text item 2 of theSource
    set text item delimiters to rightEdge
    set units to text item 1 of classValue
    set text item delimiters to saveTID
    units
end try



display dialog "Units:" & (units)

What I actually want is to tell the script to delete everything up to the second <!----> so that only 9999 is displayed in my example above.

  • Given your example code, if you replace all the lines of code from `set units to ""` up to and including `end try` with the following one line: `set units to do shell script "/usr/bin/awk 'BEGIN { FS = \"<!---->\" } ; { print $3 }' <<< " & quoted form of unitsgrab & " | xargs"` that will give you `9999`. This shells out to `awk`, and the final pipe to `xargs` removes any leading/trailing space characters. – RobC Sep 05 '19 at 16:12
  • @RobC, why use `& " | xargs"` when `awk` can do it all by adding `gsub(/ /, \"\", $3) ;` in front of `print $3`, thus making the extra pipe and additional external command needless? – user3439894 Sep 06 '19 at 01:58
  • `set units to do shell script "awk -F \"<!---->\" '{ gsub(/ /, \"\", $3); print $3 }' <<< " & quoted form of unitsgrab` – user3439894 Sep 06 '19 at 03:11
  • @user3439894 - Simply because I wrote that comment on my mobile phone (i.e. no computer to test on) and I was unsure of the syntax. But yes, utilizing `awk` to also remove the _leading and trailing_ white space is another option. However, please note that your example strips all white space, including any interior space(s). It seems that [two successive `gsub` commands](https://stackoverflow.com/questions/20600982/trim-leading-and-trailing-spaces-from-a-string-in-awk#answer-20601021) are required for stripping leading and trailing spaces _only_. – RobC Sep 06 '19 at 08:00

2 Answers

Assuming you've represented the data correctly, I don't think you need to worry about the random characters. Rewrite your text item delimiters routine like so:

set tid to my text item delimiters
set my text item delimiters to "<!---->"
set classValue to text item 3 of theSource
set my text item delimiters to tid

Text item 3 should always be the text between the second and third occurrences of the delimiter string.

Ted Wrigley

You can utilize AppleScript's do shell script command to shell out to awk. Here are a couple of examples:


  1. Example A : Exclude all spaces

    This example follows @user3439894's suggestion (thank you, @user3439894!), which improves upon the example given in my earlier comment. It avoids piping to xargs and instead strips the spaces via awk too.

    tell application "Safari"
      set unitsgrab to do JavaScript "document.getElementsByClassName('theclassIwant')[0].innerHTML;" in current tab of window 1
    end tell
    
    set units to do shell script "awk -F \"<!---->\" '{ gsub(/ /, \"\", $3); print $3 }' <<< " & quoted form of unitsgrab
    
    display dialog "Units:" & units
    

    However, this example strips all leading, trailing, and interior spaces. For instance, let's say the string assigned to the unitsgrab variable is:

    <rand56omcharacters<!---->evenmorera11ndomcharacters<!---->  99 9 9 <!---->
                                                               ^^  ^ ^ ^
    

    Note the additional spaces indicated by the caret symbols (^).

    The resultant value assigned to the units variable will be:

    9999
    
  2. Example B : Exclude leading and trailing whitespace only

    The following example removes leading/trailing whitespace, and preserves any interior whitespace:

    tell application "Safari"
      set unitsgrab to do JavaScript "document.getElementsByClassName('theclassIwant')[0].innerHTML;" in current tab of window 1
    end tell
    
    set units to do shell script "awk -F \"<!---->\" '{ gsub(/^[ \\t]+/, \"\", $3); gsub(/[ \\t]+$/, \"\",$3); print $3 }' <<< " & quoted form of unitsgrab
    
    display dialog "Units:" & units
    

    This time, let's say the string assigned to the unitsgrab variable is:

    <rand56omcharacters<!---->evenmorera11ndomcharacters<!---->  12 3  4 <!----></div>    
                                                               ^^  ^ ^^ ^
    

    Again note the additional spaces indicated by the caret symbols (^).

    The resultant value assigned to the units variable will be:

    12 3  4
      ^ ^^
    

    Note that the interior whitespace has been preserved; only the leading and trailing whitespace has been removed. (The caret symbols are here for illustrative purposes only.)


To better understand the awk commands above, I recommend reading this answer. The notable difference here is that some additional character escaping (i.e. using a backslash \) is required in these AppleScript examples to ensure valid syntax. For instance, double quotes " become \" and \t becomes \\t.
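
Because the escaping is the error-prone part, it can help to try the awk commands on their own in Terminal before embedding them in do shell script. The following is a minimal sketch of Examples A and B without the AppleScript escaping. The `html`, `units_a`, and `units_b` variable names are made up for illustration, and `printf` piped into `awk` stands in for the `<<<` here-string so that the snippet also runs in shells that lack here-strings.

```shell
# Sample input, modelled on the innerHTML string used in Example B.
html='<rand56omcharacters<!---->evenmorera11ndomcharacters<!---->  12 3  4 <!----></div>'

# Example A: split on the delimiter, then delete every space in field 3.
units_a=$(printf '%s\n' "$html" | awk -F '<!---->' '{ gsub(/ /, "", $3); print $3 }')

# Example B: trim only leading and trailing spaces/tabs from field 3.
units_b=$(printf '%s\n' "$html" | awk -F '<!---->' '{ gsub(/^[ \t]+/, "", $3); gsub(/[ \t]+$/, "", $3); print $3 }')

# Brackets make any remaining leading or trailing whitespace visible.
printf '[%s]\n' "$units_a" "$units_b"
```

With this sample input the snippet should print `[1234]` followed by `[12 3  4]`. Once a command behaves as expected here, each `"` inside it becomes `\"` and each `\t` becomes `\\t` when it is pasted into the AppleScript string.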


EDIT:

  3. Example C : Preserve all whitespace

    If you want to preserve all leading, trailing, and interior whitespace, omit the gsub part. For instance:

    tell application "Safari"
      set unitsgrab to do JavaScript "document.getElementsByClassName('theclassIwant')[0].innerHTML;" in current tab of window 1
    end tell
    
    set units to do shell script "awk -F \"<!---->\" '{ print $3 }' <<< " & quoted form of unitsgrab
    
    display dialog "Units:" & units
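
To see why `$3` is the field that holds the number, and that nothing is trimmed when gsub is omitted, the fields can be listed one per line in Terminal. This is only an illustrative sketch run against the sample string from the question; the `html` and `fields` variable names are assumptions, and `printf` is used in place of the here-string.

```shell
# Sample string from the question's page source.
html='<randomcharacters<!---->evenmorerandomcharacters<!----> 9999 <!----></div>'

# Split on the delimiter and print every field, numbered and bracketed
# so that any surrounding whitespace is visible.
fields=$(printf '%s\n' "$html" | awk -F '<!---->' '{ for (i = 1; i <= NF; i++) printf "%d: [%s]\n", i, $i }')

printf '%s\n' "$fields"
```

For this string the third line of output should read `3: [ 9999 ]`, with the surrounding spaces intact, which is the value Example C hands back to AppleScript.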
    
RobC
  • In 2. **Example B :** **_Exclude leading and trailing whitespace only_**, you do not necessarily need to use two separate `gsub` _commands_, as one with the following `regex` handles it: `awk -F '<!---->' '{gsub(/^[ \t]+|[ \t]+$/, \"\", $3); print $3}'` – user3439894 Sep 06 '19 at 11:14
  • @user3439894 - Thanks for your commentary/suggestions. Yes, I saw the use of the single `gsub` in [the same thread](https://stackoverflow.com/questions/20600982/trim-leading-and-trailing-spaces-from-a-string-in-awk#answer-20601998) that I linked to too. As the adage says: _"There's more than one way to cook an egg.."_ `:)` – RobC Sep 06 '19 at 11:37