I copy the HTML of a "multi-select" list from a page and then paste it (after beautifying it online) into Notepad++.

I now want to use a regex to extract the rows that are enabled in that list. In other words, I want to see which options I had selected in that dropdown. There are too many rows to scroll through and find them all by hand. So the best approach I can think of is to take that HTML and search for the divs that contain "enabled"; their inner divs should then hold the values I am looking for.

The HTML is shown below:

   <div class="ui-multiselect-option-row" data-value="1221221111">
      <div class="ui-multiselect-checkbox-wrapper">
         <div class="ui-multiselect-checkbox"></div>
      </div>
      <div class="ui-multiselect-option-row-text">(BASE) OneOneOne (4222512512)</div>
   </div>
   <div class="ui-multiselect-option-row ui-multiselect-option-row-selected" data-value="343333434334">
      <div class="ui-multiselect-checkbox-wrapper">
         <div class="ui-multiselect-checkbox"></div>
         <div class="ui-multiselect-checkbox-selected">✔</div>
      </div>
      <div class="ui-multiselect-option-row-text">(BASE) TwoTwoTwo (5684641230)</div>
   </div>

The outcome should return the following value only (based on the above): (BASE) TwoTwoTwo (5684641230)

So far, I have tried the following regex in Notepad++:

<div class="ui-multiselect-option-row ui-multiselect-option-row-selected"(.*?)(?=<div class="ui-multiselect-option-row")

but it is impossible to mark all the matching lines at once and remove the unmarked ones: Notepad++ only marks the first line of each match. So I am wondering whether there is a better way, such as a more complex regex that extracts the value directly. In short:

a) I either want to make the above work with another regex in Notepad++ (I am open to Visual Studio if that makes it faster)

b) Or an easier way using the console in Chrome to parse the selected values. I would still like to see the regex solution but for Chrome console I have an

Update 1:

I used this line: $('div.ui-multiselect-option-row-selected > div:nth-child(2)'). As I am not that familiar with exporting from the Chrome console, all I need now is to get the innerHTML of the matched elements.

Update 2:

for (var b in $('div.ui-multiselect-option-row-selected > div:nth-child(2)')){
    console.log($('div.ui-multiselect-option-row-selected > div:nth-child(2)')[b].innerHTML);
}

which works; now I only have to export the outcome.
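For the regex route in (a), here is a minimal sketch in JavaScript rather than Notepad++. It assumes the markup is always shaped exactly like the sample above: each selected row carries the `ui-multiselect-option-row-selected` class and contains one `ui-multiselect-option-row-text` div with no nested tags. The names `html`, `selectedRowPattern`, and `extractSelected` are made up for illustration. This is pattern matching on a known, machine-generated snippet, not general HTML parsing.

```javascript
// Sample input: the beautified HTML from the question, pasted as a string.
const html = `
<div class="ui-multiselect-option-row" data-value="1221221111">
   <div class="ui-multiselect-checkbox-wrapper">
      <div class="ui-multiselect-checkbox"></div>
   </div>
   <div class="ui-multiselect-option-row-text">(BASE) OneOneOne (4222512512)</div>
</div>
<div class="ui-multiselect-option-row ui-multiselect-option-row-selected" data-value="343333434334">
   <div class="ui-multiselect-checkbox-wrapper">
      <div class="ui-multiselect-checkbox"></div>
      <div class="ui-multiselect-checkbox-selected">✔</div>
   </div>
   <div class="ui-multiselect-option-row-text">(BASE) TwoTwoTwo (5684641230)</div>
</div>`;

// Match the opening tag of a selected row, then lazily skip ahead to the
// first row-text div after it and capture its contents (group 1).
const selectedRowPattern =
  /<div class="ui-multiselect-option-row ui-multiselect-option-row-selected"[\s\S]*?<div class="ui-multiselect-option-row-text">([^<]*)<\/div>/g;

// Return the captured text of every selected row found in `source`.
function extractSelected(source) {
  return Array.from(source.matchAll(selectedRowPattern), (m) => m[1].trim());
}

console.log(extractSelected(html));
```

If a selected row ever lacked its text div, the lazy match would run on into the next row, so the pattern is only as reliable as the markup is regular.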

Datacrawler
  • have you considered using an actual html parser? [Regex is not a tool for parsing HTML](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – Chase Jan 12 '21 at 17:40
  • I cannot parse the entire page as it has too many elements. I only want to focus on that specific dropdown. I am trying to see if I can get the selected values using chrome console. I know what the name of the class is as mentioned above, but I am not familiar with the process. Alternatively, I can use the Regex option. – Datacrawler Jan 12 '21 at 17:42
  • What's stopping the html parser from parsing only the elements you need to track? – Chase Jan 12 '21 at 17:43
  • Also you need to show what the div that has `enabled` looks like. Ideally, post a minimal, yet complete example of the elements you need to track. – Chase Jan 12 '21 at 17:44
  • It is the parent `div` that has the class `.ui-multiselect-option-row-selected`, and I want to parse the text from its second child `div`. I think an HTML parser would take longer than using the console or the regex version? – Datacrawler Jan 12 '21 at 17:48
  • Try using css selectors - `div.ui-multiselect-option-row-selected > div:nth-child(2)`. That will choose the **immediate 2nd child** (one level of nesting) div within a div that has the `ui-multiselect-option-row-selected` class. What I am trying to tell you, however, is that it is fundamentally impossible to parse html with regex. HTML parsers, followed with CSS selectors is the way to go in this case. Chrome (or any other web browser) already parses the HTML, and lets you search elements using CSS selectors – Chase Jan 12 '21 at 17:52
  • I have used that in Chrome, and now I need to parse the selected options and export them to a list. That is where I am stuck with Chrome. If regex looks impossible, we can focus on that for now. I like this challenge. – Datacrawler Jan 12 '21 at 18:03

2 Answers

Open up Chrome's Console tab and execute this:

$x('//div[contains(@class, "ui-multiselect-option-row-selected")]/div[@class="ui-multiselect-option-row-text"]/text()')

Here is how it should look using your limited HTML sample but duplicated.

enter image description here

If you have multiple multi-selects and no unique identifier then count which one you need to target (notice the [1]):

$x('//div[contains(@class, "ui-multiselect-option-row-selected")][1]/div[@class="ui-multiselect-option-row-text"]/text()')
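Note that `$x` returns text nodes rather than plain strings. A small sketch for turning them into an array of strings follows; `nodeTexts` is a hypothetical helper name, and it only assumes that each item has a `textContent` property, which holds for DOM text nodes and elements.

```javascript
// Convert an array of DOM nodes (or anything with textContent) to trimmed strings.
function nodeTexts(nodes) {
  return Array.from(nodes, (node) => node.textContent.trim());
}

// In the Chrome console, where $x is available:
// nodeTexts($x('//div[contains(@class, "ui-multiselect-option-row-selected")]/div[@class="ui-multiselect-option-row-text"]/text()'))
```

The result is an ordinary array, so it can be passed to `copy(...)` or `JSON.stringify(...)` in the console.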
MonkeyZeus
  • Will that return multiple values? I tried my own version, which didn't work: `for (var b in $('div.ui-multiselect-option-row-selected > div:nth-child(2)')){ if(window.hasOwnProperty(b)) console.log(b.innerHTML); }` – Datacrawler Jan 12 '21 at 18:08
  • That returned the first value only. And note that there was another dropdown in the page. – Datacrawler Jan 12 '21 at 18:10
  • @Datacrawler Okay, is there a differentiator for the other dropdown? – MonkeyZeus Jan 12 '21 at 18:12
  • I have tried this and it worked. All I need to do now is export this list :) Please see below: `for (var b in $('div.ui-multiselect-option-row-selected > div:nth-child(2)')){ console.log($('div.ui-multiselect-option-row-selected > div:nth-child(2)')[b].innerHTML); }` – Datacrawler Jan 12 '21 at 18:17
  • So, they both work. I think I will stick to my solution as it returns the results. I only have to export this directly to a CSV, and possibly add a check based on the `.length`. – Datacrawler Jan 12 '21 at 18:23
  • @Datacrawler You can use JS to turn your string into a DOM object and apply the appropriate XPath, this would also let you control exactly how you wish to output the data. – MonkeyZeus Jan 12 '21 at 19:06
  • @Datacrawler If you're interested in knowing how to bring the HTML into JS then see my answer here: https://stackoverflow.com/a/65290030/2191572 – MonkeyZeus Jan 12 '21 at 20:39
  • Can you please vote to [undelete this answer](https://stackoverflow.com/q/66013555/548225) – anubhava Feb 12 '21 at 18:40

All you have to do is use a CSS selector followed by a .map to collect the innerHTML of every matched element in a list:

[...$('div.ui-multiselect-option-row-selected > div:nth-child(2)')].map(n => n.innerHTML)

The CSS selector is `div.ui-multiselect-option-row-selected > div:nth-child(2)`, which, as I mentioned in my comment, selects the div that is the second immediate child of every div with the `ui-multiselect-option-row-selected` class.

Then a bit of JavaScript turns the result into an array (the spread syntax) and `map` extracts each element's `innerHTML`, as you asked.

If the list is large, consider storing the result in a variable:

const e = [...$('div.ui-multiselect-option-row-selected > div:nth-child(2)')].map(n => n.innerHTML);

and then doing

copy(e);

This copies the list to your clipboard; pressing Ctrl+V anywhere will then paste it.
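If the goal is a CSV rather than a JavaScript array, one option is to build the CSV text first and pass that string to `copy`. This is a minimal sketch: `toCsvColumn` and the default header name `selected` are hypothetical, and the quoting follows the common convention of wrapping each field in double quotes and doubling any embedded double quotes.

```javascript
// Build single-column CSV text from a list of strings.
// Every field is quoted, and embedded double quotes are doubled.
function toCsvColumn(values, header = 'selected') {
  const quote = (value) => '"' + String(value).replace(/"/g, '""') + '"';
  return [header, ...values].map(quote).join('\r\n');
}

// In the console, with the variable `e` defined earlier: copy(toCsvColumn(e));
console.log(toCsvColumn(['(BASE) TwoTwoTwo (5684641230)']));
```

Quoting every field keeps values that contain commas or parentheses in a single column when the text is pasted into a file and opened as CSV.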

Chase
  • It seems I cannot add a variable in my console. It always says undefined. – Datacrawler Jan 12 '21 at 18:34
  • @Datacrawler No, the variable has been added - the result of setting a variable is `undefined` in javascript. You can access this variable by typing in the variable name and pressing enter. – Chase Jan 12 '21 at 18:39
  • I have made it work, but my only problem is that I cannot export it to a proper CSV or JSON file. I can only save the entire console in Chrome. My code: `let myList = []; for (var b in $$('div.ui-multiselect-option-row-selected > div:nth-child(2)')){ myList.push($$('div.ui-multiselect-option-row-selected > div:nth-child(2)')[b].innerHTML); }; console.log(myList);` I can use copy and then save it. I just wonder whether I can do that directly. – Datacrawler Jan 12 '21 at 18:42
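On saving directly to a file: one approach that may work from the console is to trigger a download through a temporary link pointing at a `data:` URL. This is only a sketch. `buildDataUrl` and `downloadText` are hypothetical names, the `doc` parameter exists just so the function can be tried outside a browser, and whether the file is saved silently or prompts first depends on the browser's download settings.

```javascript
// Encode text as a data: URL that a link can point at.
function buildDataUrl(text, mimeType) {
  return 'data:' + mimeType + ';charset=utf-8,' + encodeURIComponent(text);
}

// Create a temporary <a download> element and click it to start a download.
// `doc` defaults to the page's document when run in a browser.
function downloadText(filename, text, mimeType = 'text/plain', doc = document) {
  const link = doc.createElement('a');
  link.href = buildDataUrl(text, mimeType);
  link.download = filename;
  link.click();
  return link;
}

// In the console, with `myList` from the comment above:
// downloadText('selected.json', JSON.stringify(myList), 'application/json');
```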