
I am scraping an Applicant Tracking System (BrassRing). I can log in without an issue using Selenium and reach the webpage I'm interested in scraping. While searching for the table, I discovered that the data I want is stored in a jsonGrid.

Nothing I can find about Selenium and scraping covers how to scrape the contents of a JSON grid.

There are 8 columns in this grid/table, each with all of its data beneath it (some cells are empty, and that's OK).

As far as I can tell, the columns are labeled as follows in the JSON itself, though the website displays them slightly differently:

Action Type
Action Date
Action By
Details
Name
emailfrom
emailto
folderid
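Once the grid rows are parsed into Python dicts (one dict per row), picking out these eight columns and writing them to CSV is a small step. This is a minimal sketch, assuming the key names seen in the snippet below. Note that the snippet has no top-level `folderid` key, only `hm_Folderid` and `foldername`, so mapping "folderid" to `hm_Folderid` is an assumption. Missing or empty cells simply become empty strings. The sample row is abridged from the snippet.

```python
import csv
import io

# Column label -> JSON key. "hm_Folderid" is an assumption: the snippet has
# no top-level "folderid" key, only "hm_Folderid" and "foldername".
COLUMNS = {
    "Action Type": "ActionType",
    "Action Date": "ActionDate",
    "Action By": "ActionBy",
    "Details": "Details",
    "Name": "Name",
    "emailfrom": "emailfrom",
    "emailto": "emailto",
    "folderid": "hm_Folderid",
}


def rows_to_csv(rows):
    """Return CSV text with one line per grid row; missing cells are left empty."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(COLUMNS.keys())
    for row in rows:
        writer.writerow(row.get(key, "") for key in COLUMNS.values())
    return buf.getvalue()


# Abridged sample row taken from the snippet ("Name" omitted to show an empty cell).
sample = [
    {"ActionType": "Communication - Email", "ActionDate": "18-Oct-2019 14:14:25",
     "ActionBy": "Manager, Automation ()", "Details": "Status: Sent as to",
     "emailfrom": "Manager, Automation ()",
     "emailto": "Smith, John(john.smith@notreal.com)", "hm_Folderid": "6537489"},
]
print(rows_to_csv(sample))
```

Using `dict.get` with a default keeps the output aligned to eight columns even when a row lacks a key.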

Here is the first part of the page source, which shows the headers and some of the column values.

It would be great if you could explain how I can scrape just the JSON grid/JSON data from the site.

<input type="hidden" name="Grid$jsonData183" id="Grid_jsonData183" class="jsonGridData" value="[{&quot;ActionType&quot;: &quot;Communication - Email&quot;,&quot;ActionDate&quot;: &quot;18-Oct-2019 14:14:25&quot;,&quot;ActionBy&quot;: &quot;Manager, Automation ()&quot;,&quot;Details&quot;: &quot;Status: Sent as to&quot;,&quot;Name&quot;: &quot;&lt;a href=\&quot;#\&quot;/  class=\&quot;ViewCommunication\&quot;&gt;Not Selected&lt;/a&gt;&quot;,&quot;emailfrom&quot;: &quot;Manager, Automation ()&quot;,&quot;emailto&quot;: &quot;Smith, John(john.smith@notreal.com)&quot;,&quot;hm_category&quot;: &quot;5&quot;,&quot;hm_Folderid&quot;: &quot;6537489&quot;,&quot;hm_ReqId&quot;: &quot;-1&quot;,&quot;hm_content&quot;: &quot;1&quot;,&quot;hm_md_communication_type&quot;: &quot;Communication - Email&quot;,&quot;hm_md_communication_correspondenceid&quot;: &quot;1&quot;,&quot;hm_md_communication_correspondenceresumeid&quot;: &quot;46878397&quot;,&quot;hm_pushportal&quot;: &quot;0&quot;,&quot;hm_unpostportal&quot;: &quot;0&quot;,&quot;hm_postportall&quot;: &quot;0&quot;,&quot;hm_PortalExpired&quot;: &quot;0&quot;,&quot;hm_md_communication_agencycodetypeid&quot;: &quot;0&quot;,&quot;hm_md_communication_agencycodeid&quot;: &quot;0&quot;,&quot;hm_md_communication_userid&quot;: &quot;41&quot;,&quot;hm_md_RecipientType&quot;: &quot;4&quot;,&quot;hm_EmailLogId&quot;: &quot;0&quot;,&quot;hm_md_ReceiverUserID&quot;: &quot;0&quot;,&quot;hm_md_fid&quot;: &quot;6537489&quot;,&quot;hm_md_rid&quot;: &quot;6454343&quot;,&quot;hm_md_rftid&quot;: &quot;17&quot;,&quot;hm_md_rsts&quot;: &quot;0&quot;,&quot;hm_md_myfolder&quot;: &quot;0&quot;,&quot;foldername&quot;: &quot;&lt;a href=&#39;#&#39; class=&#39;ViewFolder&#39;&gt;1738995BR:Customer Service Associate II&lt;/a&gt;&quot;,&quot;hm_md_afl&quot;: &quot;0&quot;,&quot;hm_md_rfl&quot;: &quot;1&quot;,&quot;hm_md_rlg&quot;: &quot;en&quot;, &quot;rowmetadata&quot;: &quot;&lt;div&gt;&lt;div name=\&quot;category\&quot; 
value=\&quot;5\&quot;&gt;&lt;/div&gt;&lt;div name=\&quot;folderid\&quot; value=\&quot;6537489\&quot;&gt;&lt;/div&gt;&lt;div name=\&quot;reqid\&quot; value=\&quot;-1\&quot;&gt;&lt;/div&gt;&lt;div name=\&quot;content\&quot; value=\&quot;1\&quot;&gt;&lt;/div&gt;&lt;div name=\&quot;md_communication_type\&quot; value=\&quot;Communication+-+Email\&quot;&gt;&lt;/div&gt;&lt;div name=\&quot;md_communication_correspondenceid\&quot; value=\&quot;1\&quot;&gt;&lt;/div&gt;&lt;div name=\&quot;md_communication_correspondenceresumeid\&quot; value=\&quot;46878397\&quot;&gt;&lt;/div&gt;&lt;div name=\&quot;pushportal\&quot; value=\&quot;0\&quot;&gt;&lt;/div&gt;&lt;div name=\&quot;unpostportal\&quot; value=\&quot;0\&quot;&gt;&lt;/div&gt;&lt;div name=\&quot;postportall\&quot; value=\&quot;0\&quot;&gt;&lt;/div&gt;&lt;div name=\&quot;portalexpired\&quot; value=\&quot;0\&quot;&gt;&lt;/div&gt;&lt;div name=\&quot;md_communication_agencycodetypeid\&quot; value=\&quot;0\&quot;&gt;&lt;/div&gt;&lt;div name=\&quot;md_communication_agencycodeid\&quot; value=\&quot;0\&quot;&gt;&lt;/div&gt;&lt;div name=\&quot;md_communication_userid\&quot; value=\&quot;41\&quot;&gt;&lt;/div&gt;&lt;div name=\&quot;md_recipienttype\&quot; value=\&quot;4\&quot;&gt;&lt;/div&gt;&lt;div name=\&quot;emaillogid\&quot; value=\&quot;0\&quot;&gt;&lt;/div&gt;&lt;div name=\&quot;md_receiveruserid\&quot; value=\&quot;0\&quot;&gt;&lt;/div&gt;&lt;div name=\&quot;md_fid\&quot; value=\&quot;6537489\&quot;&gt;&lt;/div&gt;&lt;div name=\&quot;md_rid\&quot; value=\&quot;6454343\&quot;&gt;&lt;/div&gt;&lt;div name=\&quot;md_rftid\&quot; value=\&quot;17\&quot;&gt;&lt;/div&gt;&lt;div name=\&quot;md_rsts\&quot; value=\&quot;0\&quot;&gt;&lt;/div&gt;&lt;div name=\&quot;md_myfolder\&quot; value=\&quot;0\&quot;&gt;&lt;/div&gt;&lt;div name=\&quot;md_afl\&quot; value=\&quot;0\&quot;&gt;&lt;/div&gt;&lt;div name=\&quot;md_rfl\&quot; value=\&quot;1\&quot;&gt;&lt;/div&gt;&lt;div name=\&quot;md_rlg\&quot; 
value=\&quot;en\&quot;&gt;&lt;/div&gt;&lt;/div&gt;&quot;},{&quot;ActionType&quot;: &quot;Communication - Email&quot;,&quot;ActionDate&quot;: &quot;18-Oct-2019 13:24:13&quot;,&quot;ActionBy&quot;: &quot;Manager, Automation ()&quot;,&quot;Details&quot;: &quot;Status: Sent as to&quot;,&quot;Name&quot;: &quot;&lt;a href=\&quot;#\&quot;/  class=\&quot;ViewCommunication\&quot;&gt;Not Selected&lt;/a&gt;&quot;,&quot;emailfrom&quot;: &quot;Manager, Automation ()&quot;,&quot;emailto&quot;: &quot;Smith, John(john.smith@notreal.com)&quot;,&quot;hm_category&quot;: &quot;5&quot;,&quot;hm_Folderid&quot;: &quot;6513663&quot;,&quot;hm_ReqId&quot;: &quot;-1&quot;,&quot;hm_content&quot;: &quot;1&quot;,&quot;hm_md_communication_type&quot;: &quot;Communication - Email&quot;,&quot;hm_md_communication_correspondenceid&quot;:
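The `value` attribute above is HTML-escaped JSON: `&quot;` stands for `"`, `&lt;`/`&gt;` for `<`/`>`, and `&#39;` for `'`. If you have the attribute text exactly as it appears in the raw source (for example, cut out of `driver.page_source` by hand or with a regex), `html.unescape` followed by `json.loads` should turn it into a list of dicts. A minimal sketch using one shortened row copied from the snippet; the tag-stripping regex is a crude assumption that only suits simple markup like the `<a>` links in `Name` and `foldername`:

```python
import html
import json
import re

# A shortened, still-escaped row copied from the value="..." attribute above.
raw_value = (
    "[{&quot;ActionType&quot;: &quot;Communication - Email&quot;,"
    "&quot;ActionDate&quot;: &quot;18-Oct-2019 14:14:25&quot;,"
    "&quot;Name&quot;: &quot;&lt;a href=\\&quot;#\\&quot;/  class=\\&quot;ViewCommunication\\&quot;&gt;Not Selected&lt;/a&gt;&quot;,"
    "&quot;emailto&quot;: &quot;Smith, John(john.smith@notreal.com)&quot;}]"
)

# Step 1: decode the HTML entities; step 2: parse the resulting JSON text.
rows = json.loads(html.unescape(raw_value))
for row in rows:
    print(row["ActionType"], row["ActionDate"], row["emailto"])

# Some cells (e.g. "Name") contain HTML markup; strip tags to get the visible text.
name_text = re.sub(r"<[^>]+>", "", rows[0]["Name"])
print(name_text)  # Not Selected
```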
Dharman
TardisPilot
  • 'jsonGrid' is not an official/standard data storage format. This data looks like HTML-encoded JSON. To decode it, you can use this answer: https://stackoverflow.com/questions/2087370/decode-html-entities-in-python-string – MegaIng Oct 23 '19 at 17:01
  • @Megalng thanks for the link, but I'm not sure how to interpret what the answer is actually saying, and the multiple versions discussed add to the confusion. Sorry about misunderstanding the jsonGridData. Are you saying I should pull the values based on what surrounds them, i.e. split on the '",&quot' in the data? – TardisPilot Oct 23 '19 at 17:29
  • No, get the string in the `value` attribute of the `input` tag. That string can be transformed via `html.unescape` into valid JSON, which can then be parsed with `json.loads`. Then you have the stored data and can look through it for what you need. – MegaIng Oct 23 '19 at 17:49
  • @Megalng, I'm using BeautifulSoup to look at the site, and when I `print(soup.get_text())` the entire section of the website that my snippet above represents is gone. It looks like the site may generate the table via JavaScript. I can't use BeautifulSoup to scrape JS/dynamic tables, correct? Do you have a recommended approach? – TardisPilot Oct 23 '19 at 19:19
  • You are using Selenium, so JavaScript will run and the content will be rendered. No other approach should be necessary. – QHarr Oct 24 '19 at 02:34
  • @QHarr, I see the input type, name, id and class in the first part of the code, followed by value. In Selenium I've tried locating it by id, name, etc. and can't seem to find the data. Based on the page source I've provided, how would you scrape those particular parts in Selenium? I will also post my code tomorrow to show my attempts. – TardisPilot Oct 24 '19 at 03:37
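Putting the comments together: `get_text()` only returns text nodes, so the `value` attribute of a `type="hidden"` input never appears in its output, even though the element is in the DOM. Selenium can still read it with `get_attribute("value")`, and at that point the browser has already decoded `&quot;` and friends, so the string should go straight to `json.loads` without `html.unescape`. The same holds for the standard-library `html.parser`, which decodes entities in attribute values, if you prefer to parse `driver.page_source`. A sketch under these assumptions: the inputs keep the `jsonGridData` class from the snippet (matching on the class rather than `Grid_jsonData183` is a guess that the numeric suffix may vary), and `grids_from_driver` / `grids_from_html` are hypothetical helper names.

```python
import json
from html.parser import HTMLParser


def grids_from_driver(driver):
    """Parse every <input class="jsonGridData"> on the current Selenium page.

    "css selector" is the string value of By.CSS_SELECTOR, so selenium does
    not need to be imported here. Returns {input id: list of row dicts}.
    """
    grids = {}
    for element in driver.find_elements("css selector", "input.jsonGridData"):
        # get_attribute("value") returns the DOM value, entities already decoded
        grids[element.get_attribute("id")] = json.loads(element.get_attribute("value") or "[]")
    return grids


class _GridInputCollector(HTMLParser):
    """Collect and parse the value attributes of <input class="jsonGridData"> tags."""

    def __init__(self):
        super().__init__()
        self.grids = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "input" and "jsonGridData" in classes:
            # html.parser has already unescaped entities in attribute values
            self.grids[attrs.get("id")] = json.loads(attrs.get("value") or "[]")


def grids_from_html(page_source):
    """Same result as grids_from_driver, but from an HTML string such as driver.page_source."""
    collector = _GridInputCollector()
    collector.feed(page_source)
    collector.close()
    return collector.grids


demo = ('<input type="hidden" name="Grid$jsonData183" id="Grid_jsonData183" '
        'class="jsonGridData" value="[{&quot;ActionType&quot;: &quot;Communication - Email&quot;}]">')
print(grids_from_html(demo))  # {'Grid_jsonData183': [{'ActionType': 'Communication - Email'}]}

# With a logged-in Selenium session already on the page (hypothetical `driver`):
# rows = grids_from_driver(driver).get("Grid_jsonData183", [])
```

Reading the attribute rather than `.text` is the key point: Selenium's `.text` is empty for elements that are not displayed, which may be why locating the input by id appeared to return nothing.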
