
I am trying to scrape formularylookup.com, a site with information on the market for pharmaceuticals.

It requires a login with a username and password.

I need the information for the medicine called Rybelsus.

Looking at the XHR requests in the browser's developer tools (Inspect -> Network -> XHR), I suspect there is an easy way to get the required data from this request:

https://formularylookup.com/Formulary/Coverage?ProductId=237171&ProductName=Rybelsus&ChannelId=1&DrugTypeId=3&StateId=all&Options=SummaryCoverages

I found this site, which might give an idea of how to connect to formularylookup.com, but I am very inexperienced with connecting to APIs.

Here's my code:

import requests
from bs4 import BeautifulSoup


url ="https://api.mmitnetwork.com/Formulary/v1/Products?Name=rybelsus"
params = {
        "ProductId":"237171",
        "productSearch":"Rybelsus"}


headers = {
        "authorization":"Bearer H-oa4ULGls2Cpu8U6hX4myixRoFIPxfj",
        "Access-Token":"H-oa4ULGls2Cpu8U6hX4myixRoFIPxfj",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.87 Safari/537.36",
        "X-Requested-With": "XMLHttpRequest",
        "Host": "formularylookup.com",
        "X-NewRelic-ID": "XAYCVFZSGwcGU1lXBAI="
        }


res = requests.get(url ,params=params ,headers = headers)
soup = BeautifulSoup(res.content, "lxml")
print(soup.prettify())

Which gives me the following response:

<!DOCTYPE html>
<html>
 <head>
  <title>
   The resource cannot be found.
  </title>
  <meta content="width=device-width" name="viewport"/>
  <style>
   body {font-family:"Verdana";font-weight:normal;font-size: .7em;color:black;} 
         p {font-family:"Verdana";font-weight:normal;color:black;margin-top: -5px}
         b {font-family:"Verdana";font-weight:bold;color:black;margin-top: -5px}
         H1 { font-family:"Verdana";font-weight:normal;font-size:18pt;color:red }
         H2 { font-family:"Verdana";font-weight:normal;font-size:14pt;color:maroon }
         pre {font-family:"Consolas","Lucida Console",Monospace;font-size:11pt;margin:0;padding:0.5em;line-height:14pt}
         .marker {font-weight: bold; color: black;text-decoration: none;}
         .version {color: gray;}
         .error {margin-bottom: 10px;}
         .expandable { text-decoration:underline; font-weight:bold; color:navy; cursor:hand; }
         @media screen and (max-width: 639px) {
          pre { width: 440px; overflow: auto; white-space: pre-wrap; word-wrap: break-word; }
         }
         @media screen and (max-width: 479px) {
          pre { width: 280px; }
         }
  </style>
 </head>
 <body bgcolor="white">
  <span>
   <h1>
    Server Error in '/' Application.
    <hr color="silver" size="1" width="100%"/>
   </h1>
   <h2>
    <i>
     The resource cannot be found.
    </i>
   </h2>
  </span>
  <font face="Arial, Helvetica, Geneva, SunSans-Regular, sans-serif ">
   <b>
    Description:
   </b>
   HTTP 404. The resource you are looking for (or one of its dependencies) could have been removed, had its name changed, or is temporarily unavailable.  Please review the following URL and make sure that it is spelled correctly.
   <br/>
   <br/>
   <b>
    Requested URL:
   </b>
   /Formulary/v1/Products
   <br/>
   <br/>
   <hr color="silver" size="1" width="100%"/>
   <b>
    Version Information:
   </b>
   Microsoft .NET Framework Version:4.0.30319; ASP.NET Version:4.6.1590.0
  </font>
 </body>
</html>
<!-- 
[HttpException]: The controller for path &#39;/Formulary/v1/Products&#39; was not found or does not implement IController.
   at System.Web.Mvc.DefaultControllerFactory.GetControllerInstance(RequestContext requestContext, Type controllerType)
   at System.Web.Mvc.DefaultControllerFactory.CreateController(RequestContext requestContext, String controllerName)
   at System.Web.Mvc.MvcHandler.ProcessRequestInit(HttpContextBase httpContext, IController& controller, IControllerFactory& factory)
   at System.Web.Mvc.MvcHandler.BeginProcessRequest(HttpContextBase httpContext, AsyncCallback callback, Object state)
   at System.Web.HttpApplication.CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
   at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously)
-->
<!-- 
This error page might contain sensitive information because ASP.NET is configured to show verbose error messages using &lt;customErrors mode="Off"/&gt;. Consider using &lt;customErrors mode="On"/&gt; or &lt;customErrors mode="RemoteOnly"/&gt; in production environments.-->

Update: I get a 404 error and I am not sure why.

doomdaam

1 Answer


The code below should help:

import requests

headers = {
    'Accept': '*/*',
    'X-Requested-With': 'XMLHttpRequest',
    'Access-Token': '7Lq-KkDx2fCO_3kG90pLEpBS9Ssh62IQ',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.87 Safari/537.36',
    'Is-Session-Expired': 'false',
    'Referer': 'https://formularylookup.com/',
}

response = requests.get('https://formularylookup.com/Formulary/Coverage?ProductId=237171&ProductName=Rybelsus&ChannelId=1&DrugTypeId=3&StateId=AL&Options=SummaryCoverages', headers=headers)

print(response.json())

Note: the 'Is-Session-Expired': 'false' header is essential; without it the request returns a 404 error.
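As a variation on the request above, here is a minimal sketch that passes the query string as a `params` dict instead of hard-coding it in the URL, and checks the HTTP status before decoding the JSON. The function names `build_coverage_request` and `fetch_coverage` are hypothetical helpers introduced for illustration, and the access token is assumed to be one you copied from your own logged-in session; tokens like the one above are presumably session-bound and will expire.

```python
BASE_URL = "https://formularylookup.com/Formulary/Coverage"


def build_coverage_request(access_token, state_id="AL"):
    """Return (params, headers) for the coverage request shown above.

    Hypothetical helper: the parameter and header names are copied from
    the request in this answer, not from any official API documentation.
    """
    params = {
        "ProductId": "237171",
        "ProductName": "Rybelsus",
        "ChannelId": "1",
        "DrugTypeId": "3",
        "StateId": state_id,
        "Options": "SummaryCoverages",
    }
    headers = {
        "Accept": "*/*",
        "X-Requested-With": "XMLHttpRequest",
        "Access-Token": access_token,
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
        # Without this header the site answers with a 404 (see note above).
        "Is-Session-Expired": "false",
        "Referer": "https://formularylookup.com/",
    }
    return params, headers


def fetch_coverage(access_token, state_id="AL"):
    """Send the request and return the decoded JSON body."""
    # Third-party package; imported here so the builder above can be
    # used and inspected without requests installed.
    import requests

    params, headers = build_coverage_request(access_token, state_id)
    response = requests.get(BASE_URL, params=params, headers=headers, timeout=30)
    # Raise requests.HTTPError on a 4xx/5xx status instead of trying to
    # parse an HTML error page as JSON.
    response.raise_for_status()
    return response.json()
```

Calling `fetch_coverage("<your Access-Token>", state_id="AL")` should then behave like the one-line request above, with the difference that a 404 raises an exception instead of failing inside `response.json()`.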


CodeIt
  • I think I've been working on this problem for two full working days. Initially I tried the API, but couldn't make it work. Then I tried a huge ass code with selenium and BS4, but eventually got stuck, so I turned to the API again. Now I have a better idea of how it works. Thank you! Now to turn this into a pandas dataframe, should be easy. – doomdaam Feb 01 '20 at 15:24
  • Thank you so much! I swear I always underestimate how helpful some people on SO are :'( – doomdaam Feb 01 '20 at 15:29
  • @doomdaam Glad to hear! – CodeIt Feb 01 '20 at 15:30
  • I posted a new question after playing around if you're interested: https://stackoverflow.com/questions/60019043/convert-json-nested-list-of-dict-to-dataframe – doomdaam Feb 01 '20 at 16:49