I need to extract one field from a JSON response that comes back wrapped in HTML.

Part of my code:

r = scraper.get('https://nsa.gob.ye/ha/api/scar-doc/01/09090909/', json=payload, headers=headers, cookies=cookies)

Part of the response, from print(r.text):

<div class="request-info" style="clear: both" aria-label="request info">
                <pre class="prettyprint"><b>GET</b> /ha/api/scar-doc/01/09090909/</pre>
              </div>

              <div class="response-info" aria-label="response info">
                <pre class="prettyprint"><span class="meta nocode"><b>HTTP 200 OK</b>
<b>Allow:</b> <span class="lit">GET, HEAD, OPTIONS</span>
<b>Content-Type:</b> <span class="lit">application/json</span>
<b>Vary:</b> <span class="lit">Accept</span>

</span>{
    'datos': {
        'data': {
            'tipo_documento': '01',
            'numero_documento': '09090909',
            'apellido_paterno': 'SHREK',
            'apellido_materno': 'SHREK',
            'nombres': 'SHREK',
            'edad_anios': 111,
            'str_fecha_nacimiento': '00/00/0000'
        },
        'resultado': 'Enc'
    }
}</pre>
              </div>
            </div>

I need to get the value of 'str_fecha_nacimiento' using BeautifulSoup. Thanks.

AjaxPain

1 Answer

The problem is that the JSON is plain text embedded in an incomplete HTML fragment.

So I take the text of the response-info div, split it into lines, and keep only the JSON data by discarding the first six lines (the HTTP status line, the headers and the blank lines around them).

Here is the code:

from bs4 import BeautifulSoup

sample_data = """
<div class="request-info" style="clear: both" aria-label="request info">
   <pre class="prettyprint"><b>GET</b> /ha/api/scar-doc/01/09090909/</pre>
</div>
<div class="response-info" aria-label="response info">
   <pre class="prettyprint"><span class="meta nocode"><b>HTTP 200 OK</b>
<b>Allow:</b> <span class="lit">GET, HEAD, OPTIONS</span>
<b>Content-Type:</b> <span class="lit">application/json</span>
<b>Vary:</b> <span class="lit">Accept</span>

</span>{
    'datos': {
        'data': {
            'tipo_documento': '01',
            'numero_documento': '09090909',
            'apellido_paterno': 'SHREK',
            'apellido_materno': 'SHREK',
            'nombres': 'SHREK',
            'edad_anios': 111,
            'str_fecha_nacimiento': '00/00/0000'
        },
        'resultado': 'Enc'
    }
}</pre>
</div>
</div>
"""

# Get the soup:
soup = BeautifulSoup(sample_data, "html.parser")

# Keep only the JSON data by discarding the first 6 lines of the div's text
# (a leading blank line, the status line, three headers and another blank line).
# The text is split on the line break "\n" and the remaining lines are joined back into a single string:
js_data = "\n".join(soup.find("div", class_="response-info").get_text().split("\n")[6:])

# Print the JSON data obtained:
print(js_data)

Result:

{
    'datos': {
        'data': {
            'tipo_documento': '01',
            'numero_documento': '09090909',
            'apellido_paterno': 'SHREK',
            'apellido_materno': 'SHREK',
            'nombres': 'SHREK',
            'edad_anios': 111,
            'str_fecha_nacimiento': '00/00/0000'
        },
        'resultado': 'Enc'
    }
}

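Note that js_data is still a string at this point. Because it uses single quotes, it is a Python literal rather than strict JSON, so you can convert it into a dictionary with ast.literal_eval and then read the field you need:

Code:

import ast

# Parse the Python-literal text into a dict; wrapping it in json.dumps first would only return the same string
json_data = ast.literal_eval(js_data)
print(json_data)
print(json_data["datos"]["data"]["str_fecha_nacimiento"])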
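
A variation on the approach above, as a minimal sketch: rather than relying on the data starting at the sixth line, it looks for the first "{" and the last "}" in the text of the <pre> element. The helper name extract_payload and the trimmed sample_html are made up for illustration, and the sketch assumes the payload is the only brace-delimited block in the response and contains only Python-compatible literals.

```python
import ast

from bs4 import BeautifulSoup

# Trimmed, hypothetical copy of the response shown in the question.
sample_html = """
<div class="response-info" aria-label="response info">
   <pre class="prettyprint"><span class="meta nocode"><b>HTTP 200 OK</b>
<b>Content-Type:</b> <span class="lit">application/json</span>

</span>{
    'datos': {
        'data': {
            'numero_documento': '09090909',
            'str_fecha_nacimiento': '00/00/0000'
        },
        'resultado': 'Enc'
    }
}</pre>
</div>
"""


def extract_payload(html):
    """Return the dict embedded in the response-info <pre> block."""
    soup = BeautifulSoup(html, "html.parser")
    pre = soup.select_one("div.response-info pre")
    if pre is None:
        raise ValueError("no response-info <pre> block found")

    text = pre.get_text()
    # Slice from the first "{" to the last "}" instead of counting lines.
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no brace-delimited payload found")

    # The sample uses single quotes, so it is parsed as a Python literal.
    return ast.literal_eval(text[start:end + 1])


payload = extract_payload(sample_html)
birth_date = payload["datos"]["data"]["str_fecha_nacimiento"]
print(birth_date)
```

Searching for the braces keeps the extraction working if the server adds or removes a header line, at the cost of assuming no stray braces appear in the header text.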
  • Thanks bro, you helped me a lot. But I have one more question: my JSON response contains "null" values, and when I run `json_data = ast.literal_eval(js_data)` it raises `ValueError: malformed node or string on line 9`. Line 9 is where the null value is. – AjaxPain Jan 12 '23 at 20:49
  • @AjaxPain in that case, I believe the JSON is not valid. You have to modify your code to handle those cases. Start by printing the value of `js_data` when the error occurs and inspect it. – Marco Aurelio Fernandez Reyes Jan 12 '23 at 21:00
  • Thank you, much appreciated. – AjaxPain Jan 13 '23 at 12:18
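
On the "null" question raised in the comments: ast.literal_eval only accepts Python literals, so the JSON keywords null, true and false make it fail. One possible way around it, sketched under the assumption that the body is either strict JSON (double quotes) or a Python-style literal (single quotes), is to try json.loads first and fall back to ast.literal_eval. The helper name parse_payload and both sample strings are hypothetical.

```python
import ast
import json


def parse_payload(text):
    """Parse text as strict JSON, falling back to a Python literal."""
    try:
        # Handles double-quoted JSON, including null, true and false.
        return json.loads(text)
    except json.JSONDecodeError:
        # Single-quoted, Python-style text is not valid JSON.
        return ast.literal_eval(text)


# Hypothetical samples: one strict JSON body with a null, one Python-style body.
strict_json = '{"datos": {"data": {"nombres": "SHREK", "str_fecha_nacimiento": null}}}'
python_style = "{'datos': {'data': {'nombres': 'SHREK', 'str_fecha_nacimiento': '00/00/0000'}}}"

from_json = parse_payload(strict_json)["datos"]["data"]["str_fecha_nacimiento"]
from_literal = parse_payload(python_style)["datos"]["data"]["str_fecha_nacimiento"]
print(from_json)
print(from_literal)
```

A body that mixes single quotes with null matches neither format, so this sketch would still raise an error for it; printing js_data, as suggested in the comments, is the way to find out which case applies.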