
I have this HTML, and I used toolslick.com to convert it to JSON. I got the JSON below, and I would like to know whether it is possible to do exactly the same thing in Python. What could I use? Regex? Some library? A loop? I tried a few things without success. It doesn't need to be JSON, but I thought that was the best option since I can access the values using ['tr'][0], for example. Thank you.

HTML:
<tr>
    <td>
        <span class="theme1">1</span> Charisma
    </td>
    <td>
        <span class="theme1">1</span> Smartness
    </td>
    <td>
        <span class="theme1">1</span> Health
    </td>
</tr>
<tr>
    <td></td>
    <td></td>
    <td>Age: 
        <span class="green">20</span>
    </td>
</tr>
<tr>
    <td colspan="3" class="active">Strength: 
        <span class="tooltip" data-tip="Lorem ipsum dolor sit amet, consectetur">
            <icon>i-hand</icon> Hand
        </span>
    </td>
</tr>
<tr>
    <td colspan="3" class="inactive">Weakness: 
        <span class="tooltip" data-tip="Donec egestas lectus quis">
            <icon>i-feet</icon> Feet
        </span>
    </td>
</tr>

JSON:
{
  "tr": [
    {
      "td": [
        {
          "span": {
            "@class": "theme1",
            "#text": "1"
          },
          "#text": "Charisma"
        },
        {
          "span": {
            "@class": "theme1",
            "#text": "1"
          },
          "#text": "Smartness"
        },
        {
          "span": {
            "@class": "theme1",
            "#text": "1"
          },
          "#text": "Health"
        }
      ]
    },
    {
      "td": [
        "",
        "",
        {
          "span": {
            "@class": "green",
            "#text": "20"
          },
          "#text": "Age:"
        }
      ]
    },
    {
      "td": {
        "@colspan": "3",
        "@class": "active",
        "span": {
          "@class": "tooltip",
          "@data-tip": "Lorem ipsum dolor sit amet, consectetur",
          "icon": "i-hand",
          "#text": "Hand"
        },
        "#text": "Strength:"
      }
    },
    {
      "td": {
        "@colspan": "3",
        "@class": "inactive",
        "span": {
          "@class": "tooltip",
          "@data-tip": "Donec egestas lectus quis",
          "icon": "i-feet",
          "#text": "Feet"
        },
        "#text": "Weakness:"
      }
    }
  ]
}
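For reference, the conversion above can be approximated in Python with only the standard library. This particular snippet happens to be well-formed XML (every tag is closed, every attribute is quoted), so xml.etree.ElementTree can parse it, and a short recursive function can rebuild the "@attribute" / "#text" layout. This is a minimal sketch under that assumption, not the algorithm toolslick.com uses; element_to_dict and html_fragment_to_dict are hypothetical names. Typical real-world HTML (void tags such as <br>, entities such as &nbsp;) will make ET.fromstring raise a ParseError, and that is where BeautifulSoup or lxml are the better fit.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert an Element to the '@attribute' / '#text' layout shown above (sketch)."""
    children = list(elem)
    # Direct text of this element: text before the first child plus each child's tail
    pieces = [elem.text or ""] + [child.tail or "" for child in children]
    text = " ".join(p.strip() for p in pieces if p.strip())

    # A bare element (no attributes, no children) collapses to its text,
    # which is "" for an empty <td></td>
    if not elem.attrib and not children:
        return text

    result = {"@" + name: value for name, value in elem.attrib.items()}
    for child in children:
        value = element_to_dict(child)
        if child.tag not in result:
            result[child.tag] = value
        elif isinstance(result[child.tag], list):
            result[child.tag].append(value)
        else:
            # Second occurrence of the same tag: switch to a list
            result[child.tag] = [result[child.tag], value]
    if text:
        result["#text"] = text
    return result


def html_fragment_to_dict(fragment):
    # ElementTree needs a single root, and the fragment has several <tr> rows,
    # so wrap it in a dummy <root> element and convert that.
    root = ET.fromstring("<root>" + fragment + "</root>")
    return element_to_dict(root)


sample = """
<tr>
    <td><span class="theme1">1</span> Charisma</td>
    <td></td>
</tr>
<tr>
    <td colspan="3" class="active">Strength:
        <span class="tooltip" data-tip="Lorem ipsum"><icon>i-hand</icon> Hand</span>
    </td>
</tr>
"""
data = html_fragment_to_dict(sample)
print(json.dumps(data, indent=2))
print(data["tr"][1]["td"]["span"]["#text"])  # Hand
```

Like the converter's output, a tag that appears once becomes a dict and a repeated tag becomes a list, which is why ['tr'][0]['td'] is a list while ['tr'][2]['td'] is a single object in the JSON above.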

1 Answer

There are a few libraries suitable for this task, such as html2json and BeautifulSoup.

lxml is another library for parsing HTML; see this example.

However, these will not give you the JSON in exactly the format you want. For a set of <tr> rows, the output would more likely look something like this:

{
    "Status": "Active",
    "Card name": "NAMEn",
    "Account holder": "NAME",
    "Card number": "1234",
    "Card balance": "$18.30"
}

As you can see, this does not include metadata such as class, @data-tip, etc. So the best and easiest option is to keep the JSON format you already have and use it to access the data you want.

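For example:

import json

# Load the converted JSON. json.load() expects an open file object;
# use json.loads() instead if you already have the JSON as a string.
with open("data.json") as f:  # "data.json" is a hypothetical file name
    json_dict = json.load(f)

# Now you can use it like a dictionary, for example:
print(json_dict["tr"][0]["td"][0]["#text"])  # prints "Charisma" for the JSON above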
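Because a cell that appears once comes out as a dict while repeated cells come out as a list, hard-coded index paths such as ['tr'][0]['span'][0] break easily. A minimal sketch of a more robust lookup, assuming the data has the "tr" / "td" / "span" / "#text" layout from the question: the hypothetical helper find_by_label searches for the cell whose own text equals a label and returns the text of its <span>.

```python
import json


def find_by_label(data, label):
    """Return the <span> text of the first cell whose own '#text' equals label."""
    rows = data["tr"] if isinstance(data["tr"], list) else [data["tr"]]
    for row in rows:
        # A single <td> is a dict, several <td> are a list: normalize to a list
        cells = row["td"] if isinstance(row["td"], list) else [row["td"]]
        for cell in cells:
            # Empty cells are plain strings; only dict cells carry text and a span
            if isinstance(cell, dict) and cell.get("#text") == label:
                return cell["span"]["#text"]
    return None


# A trimmed copy of the question's JSON, loaded from a string with json.loads()
json_text = """
{"tr": [
  {"td": ["", "", {"span": {"@class": "green", "#text": "20"}, "#text": "Age:"}]},
  {"td": {"@colspan": "3", "@class": "active",
          "span": {"@class": "tooltip", "icon": "i-hand", "#text": "Hand"},
          "#text": "Strength:"}}
]}
"""
data = json.loads(json_text)
print(find_by_label(data, "Age:"))       # 20
print(find_by_label(data, "Strength:"))  # Hand
```

On the full JSON, find_by_label(data, "Charisma") would likewise return "1".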
AzyCrw4282
  • I don't know what "template is the JSON of template loaded as Python objects." in the html2json docs means. I'm writing a script that makes requests to an API and gets HTML back, so I can't use the website for the conversion. I'll receive the HTML and want to process it in Python. So when I want to know what the "Strength" is, I'd just use something like var['tr'][0]['span'][0] and get the strength from the returned HTML. But thank you, I think BeautifulSoup will help me. I think I can convert it to JSON somehow, I don't know yet. – Wevelly Felipe Apr 18 '20 at 02:41
  • I just want specific values at fixed positions, or something like that. I don't think I even need to convert it to JSON, because if I use soup.tr.td.span.text I get 1, which is the Charisma value. So I just need to convert it from a soup object to a string. – Wevelly Felipe Apr 18 '20 at 02:43
  • For this, html2json would suffice, and BeautifulSoup would also do the job. You can import the library and use it directly. – AzyCrw4282 Apr 18 '20 at 02:44
  • I'm pretty new to this. I tried to use html2json but I don't know how to use it; the template part is confusing me. – Wevelly Felipe Apr 18 '20 at 02:48
  • See this: https://stackoverflow.com/questions/18544634/convert-a-html-table-to-json/18544794#18544794 It should help you. – AzyCrw4282 Apr 18 '20 at 02:50
  • Yes! It worked! I got it. It works in a different way, but it works. No need to convert to JSON, and BeautifulSoup helped a lot. Thank you! – Wevelly Felipe Apr 18 '20 at 03:14
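The approach the thread settled on (BeautifulSoup, no JSON step) might look like the minimal sketch below. It assumes the beautifulsoup4 package is installed and that the HTML returned by the API has the same layout as the snippet in the question; the variable names are illustrative only. It reads the stat names and numbers from the first row and the "Strength" label, value and tooltip from the row with class="active".

```python
from bs4 import BeautifulSoup

html = """
<tr>
    <td><span class="theme1">1</span> Charisma</td>
    <td><span class="theme1">1</span> Smartness</td>
    <td><span class="theme1">1</span> Health</td>
</tr>
<tr>
    <td colspan="3" class="active">Strength:
        <span class="tooltip" data-tip="Lorem ipsum dolor sit amet, consectetur">
            <icon>i-hand</icon> Hand
        </span>
    </td>
</tr>
"""

# "html.parser" keeps the bare <tr> rows as they are; stricter parsers such
# as html5lib may drop <tr>/<td> tags that are not inside a <table>.
soup = BeautifulSoup(html, "html.parser")

# First row: each cell is a number in a <span> followed by the stat name
stats = {}
for td in soup.find("tr").find_all("td"):
    name = td.span.next_sibling.strip()  # the text right after the <span>
    stats[name] = int(td.span.get_text(strip=True))

# "Strength" row: the label is the cell's leading text, the value follows <icon>
cell = soup.find("td", class_="active")
label = cell.contents[0].strip()
tooltip = cell.find("span", class_="tooltip")
strength = tooltip.icon.next_sibling.strip()
tip = tooltip["data-tip"]

print(stats)
print(label, strength, "|", tip)
```

Selecting by class (class_="active") rather than by row index keeps the lookup working even if rows are added or reordered.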