0

EDIT: Added html table itself

I'm parsing dynamic webpage using selenium. I need to make a dataframe from table tag on said page and make some changes to that table then save it as xlsx.

import pandas as pd

data1 = {
    'A': ['actual header', 'row#', 2, 4, 5, 6, 7],
    'B': ['string', 'row#', 1, 2, 3, 4, 5],
    'C': ['also string', 'row#', 0, 0, 962.5, 0, 962.5]
}
data2 = {
    'A': ['random parsed string', 'actual header', 'row#', 2, 4, 5, 6, 7],
    'B': ['also random parsed string', 'string', 'row#', 1, 2, 3, 4, 5],
    'C': ['random parsed string', 'also string', 'row#', 0, 0, 962.5, 0, 962.5]
}
df1 = pd.DataFrame(data=data1)
df2 = pd.DataFrame(data=data2)

My question is: how can i modify values from df1 C column so it would be separated with a comma, not a dot.

I tried to make it numeric, didn't work

df['C'] = pd.to_numeric(df['C'], errors='ignore')

I tried to read from html using this answer. It just turns 962.5 to 9625. Also swapped dots and commas between arguments.

df = pd.read_html('my.html', encoding='utf-8', decimal=',', thousands='.')[0]



<table class="table-givc svelte-3b8tob">
    <col style="width: 40px;">
        <col style="width: 120px;">
            <col style="width: 40px;">
                <col style="width: 40px;">
                    <thead>
                        <tr class="excel-columns svelte-3b8tob">
                            <td class="td-triangle svelte-3b8tob">
                                <div class="cell-triangle svelte-3b8tob"><span class="svelte-3b8tob"></span></div>
                            </td>
                            <td class="svelte-3b8tob">A</td>
                            <td class="svelte-3b8tob">B</td>
                            <td class="svelte-3b8tob">C</td>
                        </tr>
                    </thead>
                    <tbody>
                        <tr data-type-id="3" data-row-id="19227" data-row-index="0" class="row-render-header svelte-3b8tob" style="">
                            <td class="td-numered svelte-3b8tob">1</td>
                            <td class="table-fake-header text-center svelte-1c7mtha" colspan="1" rowspan="1">Наименование показателя</td>
                            <td class="table-fake-header text-center svelte-1c7mtha is-num" colspan="1" rowspan="1">№ строки</td>
                            <td class="table-fake-header text-center svelte-1c7mtha" colspan="1" rowspan="1">Всего</td>
                        </tr>
                        <tr data-type-id="3" data-row-id="19231" data-row-index="1" class="row-render-header svelte-3b8tob" style="">
                            <td class="td-numered svelte-3b8tob">2</td>
                            <td class="table-fake-header text-center svelte-1c7mtha" colspan="1" rowspan="1">1</td>
                            <td class="table-fake-header text-center svelte-1c7mtha is-num" colspan="1" rowspan="1">2</td>
                            <td class="table-fake-header text-center svelte-1c7mtha" colspan="1" rowspan="1">3</td>
                        </tr>
                        <tr data-type-id="2" data-row-id="19235" data-row-index="2" style="" class="svelte-3b8tob">
                            <td class="td-numered td-numered-fixed svelte-3b8tob">3</td>
                            <td class="is-label  svelte-5vfc64" style="text-align: left;" data-header-index="0" data-col-id="13531" colspan="1" rowspan="1"> Внутренние затраты на внедрение и использование цифровых технологий (сумма строк 02, 03, 04)</td>
                            <td style="text-align: center;" class="is-label is-num  svelte-5vfc64" data-header-index="1" data-col-id="13535" colspan="1" rowspan="1">01</td>
                            <td data-excel="C3" style="text-align: center;" data-header-index="2" data-col-id="13539" data-col-type-id="4" colspan="1" rowspan="1" class="svelte-5vfc64 last-td is-label">
                                <div class="custom-tooltip is-function svelte-5vfc64"></div> 962.5</td>
                        </tr>
                        <tr data-type-id="2" data-row-id="19239" data-row-index="3" style="" class="svelte-3b8tob">
                            <td class="td-numered td-numered-fixed svelte-3b8tob">4</td>
                            <td class="is-label pl-3 svelte-5vfc64" style="text-align: left;" data-header-index="0" data-col-id="13531" colspan="1" rowspan="1"> в том числе по источникам финансирования: собственные средства организации</td>
                            <td style="text-align: center;" class="is-label is-num  svelte-5vfc64" data-header-index="1" data-col-id="13535" colspan="1" rowspan="1">02</td>
                            <td data-excel="C4" style="text-align: center;" data-header-index="2" data-col-id="13539" data-col-type-id="4" colspan="1" rowspan="1" class="svelte-5vfc64 td-editor last-td"> 0.0</td>
                        </tr>
                        <tr data-type-id="2" data-row-id="19243" data-row-index="4" style="" class="svelte-3b8tob">
                            <td class="td-numered td-numered-fixed svelte-3b8tob">5</td>
                            <td class="is-label pl-3 svelte-5vfc64" style="text-align: left;" data-header-index="0" data-col-id="13531" colspan="1" rowspan="1"> средства бюджетов всех уровней</td>
                            <td style="text-align: center;" class="is-label is-num  svelte-5vfc64" data-header-index="1" data-col-id="13535" colspan="1" rowspan="1">03</td>
                            <td data-excel="C5" style="text-align: center;" data-header-index="2" data-col-id="13539" data-col-type-id="4" colspan="1" rowspan="1" class="svelte-5vfc64 td-editor last-td"> 962.5</td>
                        </tr>
                        <tr data-type-id="2" data-row-id="19247" data-row-index="5" style="" class="svelte-3b8tob">
                            <td class="td-numered td-numered-fixed svelte-3b8tob">6</td>
                            <td class="is-label pl-3 svelte-5vfc64" style="text-align: left;" data-header-index="0" data-col-id="13531" colspan="1" rowspan="1"> прочие привлеченные средства</td>
                            <td style="text-align: center;" class="is-label is-num  svelte-5vfc64" data-header-index="1" data-col-id="13535" colspan="1" rowspan="1">04</td>
                            <td data-excel="C6" style="text-align: center;" data-header-index="2" data-col-id="13539" data-col-type-id="4" colspan="1" rowspan="1" class="svelte-5vfc64 td-editor last-td"> 0.0</td>
                        </tr>
                        <tr data-type-id="2" data-row-id="19251" data-row-index="6" style="" class="svelte-3b8tob">
                            <td class="td-numered td-numered-fixed svelte-3b8tob">7</td>
                            <td class="is-label pl-5 svelte-5vfc64" style="text-align: left;" data-header-index="0" data-col-id="13531" colspan="1" rowspan="1"> из них: некоммерческих организаций</td>
                            <td style="text-align: center;" class="is-label is-num  svelte-5vfc64" data-header-index="1" data-col-id="13535" colspan="1" rowspan="1">05</td>
                            <td data-excel="C7" style="text-align: center;" data-header-index="2" data-col-id="13539" data-col-type-id="4" colspan="1" rowspan="1" class="svelte-5vfc64 td-editor last-td"> 0.0</td>
                        </tr>
                        <tr data-type-id="2" data-row-id="19255" data-row-index="7" style="" class="svelte-3b8tob">
                            <td class="td-numered td-numered-fixed svelte-3b8tob">8</td>
                            <td class="is-label pl-5 svelte-5vfc64" style="text-align: left;" data-header-index="0" data-col-id="13531" colspan="1" rowspan="1"> физических лиц</td>
                            <td style="text-align: center;" class="is-label is-num  svelte-5vfc64" data-header-index="1" data-col-id="13535" colspan="1" rowspan="1">06</td>
                            <td data-excel="C8" style="text-align: center;" data-header-index="2" data-col-id="13539" data-col-type-id="4" colspan="1" rowspan="1" class="svelte-5vfc64 td-editor last-td"> 0.0</td>
                        </tr>
                    </tbody>
</table>

1 Answers1

1

I would use a simple replace :

df = (pd.read_html("my.html", encoding="utf-8")[0].iloc[:, 1:]
           .replace({"C": {r"\.": ","}}, regex=True))

​Output :

print(df)
                                                                                              A         B      C
0                                                                       Наименование показателя  № строки  Всего
1                                                                                             1         2      3
2  Внутренние затраты на внедрение и использование цифровых технологий (сумма строк 02, 03, 04)        01  962,5
3                    в том числе по источникам финансирования: собственные средства организации        02    0,0
4                                                                средства бюджетов всех уровней        03  962,5
5                                                                  прочие привлеченные средства        04    0,0
6                                                            из них: некоммерческих организаций        05    0,0
7                                                                                физических лиц        06    0,0

Update :

If you need to perform that starting from the column C, here is one option :

df = (pd.read_html('input.html', encoding='utf-8')[0].iloc[:, 1:]
           .set_index(["A", "B"]).replace({r"\.": ","}, regex=True).reset_index())
Timeless
  • 22,580
  • 4
  • 12
  • 30
  • Thanks. What if want this to be more adaptive? For example, not only replace value at `C` column, but from `C` to the last column? Or do i need to post another question for that? – lineage_refugee Apr 11 '23 at 19:14
  • 1
    No need to open a new question, I will update my answer, give me a minute.. – Timeless Apr 11 '23 at 19:16