
I would like to ask a question about Selenium automation using Python (PyCharm).

I have test data saved as an .xls file. It is actually an HTML file disguised as an XLS file.
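If you want to confirm that such a file really is HTML before converting it, you can look at its first bytes. Genuine legacy .xls workbooks are OLE2 compound documents and start with the 8-byte signature `D0 CF 11 E0 A1 B1 1A E1`, while an HTML export starts with text markup. The sketch below assumes that is enough to tell them apart; `looks_like_html` is a hypothetical helper name, and this is a heuristic rather than a full file-type check.

```python
# Genuine legacy .xls workbooks are OLE2 compound documents with this 8-byte signature
OLE2_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_html(path):
    """Hypothetical helper: guess whether a file named *.xls is really HTML text."""
    with open(path, "rb") as f:
        head = f.read(1024)
    if head.startswith(OLE2_SIGNATURE):
        return False  # a real binary Excel workbook
    # Decode leniently; utf-8-sig also drops a UTF-8 byte-order mark if present
    text = head.decode("utf-8-sig", errors="ignore").lstrip().lower()
    # HTML exports start with markup such as <html, <!doctype or <table
    return text.startswith("<")
```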

I already found code to convert it (Link).

Here is the code:

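    bunch_size = 10000000  # Experiment with different sizes
    bunch = []
    with open(test_path + final_date + ".xls", "r") as r, open(location_of_html, "w") as w:
        for line in r:
            print(line)
            # Split on the first three spaces, then drop the last 3 characters
            # of each of the first three pieces before re-joining them
            x, y, z, rest = line.split(' ', 3)
            bunch.append(' '.join((x[:-3], y[:-3], z[:-3], rest)))
            if len(bunch) == bunch_size:
                w.writelines(bunch)  # flush a full batch
                bunch = []
        w.writelines(bunch)  # write whatever is left over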

The code above prints these lines, which are correct:

    <table style='height: 184px;' width='518'>

            <tbody>

            <tr>

            <td style='text-align: center;'>&nbsp;<img  /></td>

            <td style='text-align: center;' colspan='3'><span style='font-size: 12pt;'> <strong>Villanueva Enterprise</strong> </span><br /><span style='font-size: 10pt;'> <strong>Payslip</strong> </span></td>

            <td>&nbsp;</td>

            </tr>

            <tr>

            <td><span style='font-size: 8pt;'>NAME:</span></td>

But the converted end product contains this code:

     <ta style="heig 184p width=" 518'="">

            &nbsp;<img>
            <span style="font-size: 12pt;"> <strong>Villanueva Enterprise</strong> </span><br><span style="font-size: 10pt;"> <strong>Payslip</strong> </span>
            &nbsp;


            <span style="font-size: 8pt;">NAME:</span>
            <span style="font-size: 8pt;">Earner Minimum Wage 620,350</span>
            &nbsp;
            <span style="font-size: 8pt;">PAYROLL DATE: &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;&nbsp;</span>
            <span style="font-size: 8pt;">26 March 2021 </span>

Any ideas?

  • Unfortunately I had already found that before, but it was no use, even with pandas. – OMS Nov 26 '19 at 07:52
  • Additional info: the end product, i.e. the output written to the HTML file, is this code. I don't know why the word "height=184px" is cut ["\n", ' \n', ' – OMS Nov 26 '19 at 07:53
  • You are splitting each line on ' ' and then dropping the last 3 characters of the first three pieces in the join. So that line is not doing what you want it to do (ta vs table, heig vs height=, etc.) – Chrisvdberge Nov 26 '19 at 07:57
  • I see, thank you. I noticed this too and managed to solve it by removing the split command. – OMS Nov 26 '19 at 08:08
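To see what Chrisvdberge describes, here is the loop body from the question applied to the first line of the sample HTML. `mangle` is just a hypothetical name for that loop body.

```python
def mangle(line):
    # Same two statements as the loop body in the question
    x, y, z, rest = line.split(' ', 3)
    return ' '.join((x[:-3], y[:-3], z[:-3], rest))


original = "<table style='height: 184px;' width='518'>"
print(mangle(original))  # prints <ta style='heig 184p width='518'>
```

An HTML parser then reads `style='heig 184p width='` as one attribute value and `518'` as a stray attribute name, which is why the output shows up as `<ta style="heig 184p width=" 518'="">`.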

1 Answer


Managed to solve this one. I just removed the split and wrote each line of the hybrid .xls file directly to the HTML file.

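    # test_path, final_date and location_of_html are defined elsewhere in the script
    with open(test_path + final_date + ".xls", "r") as r, open(location_of_html, "w") as w:
        # The .xls file is already HTML text, so copy each line unchanged
        for line in r:
            print(line)
            w.write(line)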
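Because the fixed loop writes every line back unchanged, a plain file copy should give the same HTML. A minimal sketch using `shutil.copyfile`; the function name `xls_html_to_html` is hypothetical. One difference: it copies bytes, so line endings and encoding stay exactly as in the source instead of going through text-mode newline translation.

```python
import shutil


def xls_html_to_html(src, dst):
    """Hypothetical helper: copy a disguised .xls (really HTML) to an .html path."""
    # Byte-for-byte copy; no decoding, splitting or newline translation
    shutil.copyfile(src, dst)
    return dst


# Usage with the variables from the question (assumed to be defined elsewhere):
# xls_html_to_html(test_path + final_date + ".xls", location_of_html)
```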