
I'm struggling to read the attached TXT file and present each field read from it as CSV. I wrote code that comes close to what I want, but I can't get any further.

TXT file format:

    COMPANY TEST OF BRAZIL-        Junho/2022 Horista
      37-6  WALTER WHITE DA SILVA                 
         1006136-9   MOTORISTA            A33 1     00011523            


    001 Hrs Normais Diurnas           183,333    2.555,66 +
    031 Hrs Dsr Vencimento             36,667      511,14 +
    037 Dsr Adicionais                             306,36 +
    053 Reembolso de Vale Transpo                   47,61 +
    102 Hrs Extras  ( 60%)             68,680    1.531,84 +
    824 Vale Transporte                            500,00 +
    290 Alimentacao Funcionario                                   10,50 -
    404 Adiantamento Normal Desco                              1.011,95 -
    476 Desconto Seconci Dependen                                 65,46 -
    511 Inss Normal                                              522,87 -
    561 Irf Normal                                                90,07 -
    567 Irf Recol Adto                                           214,77 -
    820 Desc de Vale Transporte                                  184,00 -

                                                 5.452,61+      2.099,62-

                                                                3.352,99

           13,94      4.905,00      4.905,00       392,40      2.965,82


     COMPANY TEST OF BRAZIL-        Junho/2022 Horista
         102-0  WILTON PEATER TEMPLATE
               31022-0   L EQUIPE B           000 1     00011524


    001 Hrs Normais Diurnas           183,333    2.220,16 +
    031 Hrs Dsr Vencimento             36,667      444,04 +
    037 Dsr Adicionais                             225,77 +
    053 Reembolso de Vale Transpo                   26,40 +
    102 Hrs Extras  ( 60%)             58,260    1.128,85 +
    290 Alimentacao Funcionario                                   10,50 -
    404 Adiantamento Normal Desco                                854,04 -
    476 Desconto Seconci Dependen                                 98,19 -
    511 Inss Normal                                              398,81 -
    561 Irf Normal                                                48,77 -
    567 Irf Recol Adto                                           211,64 -
    820 Desc de Vale Transporte                                  159,85 -





                                                 4.045,22+      1.781,80-

                                                                2.263,42

           12,11      4.018,82      4.018,82       321,50      2.554,33

My code reads the first line at the positions I want, but I can't read the lines below it, much less repeat the reading for the next payslip in the file.

    # Read the TXT file (only the first line is read here)
    with open("I:input\\test.txt", "r") as ft:
        head_text = ft.readline()

    # Capture fields
    ## Header
    competence = head_text[46:59]
    company = head_text[:45]

    print('competence', 'company', sep=';')
    print(competence, company, sep=';')

The output at the moment is this:

    # competence;company
    junho/2022; COMPANY TEST OF BRAZIL

The output should look like this:

    # competence;company;id_employee;employee;etc...
    junho/2022;COMPANY TEST OF BRAZIL;37-6;WALTER WHITE DA SILVA...
    junho/2022;COMPANY TEST OF BRAZIL;102-0;WILTON PEATER TEMPLATE...

Reading and capturing the data line by line, I need to finish one payslip, which forms one line in the output; the second payslip forms the second line, and so on until the end of the TXT file. At the moment I can't move forward and I'm lost.
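One way to get there is to walk the file line by line and keep a "current payslip" record: a header line starts a new record, the first employee line after it fills in the ID and name, and the record is written out when the next header (or the end of the file) arrives. This is a minimal sketch that assumes the header and employee lines look like the sample above; the regular expressions, the function name `payslips_to_csv` and the column names are illustrative, not from any library.

```python
import csv
import re
import sys

# Assumed line shapes, based only on the sample payslips:
#   header:   " COMPANY TEST OF BRAZIL-        Junho/2022 Horista"
#   employee: "      37-6  WALTER WHITE DA SILVA"
HEADER_RE = re.compile(r"^\s*(?P<company>.+?)-\s+(?P<competence>\w+/\d{4})\s+\S+\s*$")
EMPLOYEE_RE = re.compile(r"^\s*(?P<id_employee>\d+-\d)\s+(?P<employee>\S.*?)\s*$")

FIELDS = ["competence", "company", "id_employee", "employee"]

SAMPLE = """\
 COMPANY TEST OF BRAZIL-        Junho/2022 Horista
      37-6  WALTER WHITE DA SILVA
         1006136-9   MOTORISTA            A33 1     00011523

001 Hrs Normais Diurnas           183,333    2.555,66 +
                                             5.452,61+      2.099,62-

 COMPANY TEST OF BRAZIL-        Junho/2022 Horista
     102-0  WILTON PEATER TEMPLATE
           31022-0   L EQUIPE B           000 1     00011524
001 Hrs Normais Diurnas           183,333    2.220,16 +
"""


def payslips_to_csv(lines, out):
    """Write one ';'-separated row per payslip found in `lines`."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, delimiter=";", extrasaction="ignore")
    writer.writeheader()
    record = None
    for line in lines:
        header = HEADER_RE.match(line)
        if header:
            if record is not None:      # a new header closes the previous payslip
                writer.writerow(record)
            record = header.groupdict()
            continue
        if record is not None and "id_employee" not in record:
            employee = EMPLOYEE_RE.match(line)
            if employee:                # first "<id>  <name>" line after the header
                record.update(employee.groupdict())
    if record is not None:              # flush the last payslip at end of input
        writer.writerow(record)


if __name__ == "__main__":
    payslips_to_csv(SAMPLE.splitlines(), sys.stdout)
```

To run it on the real file, something like `with open("test.txt") as ft, open("payslips.csv", "w", newline="") as out: payslips_to_csv(ft.read().splitlines(), out)` should work; the output file name is hypothetical, and you may need an `encoding=` argument depending on how the report was exported. Further fields (role, totals, detail rows) can be added to `FIELDS` and captured with more patterns in the same loop.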

  • [*Please do not post text as images*](https://meta.stackoverflow.com/q/285551). Copy and paste the text into your question and use the code formatting tool (`{}` button) to format it correctly. Images are not searchable, cannot be interpreted by screen readers for those with visual impairments, and cannot be copied for testing and debugging purposes. Use the edit link to modify your question. – MattDMo Jul 11 '22 at 15:08
  • That's not a CSV file by any stretch of the definition. That's a multi-page report with headers, footers, complex layout, section totals and formatting, saved as text. You can't read that with any CSV parser. A CSV is a *simple* text file containing only Values Separated by Commas. The first line can act as a header, but that's not a requirement. – Panagiotis Kanavos Jul 11 '22 at 15:08
  • Even if you remove the headers and totals, you have a fixed-width file, not a CSV. A quirk is that the values are right-aligned instead of left-aligned and the sign is at the end, but there are several ways to [parse a fixed-width file](https://stackoverflow.com/questions/4914008/how-to-efficiently-parse-fixed-width-files). It seems that all non-detail rows are centered or right-aligned, which would allow you to just discard all lines that start with a space. – Panagiotis Kanavos Jul 11 '22 at 15:15
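Following the suggestion to discard lines that start with a space, the remaining detail rows can be split into code, description, optional quantity, amount and sign. Note that this sketch swaps in a regular expression instead of the fixed-width slicing discussed in the linked question, so it does not depend on exact column positions. It assumes, from the sample only, that each detail row starts with a three-digit code, that amounts use Brazilian formatting (`.` for thousands, `,` for decimals) and that the `+`/`-` sign follows the amount. `DETAIL_RE`, `br_number` and `parse_detail_lines` are illustrative names.

```python
import re
from decimal import Decimal

# Assumed detail-row shape: "<3-digit code> <description> [<quantity>] <amount> <+|->"
DETAIL_RE = re.compile(
    r"^(?P<code>\d{3})\s+(?P<description>.+?)\s+"
    r"(?:(?P<quantity>\d[\d.]*,\d+)\s+)?"
    r"(?P<amount>\d[\d.]*,\d{2})\s*(?P<sign>[+-])\s*$"
)

SAMPLE = """\
 COMPANY TEST OF BRAZIL-        Junho/2022 Horista
001 Hrs Normais Diurnas           183,333    2.555,66 +
037 Dsr Adicionais                             306,36 +
102 Hrs Extras  ( 60%)             68,680    1.531,84 +
290 Alimentacao Funcionario                                   10,50 -
                                             5.452,61+      2.099,62-
"""


def br_number(text):
    """Convert a Brazilian-formatted number such as '2.555,66' to Decimal('2555.66')."""
    return Decimal(text.replace(".", "").replace(",", "."))


def parse_detail_lines(lines):
    """Return (code, description, quantity or None, signed amount) for each detail row."""
    items = []
    for line in lines:
        if not line or line[0].isspace():
            continue  # headers, employee info and totals are indented, so skip them
        m = DETAIL_RE.match(line)
        if not m:
            continue
        amount = br_number(m["amount"])
        if m["sign"] == "-":
            amount = -amount  # trailing '-' marks a deduction
        quantity = br_number(m["quantity"]) if m["quantity"] else None
        items.append((m["code"], m["description"], quantity, amount))
    return items


if __name__ == "__main__":
    for item in parse_detail_lines(SAMPLE.splitlines()):
        print(item)
```

`Decimal` is used instead of `float` so that money values keep their exact cents.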

1 Answer


I think you can use the following code to get your desired output. Make sure your first data block follows a similar template; you can also edit the template if your desired output needs to change. Please see the code:

    # Jupyter/IPython cell syntax; from a terminal, run `pip install ttp` instead
    !pip install ttp

    from ttp import ttp
    import json

    with open("test.txt", "r") as ft:
        data_to_parse = ft.read()

    # Template for the header lines: company/month line, then employee id/name line
    ttp_template = """
     {{Part_2|ORPHRASE}}-        {{Part_1}} {{ignore}}
         {{Part_3}}  {{Part_4|ORPHRASE}}
    """

    def stack_test(data_to_parse):
        parser = ttp(data=data_to_parse, template=ttp_template)
        parser.parse()

        # Results for the first (only) template, serialized as a JSON string
        results = parser.result(format='json')[0]

        # Convert the JSON string back into Python lists/dicts
        result = json.loads(results)
        return result

    # print(stack_test(data_to_parse))

    for i in stack_test(data_to_parse)[0]:
        print(f"{i['Part_1']};{i['Part_2']};{i['Part_3']};{i['Part_4']}")

See the `print(i)` output first:

[image: print(i) output]

See also your desired output:

[image: desired output]
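If you want an actual CSV file rather than printed lines, the list of matches returned by `stack_test(data_to_parse)[0]` could be passed to Python's standard `csv` module, which also handles quoting if a field ever contains the `;` delimiter. This is a sketch assuming each match is a dict with the keys `Part_1` to `Part_4` produced by the template above; the mapping to the question's column names and the function name `write_matches_csv` are illustrative.

```python
import csv
import sys

# Illustrative mapping from the template's variable names to the desired CSV columns
COLUMNS = {
    "Part_1": "competence",
    "Part_2": "company",
    "Part_3": "id_employee",
    "Part_4": "employee",
}


def write_matches_csv(matches, out):
    """Write a header row plus one ';'-separated row per match dict."""
    if isinstance(matches, dict):  # defensive: also accept a single match dict
        matches = [matches]
    writer = csv.writer(out, delimiter=";")
    writer.writerow(COLUMNS.values())
    for match in matches:
        # Missing keys become empty cells instead of raising KeyError
        writer.writerow(str(match.get(key, "")).strip() for key in COLUMNS)


if __name__ == "__main__":
    example = [{"Part_1": "Junho/2022", "Part_2": "COMPANY TEST OF BRAZIL",
                "Part_3": "37-6", "Part_4": "WALTER WHITE DA SILVA"}]
    write_matches_csv(example, sys.stdout)
```

With the answer's code, usage would be along the lines of `with open("output.csv", "w", newline="") as out: write_matches_csv(stack_test(data_to_parse)[0], out)`, where the output file name is hypothetical.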

Baris Ozensel