Parse a text file, extracting values according to their index positions

Question

Hi guys, how are you? I hope you're doing fine! How can I parse a text file by extracting specific values at given index positions, append the values to a list, and then convert it to a pandas DataFrame? So far I was able to write the code below. Text sample:

header:0RCPF049100000084220210407
body:1927907801100032G 00sucess
1067697546140032G 00sucess
1053756666000032G 00sucess
1321723368900032G 00sucess
1037673956810032G 00sucess

The first line is the header, and from it I only need the date, which sits at the following index positions: `date_from_header = linhas[0][18:26]`. The rest of the values are in the body.
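A quick way to confirm the slice positions is to apply them to the first two sample lines. This sketch assumes the `header:`/`body:` labels were added only for the post and are not in the real file; the body positions used here are the ones that fit this short sample, while the real file may use wider fields (for example the `[17:40]` filler in the code below).

```python
# First header and body lines from the sample, without the "header:"/"body:" labels
header = "0RCPF049100000084220210407"
body = "1927907801100032G 00sucess"

data_mov = header[18:26]     # date stored in the header
chave_detalhe = body[0:1]    # first character
cpf_cliente = body[1:12]     # 11 characters
cd_clube = body[12:16]
cd_operacao = body[16:17]
filler = body[17:18]         # a single space in this sample
cd_retorno = body[18:20]
tx_recusa = body[20:]        # everything up to the end of the line

print(data_mov, chave_detalhe, cpf_cliente, cd_clube, cd_operacao,
      repr(filler), cd_retorno, tx_recusa)
```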

import csv
import pandas as pd

headers = ["data_mov", "chave_detalhe", "cpf_cliente", "cd_clube",
           "cd_operacao", "filler", "cd_retorno", "tx_recusa"]

# This is the actual code
with open('RCPF0491.20210407.1609.txt', "r") as f:
    linhas = [linha.rstrip() for linha in f.readlines()]
    for i in range(0, len(linhas)):
        data_mov = linhas[0][18:26]
        chave_detalhe = linhas[1][0:1]
        cpf_cliente = linhas[1][1:12]
        cd_clube = linhas[1][12:16]
        cd_operacao = linhas[1][16:17]
        filler = linhas[1][17:40]
        cd_retorno = linhas[1][40:42]
        tx_recusa = linhas[1][42:100]
data = [data_mov, chave_detalhe, cpf_cliente, cd_clube, cd_operacao, filler, cd_retorno, tx_recusa]

The intended result looks like this:

data_mov    chave_detalhe  cpf_cliente  cd_clube  cd_operacao  filler         cd_retorno  tx_recusa
'20210407'  '1'            92790780110  '0032'    'G'          'blank space'  '00'        'sucesso'
'20210407'  '1'            92790780110  '0032'    'G'          'blank space'  '00'        'sucesso'
'20210407'  '1'            92790780110  '0032'    'G'          'blank space'  '00'        'sucesso'

This question is a bit hard to follow. Could you post an example of filename.txt? — SamBob, Apr 09 '21 at 13:08
But, just looking at your code: your `for` loop repeats the same thing (reading lines 0 and 1 of filename.txt) over and over again, because you don't use the iterator variable `i` inside the loop — SamBob, Apr 09 '21 at 13:10
But I suspect your data is a CSV or similar, and pandas has a function for reading that: `read_csv`. See: https://www.datacamp.com/community/tutorials/pandas-read-csv — SamBob, Apr 09 '21 at 13:12
@SamBob thanks, I'm trying to figure out how to loop over the file and extract all values according to their index positions — Jayron Soares, Apr 09 '21 at 13:25
Ah, so you are trying to extract data_mov from the first line, and then "chave_detalhe", "cpf_cliente", "cd_clube", "cd_operacao","filler","cd_retorno","tc_recusa" from each of the other lines? Ignoring the first line for now, does https://stackoverflow.com/a/10851479/1581658 help for splitting up the lines? — SamBob, Apr 09 '21 at 13:34
@SamBob that's right: the date is unique for each file and is in the header; the rest of the information is in the body. Thanks for the link — Jayron Soares, Apr 09 '21 at 13:44
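SamBob's `read_csv` pointer is close, but this file is fixed-width rather than delimited, and pandas also provides `read_fwf` for exactly that case: it takes explicit half-open column extents via `colspecs`. A minimal sketch, assuming the body positions that fit the sample above (the real layout may differ) and reading the date from the header line separately:

```python
import io

import pandas as pd

# Sample from the question, assuming the "header:"/"body:" labels are not in the real file
sample = """0RCPF049100000084220210407
1927907801100032G 00sucess
1067697546140032G 00sucess
1053756666000032G 00sucess
1321723368900032G 00sucess
1037673956810032G 00sucess
"""

cols = ["chave_detalhe", "cpf_cliente", "cd_clube", "cd_operacao",
        "filler", "cd_retorno", "tx_recusa"]
# Half-open (start, end) extents matching the sample lines; adjust for the real layout
colspecs = [(0, 1), (1, 12), (12, 16), (16, 17), (17, 18), (18, 20), (20, 100)]

data_mov = sample.splitlines()[0][18:26]  # date from the header line

df = pd.read_fwf(
    io.StringIO(sample),    # with a real file, pass its path instead
    colspecs=colspecs,
    header=None,
    names=cols,
    skiprows=1,             # skip the header line
    dtype=str,              # keep leading zeros, e.g. in cpf_cliente and cd_clube
    keep_default_na=False,  # leave empty fields as "" instead of NaN
)
df.insert(0, "data_mov", data_mov)  # same date on every row
print(df)
```

Note that `read_fwf` strips surrounding whitespace from each field, so a filler made only of spaces comes back empty.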

score 1 · Accepted Answer · answered Apr 09 '21 at 13:45

Using the approach from stackoverflow.com/a/10851479/1581658:

def parse_file(filename):
    indices = [0, 1, 12, 16, 17, 18, 20]  # start index of each body field
    parsed_data = []  # one list of field values per body line
    with open(filename) as f:
        header = next(f)  # read (and so skip) the header line
        data_mov = header[18:26]  # and get data_mov from the header
        for line in f:  # loop through the body lines
            line = line.rstrip()
            # split each line at the indices; the last field runs to the end of the line
            parts = [data_mov] + [line[i:j] for i, j in zip(indices, indices[1:] + [None])]
            parsed_data.append(parts)
    return parsed_data

print(parse_file("filename.txt"))
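The core of this answer is `zip(indices, indices[1:] + [None])`: it pairs each start index with the next one, so the slices tile the line with no gaps, and the trailing `None` makes the last field run to the end of the line. A minimal standalone sketch of that idea (the helper name `split_at` is hypothetical):

```python
def split_at(line, indices):
    """Cut `line` into consecutive fields, one starting at each index in `indices`."""
    # Pair each start with the next start; None lets the last slice run to the end
    return [line[i:j] for i, j in zip(indices, indices[1:] + [None])]


indices = [0, 1, 12, 16, 17, 18, 20]
print(list(zip(indices, indices[1:] + [None])))
# [(0, 1), (1, 12), (12, 16), (16, 17), (17, 18), (18, 20), (20, None)]
print(split_at("1927907801100032G 00sucess", indices))
# ['1', '92790780110', '0032', 'G', ' ', '00', 'sucess']
```

Because the slices are built from consecutive indices, changing one field width only means editing one number in `indices`.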

Thanks for your time! I made some adaptations and now it's working fine. Best regards — Jayron Soares, Apr 09 '21 at 18:33

Jayron Soares · Answer 2 · 2021-04-09T22:53:04.383

Thanks to SamBob's help, here is the final solution in case anyone needs it:

import itertools
import pandas as pd

pd.options.display.width = 0

def parse_file(filename):
    indices = [0, 1, 12, 16, 17, 18, 42]  # start index of each body field
    cols = ['data_mov', 'chave_detalhe', 'cpf_cliente', 'cd_clube',
            'cd_operacao', 'filler', 'cd_retorno', 'tx_recusa']
    parsed_data = []  # one list of field values per body line
    with open(filename) as f:
        header = next(f)
        data_mov = header[18:26]
        # islice(f, 1, 100) skips the first line after the header and reads at most 99 more;
        # use `for line in f:` to read every body line
        for line in itertools.islice(f, 1, 100):
            # split according to the indices
            parts = [data_mov] + [line.rstrip()[i:j] for i, j in zip(indices, indices[1:] + [None])]
            parsed_data.append(parts)

    # convert to a DataFrame once, after all lines have been read
    df = pd.DataFrame(parsed_data, columns=cols)
    return df


df = parse_file("filename.txt")
print(df)
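One variation, sketched under the assumption that the body layout matches the sample in the question: keep each column name next to its slice, so names and positions cannot drift apart, and build the DataFrame once after the loop. `BODY_FIELDS` and `parse_lines` are hypothetical names; accepting any iterable of lines means the function works with an open file or an `io.StringIO`.

```python
import io

import pandas as pd

# Hypothetical field spec: column name -> slice, with positions matching the question's sample
BODY_FIELDS = {
    "chave_detalhe": slice(0, 1),
    "cpf_cliente": slice(1, 12),
    "cd_clube": slice(12, 16),
    "cd_operacao": slice(16, 17),
    "filler": slice(17, 18),
    "cd_retorno": slice(18, 20),
    "tx_recusa": slice(20, None),
}


def parse_lines(lines):
    """Build a DataFrame from an iterable of lines: the header first, then body records."""
    lines = iter(lines)
    data_mov = next(lines)[18:26]  # date from the header
    rows = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():  # skip blank lines, e.g. a trailing empty line
            continue
        row = {"data_mov": data_mov}
        row.update({name: line[s] for name, s in BODY_FIELDS.items()})
        rows.append(row)
    # Build the DataFrame once, after all lines are read
    return pd.DataFrame(rows, columns=["data_mov", *BODY_FIELDS])


sample = "0RCPF049100000084220210407\n1927907801100032G 00sucess\n1067697546140032G 00sucess\n"
df = parse_lines(io.StringIO(sample))
print(df)
```

With a real file, `with open("filename.txt") as f: df = parse_lines(f)` works the same way.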
