
I am reading a large text file line by line, and while reading I would like to apply an if condition so that only lines containing certain codes are kept and appended to a dataframe. My code works for a single code (for example, if tag == 'ABC-1234'), but when I put in more codes I end up with an empty dataframe. I have more than 100 codes and I only want to read those lines for now. I would appreciate any suggestions for a better way to handle this. Below is the working code sample.

import pandas as pd

filename = "C:/Users/abcd/Downloads/abcd-xyz-433.txt"
code = pd.read_excel('C:/Users/abcd/Downloads/xyz_codes.xlsx')
code_list = code['codes'].tolist()

sample = []
with open(filename, 'r') as f:
    for line in f:
        # The tag is the sixth '|'-separated field within the first 45 characters
        tag = line[:45].split('|')[5]
        if tag == 'AB-C711':                         # This works
            sample.append(line.split('|'))

print('Everything in the list is read')

Here are the two statements I have tried in order to match multiple codes. Both leave me with an empty dataframe. code_list is the list created from a column of codes in an Excel file.

if tag == ('AB-C711', 'AB-D702'):            # This doesn't work
    sample.append(line.split('|'))

if tag == code_list:                         # This doesn't work
    sample.append(line.split('|'))
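
For reference, a small sketch of why both comparisons are always false: `==` compares the string against the whole tuple or list, whereas `in` checks whether the string is one of its elements. The variable names and values here are only illustrative.

```python
tag = "AB-C711"
codes = ("AB-C711", "AB-D702")

# Equality compares the string with the container itself, not its elements.
tuple_equal = tag == codes
list_equal = tag == list(codes)

# Membership checks whether the string is one of the elements.
is_member = tag in codes

print(tuple_equal, list_equal, is_member)
```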

I would like to read the file line by line, keep only the lines whose tag matches my code list, split those lines on the delimiter, and create a dataframe from the result.

Alex
Vishwas

1 Answer

import pandas as pd

filename = "C:/Users/vgowda/Downloads/abcd-xyz-433.txt"
code = pd.read_excel('C:/Users/Downloads/abc_codes.xlsx')
code_list = code['codes'].tolist()

sample = []
with open(filename, 'r') as f:
    for line in f:
        tag = line[:45].split('|')[5]
        # `in` tests whether tag is one of the codes; `==` compared the
        # string against the whole list, which is never true.
        if tag in code_list:        # this works
        # if tag == 'KV-C901':
            sample.append(line.split('|'))

print('arrays are appended and ready to create a dataframe out of an array')
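
Building on the `in` check above, here is a minimal sketch of the same filter written as a function. It is not the original code: `filter_rows`, `tag_index`, and the sample data are hypothetical names and values chosen for illustration. It assumes the tag is always the sixth `|`-separated field, so it splits the whole line once instead of slicing the first 45 characters, it strips the trailing newline from the last field, and it converts the code list to a set, which may help with 100+ codes on a large file.

```python
import io


def filter_rows(lines, codes, tag_index=5, sep="|"):
    """Return the split fields of every line whose tag field is in codes."""
    wanted = set(codes)  # set membership is O(1) on average
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        # Skip short or malformed lines instead of raising IndexError.
        if len(fields) > tag_index and fields[tag_index] in wanted:
            rows.append(fields)
    return rows


# Made-up sample data standing in for the real text file and Excel code list.
sample_file = io.StringIO(
    "a|b|c|d|e|AB-C711|x\n"
    "a|b|c|d|e|ZZ-0000|y\n"
    "a|b|c|d|e|AB-D702|z\n"
)
code_list = ["AB-C711", "AB-D702"]

rows = filter_rows(sample_file, code_list)
print(rows)
```

With the real inputs, calling `filter_rows(f, code_list)` inside the `with open(...)` block would take the place of the loop, and `pd.DataFrame(rows)` would turn the result into a dataframe. Column names would still need to be supplied separately.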
Vishwas