-1

I have a .txt file that has 20 rows. Each row is a variation of this:

76C1125477854212562 112544 where:

var1=76C11

var2=25477

var3=85421

var4=2562

var5=112544

I've been looking for a way to parse the text file in Python so I can iterate over the 20 rows and split each row into these variables. I tried `split()`, but it doesn't work because there are no line breaks between the values. I also tried inserting line breaks at specific positions, but I haven't found any resources that explain how to do that. Any help is appreciated!

So far:

file = open('FileName.txt','r')
read = file.readlines()
Michael Ruth
bacca
  • You're not going to be able to use `str.split()` here. First answer the question: "How do I know var1=76C11 rather than var1=76C112?" And answer this question without a loss of generality for other variables. The input data provided could be hexadecimal. Is it possible these are binary data that have been stringified into a txt file? – Michael Ruth Jul 14 '22 at 22:07
  • It's impossible to answer this question without more information about the format of the data – Michael Ruth Jul 14 '22 at 22:11
  • https://stackoverflow.com/questions/21351275/split-a-string-to-even-sized-chunks – Nin17 Jul 14 '22 at 22:11
  • Tell us more about your splitting logic. Is it by number of characters? And what about the last one? Is the blank significant or just part of the fixed width fields? Also if it is just 20 lines - go in and edit it to be what you want. – jch Jul 14 '22 at 22:25
  • @bacca, added a solution, does it help? – Naveed Jul 14 '22 at 22:59
  • @MichaelRuth I'm not sure how to answer that question. var2-5 are actual number values that I intend use in an analysis. It could be income, profit etc. The file is saved (for some reason) in this txt format. So, I intend to salvage it by writing python code that can add line breaks where I want them to, assign them variable names so I can export it into an excel file for use. Does that answer your question? – bacca Jul 15 '22 at 16:55
  • @jch worst case scenario I will go in and add those line breaks on my own, but I would like to be able to automate this entire process for efficiency and to serve as a learning example for myself. Thanks – bacca Jul 15 '22 at 16:56
  • @Naveed I just ran that and I believe that's exactly what I wanted to do - thank you so much! – bacca Jul 15 '22 at 17:00
  • You answered my question by accepting the [answer](https://stackoverflow.com/a/72987320/4583620). You know that the variable values are fixed-width. Ask the actual question next time and you'll receive better answers sooner. – Michael Ruth Jul 18 '22 at 23:17

2 Answers

0

You can use slicing, which works on strings as well as lists: https://www.geeksforgeeks.org/python-list-slicing/

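def subdivide_lst(lst, step=5):
    # Yield consecutive chunks of `step` items; the last chunk may be shorter.
    for start in range(0, len(lst), step):
        yield lst[start:start + step]

rows = []
with open("yourfile.txt", "r") as f:
    for line in f:
        res = line.split()  # split on whitespace, which also drops the newline
        if not res:
            continue  # skip blank lines
        # Chunk the packed first token, then append the second token as-is.
        rows.append([*subdivide_lst(res[0]), res[1]])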

If you don't know the `with open(...)` statement: https://www.geeksforgeeks.org/with-statement-in-python/

If you don't know what `yield` is: https://www.geeksforgeeks.org/generators-in-python/

And for the `*` unpacking: https://geekflare.com/python-unpacking-operators/
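The even chunk size of 5 only works here because the fourth field happens to be the short remainder. A minimal sketch with explicit widths, assuming the layout in the question (5, 5, 5, 4 characters, then a space-separated last value); `WIDTHS` and `parse_row` are hypothetical names used for illustration:

```python
from itertools import accumulate

# Assumed widths of var1..var4, taken from the example row in the question.
WIDTHS = [5, 5, 5, 4]

def parse_row(line, widths=WIDTHS):
    """Slice the packed first token by fixed widths and append the second token."""
    packed, last = line.split()       # whitespace split also drops the newline
    ends = list(accumulate(widths))   # end offset of each field: 5, 10, 15, 19
    starts = [0] + ends[:-1]          # start offset of each field: 0, 5, 10, 15
    return [packed[s:e] for s, e in zip(starts, ends)] + [last]

print(parse_row("76C1125477854212562 112544"))
# ['76C11', '25477', '85421', '2562', '112544']
```

If the real field widths differ, only `WIDTHS` needs to change.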

Axeltherabbit
0

Here is one way to do it, assuming the variables are fixed-width. For the last one, I use a large width to pick up everything remaining.

import pandas as pd

df = pd.read_fwf(r'txt.csv', widths=[5,5,5,4,10], header=None)
df
    0       1       2       3       4
0   76C11   25477   85421   2562    112544
1   16C11   25477   85421   2562    112541
2   26C11   25477   85421   2562    112542
3   36C11   25477   85421   2562    112543
4   46C11   25477   85421   2562    112544

Sample data used

76C1125477854212562 112544 
16C1125477854212562 112541 
26C1125477854212562 112542 
36C1125477854212562 112543 
46C1125477854212562 112544 
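If pandas is not available, the same fixed-width split can be written out as CSV, which Excel opens directly, using only the standard library. This is a sketch under the same width assumptions; the column names var1 to var5 come from the question, while `FIELDS` and `rows_to_csv` are hypothetical names:

```python
import csv
import io

# Assumed layout: var1..var4 are packed with widths 5, 5, 5, 4; var5 follows a space.
FIELDS = [("var1", 0, 5), ("var2", 5, 10), ("var3", 10, 15), ("var4", 15, 19)]

def rows_to_csv(lines, out):
    """Write a header plus one CSV row per non-blank line to the file-like `out`."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([name for name, _, _ in FIELDS] + ["var5"])
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        packed, var5 = line.split()
        writer.writerow([packed[a:b] for _, a, b in FIELDS] + [var5])

sample = ["76C1125477854212562 112544 \n", "16C1125477854212562 112541 \n"]
buffer = io.StringIO()
rows_to_csv(sample, buffer)
print(buffer.getvalue())
```

To write a real file, pass an object opened with `open("out.csv", "w", newline="")` instead of the in-memory buffer.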


Naveed