
Below is the text that I want to split and store in variables.

Pppp CCCC TTTT                           MMMMM            SSSSSS Oono.   

1  NIL fL-E 10UU (SPD+), 1000XXXXX (SPD) WEEEEEEEEEEEEE   CATTTTTTTTT   
44 10/100/1000BBBBB Ppppppp OOo E SSSSSS WS-XXXXX-RRRRR+I CATTTTTTTTT
44 10/100/1000BBBBB Ppppppp OOo E SSSSSS WS-XXXXX-RRRRR+I CATTTTTTTTT
44 10/100/1000BBBBB Ppppppp OOo E SSSSSS WS-XXXXX-RRRRR+I CATTTTTTTTT
44 10/100/1000BBBBB Ppppppp OOo E SSSSSS WS-XXXXX-RRRRR+I CATTTTTTTTT
44 10/100/1000BBBBB Ppppppp OOo E SSSSSS WS-XXXXX-RRRRR+I CATTTTTTTTT 

I want to split it so that Variable 1 :-

 Pppp

 1    
 44    
 44    
 44   
 44
 44

Variable 2 :-

CCCC TTTT                              

NIL fL-E 10UU (SPD+), 1000XXXXX (SPD)    
10/100/1000BBBBB Ppppppp OOo E SSSSSS 
10/100/1000BBBBB Ppppppp OOo E SSSSSS 
10/100/1000BBBBB Ppppppp OOo E SSSSSS 
10/100/1000BBBBB Ppppppp OOo E SSSSSS 
10/100/1000BBBBB Ppppppp OOo E SSSSSS

Variable 3:-

MMMMM            

 WEEEEEEEEEEEEE      
 WS-XXXXX-RRRRR+I 
 WS-XXXXX-RRRRR+I 
 WS-XXXXX-RRRRR+I 
 WS-XXXXX-RRRRR+I 
 WS-XXXXX-RRRRR+I

Variable 4:-

SSSSSS Oono.

CATTTTTTTTT
CATTTTTTTTT
CATTTTTTTTT
CATTTTTTTTT
CATTTTTTTTT
CATTTTTTTTT

Each variable should store the values listed for it above.

Code I have tried:-

with open ('sh_module.txt', 'r') as module_info:
    lines = module_info.read().splitlines()[6:]
    for l in lines:
        if not l.isspace():
            storeSplit = ("  ".join(l.split()[1:10]))
            A_of_splitOfstoreSplit , B_of_splitOfstoreSplit = storeSplit.split('W') 
            print (storeSplit)

The code doesn't work. :-(

Note:- the text above is exactly as it appears in the text file, so please take the spaces into account.

Thanks for the help! :-)

  • Possible duplicate of [How to efficiently parse fixed width files?](https://stackoverflow.com/questions/4914008/how-to-efficiently-parse-fixed-width-files) – Patrick Artner Jul 31 '19 at 10:11

1 Answer


Edit: I found [How to efficiently parse fixed width files?](https://stackoverflow.com/questions/4914008/how-to-efficiently-parse-fixed-width-files) after answering. This answer is specific to your question; the duplicate shows other ways to parse fixed-width files, such as using structs.


You seem to have a fixed-width format. You can split each line into a list of fields and then transpose the rows into columns using zip.

Create file:

# 3456789012345678901234567890123456789012345678901234567890123456789
t = """
Pppp CCCC TTTT                           MMMMM            SSSSSS Oono.   

1  NIL fL-E 10UU (SPD+), 1000XXXXX (SPD) WEEEEEEEEEEEEE   CATTTTTTTTT   
44 10/100/1000BBBBB Ppppppp OOo E SSSSSS WS-XXXXX-RRRRR+I CATTTTTTTTT
44 10/100/1000BBBBB Ppppppp OOo E SSSSSS WS-XXXXX-RRRRR+I CATTTTTTTTT
44 10/100/1000BBBBB Ppppppp OOo E SSSSSS WS-XXXXX-RRRRR+I CATTTTTTTTT
44 10/100/1000BBBBB Ppppppp OOo E SSSSSS WS-XXXXX-RRRRR+I CATTTTTTTTT
44 10/100/1000BBBBB Ppppppp OOo E SSSSSS WS-XXXXX-RRRRR+I CATTTTTTTTT 
"""

with open ('sh_module.txt', 'w') as module_info:
    module_info.write("header\nheader\nheader\nheader\nheader\nheader\n")
    module_info.write(t)

Process file:

with open ('sh_module.txt', 'r') as module_info:
    # skip the 6 header lines; rstrip() only, so leading spaces cannot shift the column offsets
    lines = [n.rstrip() for n in module_info.read().splitlines()[6:]]


data = [] 

# split file lines into fields; the line starting with "Pppp" is a special case
# because its first column is 4 characters wide instead of 3
for line in lines:
    # ignore empty lines
    if line.strip():
        first = 4 if line.startswith("Pppp") else 3  # width of the first column
        data.append([line[:first].strip(), line[first:41].strip(),
                     line[41:58].strip(), line[58:].strip()])

# create a dict of columns from the list of split lines (first row holds the keys)
variabs = {a[0]:[i for i in a[1:]] for a in zip(*data)}

print(variabs)

Output:

{'Pppp': ['1', '44', '44', '44', '44', '44'],
 'CCCC TTTT': ['NIL fL-E 10UU (SPD+), 1000XXXXX (SPD)', '10/100/1000BBBBB Ppppppp OOo E SSSSSS', 
               '10/100/1000BBBBB Ppppppp OOo E SSSSSS', '10/100/1000BBBBB Ppppppp OOo E SSSSSS', 
               '10/100/1000BBBBB Ppppppp OOo E SSSSSS', '10/100/1000BBBBB Ppppppp OOo E SSSSSS'], 
 'MMMMM': ['WEEEEEEEEEEEEE', 'WS-XXXXX-RRRRR+I', 'WS-XXXXX-RRRRR+I', 'WS-XXXXX-RRRRR+I', 
           'WS-XXXXX-RRRRR+I', 'WS-XXXXX-RRRRR+I'], 
 'SSSSSS Oono.': ['CATTTTTTTTT', 'CATTTTTTTTT', 'CATTTTTTTTT', 'CATTTTTTTTT', 'CATTTTTTTTT', 
                  'CATTTTTTTTT']}

You can access the columns by variabs["Pppp"], variabs["SSSSSS Oono."], etc.


There are other ways to handle this; see [How to efficiently parse fixed width files?](https://stackoverflow.com/questions/4914008/how-to-efficiently-parse-fixed-width-files) for more.
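
As one of those other ways, here is a minimal sketch that uses the standard-library `struct` module instead of slicing. It assumes the same data-row widths as above (3, 38 and 17 characters); the width of 20 for the last column and the helper name `parse_row` are arbitrary choices made for illustration, not something taken from your file.

```python
import struct

# Assumed widths for the data rows: 3, 38 and 17 characters.
# The last width (20) is an arbitrary upper bound for the final column.
ROW = struct.Struct("3s38s17s20s")

def parse_row(line):
    """Split one fixed-width data row into a list of stripped fields."""
    # Pad or trim the line so it matches the struct size exactly.
    raw = line.ljust(ROW.size)[:ROW.size].encode("ascii")
    return [field.decode("ascii").strip() for field in ROW.unpack(raw)]

sample = "44 10/100/1000BBBBB Ppppppp OOo E SSSSSS WS-XXXXX-RRRRR+I CATTTTTTTTT"
fields = parse_row(sample)
print(fields)
```

The header row would still need its own format (its first column is 4 wide), so for a file this small plain slicing is probably simpler.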


Edit: using enumerate:

# split file-lines into data - special case for line on idx 0  
for idx, line in enumerate(x.strip() for x in lines if x.strip()):
    if idx == 0:  # slightly different fixed width
        data.append( [line[:4].strip(), line[4:41].strip(), 
                      line[41:58].strip(),line[58:].strip()] )
        continue
    linedata = []
    linedata.extend( (line[:3].strip(), line[3:41].strip(), 
                      line[41:58].strip(),line[58:].strip()) )
    data.append(linedata)
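
If the header names are not known in advance, the keys can be read from the first non-empty line instead of being hard-coded. A minimal sketch, assuming the column offsets stay the same as above; the helper names (`cut`, `columns_from`), the offset lists and the sample header names (`6`, `Type`, `Model`, `Serial`) are made up for illustration:

```python
HEADER_CUTS = [0, 4, 41, 58, None]  # header row: first column is 4 wide
DATA_CUTS = [0, 3, 41, 58, None]    # data rows: first column is 3 wide

def cut(line, cuts):
    """Slice one line at the given offsets and strip each piece."""
    return [line[a:b].strip() for a, b in zip(cuts, cuts[1:])]

def columns_from(lines):
    """Build {header name: column values} without knowing the names."""
    rows = [ln for ln in lines if ln.strip()]    # drop blank lines
    keys = cut(rows[0], HEADER_CUTS)             # whatever the header says
    values = zip(*(cut(ln, DATA_CUTS) for ln in rows[1:]))
    return {k: list(v) for k, v in zip(keys, values)}

# Hypothetical input, padded with ljust() so the widths are explicit.
lines = [
    "6".ljust(4) + "Type".ljust(37) + "Model".ljust(17) + "Serial",
    "",
    "1".ljust(3) + "NIL fL-E 10UU".ljust(38) + "WEEEEEEEEEEEEE".ljust(17) + "CAT1",
    "44".ljust(3) + "10/100/1000BBBBB".ljust(38) + "WS-XXXXX-RRRRR+I".ljust(17) + "CAT2",
]

columns = columns_from(lines)

# Loop over the dict to reach every column without naming it.
for name, values in columns.items():
    print(name, values)
```

The dictionary keys are then whatever the file contains, and you iterate over `columns.items()` (or `list(columns)`) rather than typing a key by hand.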
  • I understood your code and it worked! But can you please explain the line **variabs = {a[0]:[i for i in a[1:]] for a in zip(*data)}** in a bit more detail, if you can? – Mystery Jul 31 '19 at 11:07
  • @Mystery It is a dictionary comprehension that builds the column view from the `zip(*data)` values. Essentially, `data` is a list of lists, where each inner list is one line. `zip(*data)` creates an iterator of tuples: the first tuple consists of the 0-elements of all inner lists, the next tuple contains the 1-elements of all inner lists, and so on. You can use `for a in zip(*data): print(a)` to print the tuples. `variabs = {a[0]:[i for i in a[1:]] for a in zip(*data)}` just creates the dictionary with the first value of each tuple as the key and the rest of the tuple as the value. – Patrick Artner Jul 31 '19 at 11:12
  • What if the keys vary? Right now I am accessing the lists using the key **"Pppp"**, but how can I keep the keys in a variable that takes whatever content is produced dynamically? – Mystery Jul 31 '19 at 11:18
  • What if **Pppp** is not there and it is "6" or something else instead? To be specific, I want to keep the keys **variable**, that is, they could be anything. – Mystery Jul 31 '19 at 11:33
  • @Mystery I have no idea what you are after. To use "dynamically" named variables, you use a dictionary. If your first line is always the one with the key names, use `for idx, line in enumerate(lines):`, extract the keys on `idx == 0`, and split all other lines by positional slicing. This should be enough to get you started on a solution that fits your undisclosed other data; see the edit. I have no idea how you want to address your "variables" if you do not know their names. – Patrick Artner Jul 31 '19 at 14:14
  • Can we start a conversation in chat? – Mystery Aug 01 '19 at 05:34
  • @Mystery https://chat.stackoverflow.com/rooms/197318/python-fixed-csv-splitting-the-line-into-multiple-variables – Patrick Artner Aug 01 '19 at 05:55
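
To make the `zip(*data)` explanation from the comments concrete, here is a small sketch with a shortened, two-column version of `data` (the rows are trimmed for illustration; the name `spelled_out` is made up):

```python
# Each inner list is one line of the file, already split into fields.
data = [
    ["Pppp", "MMMMM"],            # header row
    ["1", "WEEEEEEEEEEEEE"],      # first data row
    ["44", "WS-XXXXX-RRRRR+I"],   # second data row
]

# zip(*data) pairs up the 0-elements, then the 1-elements, and so on,
# which turns rows into columns.
columns = list(zip(*data))
for a in columns:
    print(a)

# The comprehension from the answer: first item of each column is the key,
# the remaining items are the values.
variabs = {a[0]: [i for i in a[1:]] for a in zip(*data)}

# The same thing written as an ordinary loop.
spelled_out = {}
for a in zip(*data):
    spelled_out[a[0]] = list(a[1:])

print(variabs == spelled_out)
```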