Here is my string that I created by parsing data from a file:

723|NM|1|7201|QQ|1|72034|PP|1|72034N|AA|1|7203466|QW|1|72000|NM|1|7201111|NM|1

Ideally I would like this output:

723|NM|1
7201|QQ|1
72034|PP|1
72034N|AA|1
7203466|QW|1
72000|NM|1
7201111|NM|1

Since I was not successful at parsing the data and appending it dynamically (I am new to Python), I understand that I can get the same desired output by transforming this string.

I researched, tested, and am stuck.

Essentially, I need to replace every third instance of the delimiter with a newline (or perhaps something better that anyone can suggest).

Any help is greatly appreciated!

Thanks

Jean-François Fabre
  • Can you give us an example of what the input file looks like? – TheF1rstPancake Dec 23 '17 at 22:19
  • Sure, it was an XML file and I was parsing a nested segment. Python did not natively understand that each nested segment was independent, so I just parsed it to a string, knowing that I can split out every third piece at the end, effectively creating a file I can load into a table. – Scientific40 Dec 23 '17 at 22:21

3 Answers

Without regex, like this:

s = "723|NM|1|7201|QQ|1|72034|PP|1|72034N|AA|1|7203466|QW|1|72000|NM|1|7201111|NM|1"

# Split into fields, then re-join them three at a time, one group per line.
items = s.split("|")
print("\n".join(["|".join(items[i:i+3]) for i in range(0, len(items), 3)]))

Note that the [] inside the outer join is on purpose, to get better performance (see "List comprehension without [ ] in Python"), even if I agree that it's ugly :)

result:

723|NM|1
7201|QQ|1
72034|PP|1
72034N|AA|1
7203466|QW|1
72000|NM|1
7201111|NM|1
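
The slicing approach above is not tied to groups of three. Here is a minimal sketch, assuming a hypothetical helper named `regroup` (not part of the original answer), which takes the group size and the delimiter as parameters. A trailing partial group simply becomes a shorter last line:

```python
def regroup(s, n=3, sep="|"):
    """Join every n fields of s with sep, one group per line."""
    items = s.split(sep)
    # Slice the field list in steps of n; the last slice may be shorter.
    groups = [sep.join(items[i:i + n]) for i in range(0, len(items), n)]
    return "\n".join(groups)


s = "723|NM|1|7201|QQ|1|72034|PP|1"
print(regroup(s))
```

Calling `regroup(s, 10)` would group by ten fields without any other change.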

BTW with regex it's simple too:

import re
print(re.sub(r"(.*?\|.*?\|.*?)\|", r"\1\n", s))

but it doesn't work very well if the number of items isn't exactly divisible by 3 (this can be done, but in a more complex way)
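
One possible way to handle a trailing partial group with a regex alone, offered as a sketch rather than as this answer's own method: match up to three fields at a time with `re.findall`, so that any leftover fields form a shorter final line. The name `GROUP_RX` is made up for illustration, and the pattern assumes that fields are non-empty and never contain `|`.

```python
import re

# One field, then up to two more "|field" pairs (fields assumed non-empty).
GROUP_RX = re.compile(r"[^|]+(?:\|[^|]+){0,2}")

# Eight fields: two full groups and one partial group.
s = "723|NM|1|7201|QQ|1|123|NM"
result = "\n".join(GROUP_RX.findall(s))
print(result)
```

Because the pattern has no capturing groups, `findall` returns the full text of each match, so no post-processing of the pieces is needed.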

Jean-François Fabre
  • Yeah nice, you got an extra [] inside your print statement that is not needed though (outer join). And you could write it as this too: `print('\n'.join('|'.join(i) for i in zip(items[::3], items[1::3], items[2::3])))` – Anton vBR Dec 23 '17 at 22:39
  • the `[]` is on purpose, for better performance: https://stackoverflow.com/questions/9060653/list-comprehension-without-in-python – Jean-François Fabre Dec 23 '17 at 22:48
  • This worked perfectly...I think I was close, and now I should get things to work. Thanks! – Scientific40 Dec 23 '17 at 22:52
  • @AntonvBR `zip(items[::3], items[1::3], items[2::3])` would be better using `itertools.islice` to avoid creating actual lists. And what if you want to group by 10 elements? That would be tedious :) – Jean-François Fabre Dec 23 '17 at 22:57
  • I thought it was more readable in this particular case. You got a point again. – Anton vBR Dec 23 '17 at 22:58
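
The `itertools.islice` idea from the comments can be sketched as a small generator. `batches` is a hypothetical name, and the group size is a parameter, so grouping by 10 is no more work than grouping by 3:

```python
from itertools import islice


def batches(iterable, n):
    """Yield successive lists of up to n items without slicing the source."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, n))
        if not batch:  # iterator exhausted
            return
        yield batch


items = "723|NM|1|7201|QQ|1|72034|PP|1".split("|")
lines = ["|".join(batch) for batch in batches(items, 3)]
print("\n".join(lines))
```

Only one group of `n` items is held in memory at a time, and a final group with fewer than `n` items is still yielded.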

Using a regex solution:

import re

string = """723|NM|1|7201|QQ|1|72034|PP|1|72034N|AA|1|7203466|QW|1|72000|NM|1|7201111|NM|1
723|NM|1|7201|QQ|1|72034|PP|1|72034N|AA|1|7203466|QW|1|72000|NM|1|7201111|NM|1|123|NM"""

rx = re.compile(r'(?:[^|]+\|?){1,3}')

for line in string.split("\n"):
    parts = "\n".join([part.group(0).rstrip("|") for part in rx.finditer(line)])
    print(parts)

This yields:

723|NM|1
7201|QQ|1
72034|PP|1
72034N|AA|1
7203466|QW|1
72000|NM|1
7201111|NM|1
723|NM|1
7201|QQ|1
72034|PP|1
72034N|AA|1
7203466|QW|1
72000|NM|1
7201111|NM|1
123|NM

See a demo on regex101.com.

Jan
  • this drops the last line if the number of elements isn't a multiple of 3. – Jean-François Fabre Dec 23 '17 at 22:50
  • @Jean-FrançoisFabre: Updated the expression as well as the demo (note the second line is not divisible by three). – Jan Dec 23 '17 at 22:58
  • hmmm, that's using a regex & fixing it afterwards with lots of string operations. That means that your regex101 demo doesn't hold anymore BTW. I'm sure it can be done with a smart regex & no post-processing, but I'm too lazy to try. – Jean-François Fabre Dec 23 '17 at 23:00

You can use a regular expression; try this pattern:

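import re

# First field: digits plus one trailing word character; then "|word|digit".
# (Tailored to the sample data shown in the question.)
pattern = r'\d+\w\|\w+\|\d'
with open('file.txt', 'r') as f:
    for line in f:
        # findall returns every non-overlapping match in the line
        for match in re.findall(pattern, line):
            print(match)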

output:

723|NM|1
7201|QQ|1
72034|PP|1
72034N|AA|1
7203466|QW|1
72000|NM|1
7201111|NM|1

Just for fun, in one line (note that the `[0]` means only the first line of the file is processed):

import re

pattern=r'\d+\w\|\w+\|\d'
for i in [re.findall(pattern,line) for line in open('file.txt','r')][0]:
    print(i)

output:

723|NM|1
7201|QQ|1
72034|PP|1
72034N|AA|1
7203466|QW|1
72000|NM|1
7201111|NM|1
Aaditya Ura