Python string transformation

Question

Here is my string that I created by parsing data from a file:

723|NM|1|7201|QQ|1|72034|PP|1|72034N|AA|1|7203466|QW|1|72000|NM|1|7201111|NM|1

Ideally I would like this output:

723|NM|1
7201|QQ|1
72034|PP|1
72034N|AA|1
7203466|QW|1
72000|NM|1
7201111|NM|1

Since I was not successful parsing the data and appending it dynamically (I am new to Python), I understand that I can get the same desired output by transforming this string.

I researched, tested, and am stuck.

Essentially I need to replace every 3rd instance of the delimiter with a new line (or, maybe something better that anyone can suggest).

Any help is greatly appreciated!

Thanks

Can you give us an example of what the input file looks like? — TheF1rstPancake, Dec 23 '17 at 22:19
Sure, it was an xml file and I was parsing a nested segment. Natively python did not understand that each nested segment was independent, so I just parsed it to string knowing that every third piece I can split out at the end, effectively creating a file I can load into a table. — Scientific40, Dec 23 '17 at 22:21

Jean-François Fabre · Answer 1 · 2017-12-23T22:53:40.207

5

Without regex:

split on |
then group the items by 3 (that is a classic: How can you split a list every x elements and add those x amount of elements to an new list?), join each group back with |, and join the groups with newlines

like this:

s = "723|NM|1|7201|QQ|1|72034|PP|1|72034N|AA|1|7203466|QW|1|72000|NM|1|7201111|NM|1"

items = s.split("|")
print("\n".join(["|".join(items[i:i+3]) for i in range(0,len(items),3)] ))

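Note that the [] inside the outer join is on purpose, to get better performance: str.join builds a list from a generator anyway, so passing a list comprehension directly is faster (see List comprehension without [ ] in Python), even if I agree that it's ugly :)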

result:

723|NM|1
7201|QQ|1
72034|PP|1
72034N|AA|1
7203466|QW|1
72000|NM|1
7201111|NM|1
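
The split/group/join steps above can be wrapped in a small helper so that the group size becomes a parameter. This is only a sketch: `group_fields` and its parameters `n` and `sep` are names made up for illustration, not part of the original answer.

```python
def group_fields(s, n=3, sep="|"):
    """Split s on sep, then put n fields on each line (hypothetical helper)."""
    items = s.split(sep)
    # Slicing past the end of a list is safe, so a shorter final group is kept.
    return "\n".join([sep.join(items[i:i + n]) for i in range(0, len(items), n)])


s = "723|NM|1|7201|QQ|1|72034|PP|1|72034N|AA|1|7203466|QW|1|72000|NM|1|7201111|NM|1"
print(group_fields(s))
```

When the number of fields is not a multiple of `n`, the leftover fields end up on a final, shorter line.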

BTW with regex it's simple too:

import re
re.sub(r"(.*?\|.*?\|.*?)\|", r"\1\n", s)

but it doesn't work very well if the number of items isn't exactly divisible by 3 (this can be done, but in a more complex way)

edited Dec 23 '17 at 22:53

answered Dec 23 '17 at 22:22

Jean-François Fabre


Yeah nice, you got an extra [] inside your print statement that is not needed though (outer join). And you could write it as this too: `print('\n'.join('|'.join(i) for i in zip(items[::3], items[1::3], items[2::3])))` – Anton vBR Dec 23 '17 at 22:39

the `[]` is on purpose, for better performance: https://stackoverflow.com/questions/9060653/list-comprehension-without-in-python – Jean-François Fabre Dec 23 '17 at 22:48
This worked perfectly...I think I was close, and now I should get things to work. Thanks! – Scientific40 Dec 23 '17 at 22:52

@AntonvBR `zip(items[::3], items[1::3], items[2::3])` would be better using `itertools.islice` to avoid creating actual lists. And what if you want to group by 10 elements? That would be tedious :) – Jean-François Fabre Dec 23 '17 at 22:57
I thought it was more readable in this particular case. You got a point again. – Anton vBR Dec 23 '17 at 22:58
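
A sketch of the `itertools.islice` variant mentioned in the comments, generalized to any group size. `group_with_islice` and `n` are illustrative names, not something from the thread. Note that `zip` stops at the shortest input, so an incomplete trailing group is dropped here, unlike with the slicing approach in the answer.

```python
from itertools import islice


def group_with_islice(items, n=3):
    """Return an iterator of n-tuples of consecutive items (hypothetical helper)."""
    # islice(items, k, None, n) lazily walks items[k], items[k + n], ...
    # without building the intermediate lists that items[k::n] would create.
    columns = (islice(items, k, None, n) for k in range(n))
    return zip(*columns)


items = "723|NM|1|7201|QQ|1|72034|PP|1".split("|")
print("\n".join(["|".join(group) for group in group_with_islice(items)]))
```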

Jan · Answer 2 · 2017-12-23T22:57:50.773

0

Using a regex solution:

import re

string = """723|NM|1|7201|QQ|1|72034|PP|1|72034N|AA|1|7203466|QW|1|72000|NM|1|7201111|NM|1
723|NM|1|7201|QQ|1|72034|PP|1|72034N|AA|1|7203466|QW|1|72000|NM|1|7201111|NM|1|123|NM"""

rx = re.compile(r'(?:[^|]+\|?){1,3}')

for line in string.split("\n"):
    parts = "\n".join([part.group(0).rstrip("|") for part in rx.finditer(line)])
    print(parts)

This yields:

723|NM|1
7201|QQ|1
72034|PP|1
72034N|AA|1
7203466|QW|1
72000|NM|1
7201111|NM|1
723|NM|1
7201|QQ|1
72034|PP|1
72034N|AA|1
7203466|QW|1
72000|NM|1
7201111|NM|1
123|NM

See a demo on regex101.com.
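
To see why the `rstrip("|")` step is there, it may help to look at the raw matches: each one takes up to three fields together with the delimiter that follows them. A small sketch on a made-up five-field input:

```python
import re

rx = re.compile(r'(?:[^|]+\|?){1,3}')

line = "a|b|c|d|e"  # made-up input, not taken from the question
raw = [m.group(0) for m in rx.finditer(line)]
print(raw)  # a full group of three keeps its trailing "|"
print([part.rstrip("|") for part in raw])
```

Because `[^|]+` needs at least one character, this pattern would not treat an empty field (as in `a||b`) as a field of its own.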

edited Dec 23 '17 at 22:57

answered Dec 23 '17 at 22:41

Jan


this drops the last line if the number of elements isn't a multiple of 3. – Jean-François Fabre Dec 23 '17 at 22:50
@Jean-FrançoisFabre: Updated the expression as well as the demo (note the second line is not divisible by three). – Jan Dec 23 '17 at 22:58
hmmm that's using regex & fixing it afterwards with a lot of string processing. That means that your regex101 demo doesn't hold anymore BTW. I'm sure it can be done with a smart regex & no post-processing, but I'm too lazy to try. – Jean-François Fabre Dec 23 '17 at 23:00
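
One possible way to get there without post-processing, offered as a sketch rather than as what either commenter had in mind: match one field, then up to two more `|field` pairs, so that the delimiter between groups is never part of a match. It assumes that no field is empty.

```python
import re

# One field, then zero to two further "|field" pairs. The "|" between
# groups is left unmatched, so nothing needs stripping afterwards.
rx = re.compile(r'[^|\n]+(?:\|[^|\n]+){0,2}')

s = "723|NM|1|7201|QQ|1|72034|PP|1|123|NM"  # 11 fields, not a multiple of 3
groups = rx.findall(s)
print("\n".join(groups))
```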

score 0 · Answer 3 · answered Dec 23 '17 at 23:42

You can use a regular expression; try this pattern:

import re

pattern=r'\d+\w\|\w+\|\d'
with open('file.txt','r') as f:
    for line in f:
        match=re.findall(pattern,line)
        for i in match:
            print(i)

output:

723|NM|1
7201|QQ|1
72034|PP|1
72034N|AA|1
7203466|QW|1
72000|NM|1
7201111|NM|1
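
The same pattern can be tried on the string from the question without a file, which makes its assumptions easier to see: the first field has to start with digits, and only a single digit of the third field is matched. A sketch:

```python
import re

pattern = r'\d+\w\|\w+\|\d'
s = "723|NM|1|7201|QQ|1|72034|PP|1|72034N|AA|1|7203466|QW|1|72000|NM|1|7201111|NM|1"

# findall returns the whole match here because the pattern has no groups.
records = re.findall(pattern, s)
print("\n".join(records))
```

If the third field could have more than one digit, or the first field could start with a letter, splitting on | as in the first answer is probably the safer route.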

Just for fun, a more compact version (it prints only the matches from the first line of the file):

import re

pattern=r'\d+\w\|\w+\|\d'
for i in [re.findall(pattern,line) for line in open('file.txt','r')][0]:
    print(i)

output:

723|NM|1
7201|QQ|1
72034|PP|1
72034N|AA|1
7203466|QW|1
72000|NM|1
7201111|NM|1
