Parse String every 2 letters

Question

How can I parse a long string that comes from a .txt file every 2 characters?

Do you have a code sample of what you have tried to do so far? — danseery, Jan 18 '13 at 14:59
I don't understand the question. A text file is a long sequence of characters. Are you asking `how can I split it up into blocks of two characters`? — Katriel, Jan 18 '13 at 14:59
Can you explain a little better. Are you wanting every consecutive pair of characters from a file object? — Steve, Jan 18 '13 at 14:59
Not sure what you mean here. If you start with the string "foobar", do you want to parse it as `"fo","ob","ar"` or `"fo","oo","ob","ba","ar"`? — mgilson, Jan 18 '13 at 15:00
Every 2 letters. For an example like AB08H5F7UF, I want to parse it as AB 08 H5 F7 UF — PythonNewbie, Jan 18 '13 at 16:03

ATOzTOA · Answer 1 · 2013-01-18T16:21:39.663

2

Try

import re
print re.findall(r'[\S]{1,2}', "The quick brown fox jumped over the lazy dog")

>>
['Th', 'e', 'qu', 'ic', 'k', 'br', 'ow', 'n', 'fo', 'x', 'ju', 'mp', 'ed', 'ov', 'er', 'th', 'e', 'la', 'zy', 'do', 'g']

OR

print re.findall(r'.{1,2}', "The quick brown fox jumped over the lazy dog")

>>
['Th', 'e ', 'qu', 'ic', 'k ', 'br', 'ow', 'n ', 'fo', 'x ', 'ju', 'mp', 'ed', ' o', 've', 'r ', 'th', 'e ', 'la', 'zy', ' d', 'og']

Update

For your specific requirement:

>>> print re.findall(r'[\S]{1,2}', "08AB78UF")
['08', 'AB', '78', 'UF']
>>>
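
The snippets above use the Python 2 `print` statement. As a minimal sketch of the same regex approach in Python 3 (the helper name `split_pairs` is hypothetical and not part of the original answer), the pattern `\S{1,2}` matches runs of one or two non-whitespace characters, so an odd-length input leaves a single trailing character:

```python
import re

def split_pairs(text):
    """Return consecutive, non-overlapping chunks of up to two
    non-whitespace characters (hypothetical helper name)."""
    return re.findall(r"\S{1,2}", text)

pairs = split_pairs("AB08H5F7UF")
print(" ".join(pairs))      # AB 08 H5 F7 UF
print(split_pairs("ABC"))   # ['AB', 'C']
```

Note that whitespace ends a chunk here rather than being skipped before pairing, which is why "The quick" yields 'Th', 'e', 'qu' instead of 'Th', 'eq', 'ui'.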

edited Jan 18 '13 at 16:21

answered Jan 18 '13 at 15:04


Applause applause. Do another that skips whitespace! – PinkElephantsOnParade Jan 18 '13 at 15:08
@PinkElephantsOnParade The first one does skip whitespace. Is that not enough? – ATOzTOA Jan 18 '13 at 15:11
I meant forms ['Th', 'eq', 'ui'...] - skipping white space before extracting as opposed to after. But I'm not the original poster - that was just for my own fun, lol. – PinkElephantsOnParade Jan 18 '13 at 15:16
@PinkElephantsOnParade `r'\S(?:\s?)\S'`, but I get `'Th','e q'...` – ATOzTOA Jan 18 '13 at 15:35
I just want to parse a string and write the result to a file, e.g. turn 08AB78UF into 08 AB 78 UF – PythonNewbie Jan 18 '13 at 16:04

Abhijit · Accepted Answer · 2013-01-18T16:34:04.523

2

You can just zip the string with a copy of itself that is offset by one character:

>>> data = "foobar"
>>> map(''.join, zip(data, data[1:]))
['fo', 'oo', 'ob', 'ba', 'ar']
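
To make the offset explicit, here is a small Python 3 sketch of the same idea (variable names such as `shifted` and `windows` are illustrative only). The slice `data[1:]` is the string from index 1 to the end, and `zip` stops at the shorter input, so the result has one fewer element than `data` has characters:

```python
data = "foobar"

# data[1:] drops the first character, so position i of `shifted`
# holds the character that follows position i of `data`.
shifted = data[1:]

# zip pairs each character with its successor and stops when
# the shorter sequence (shifted) is exhausted.
windows = ["".join(pair) for pair in zip(data, shifted)]

print(shifted)   # oobar
print(windows)   # ['fo', 'oo', 'ob', 'ba', 'ar']
```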

A similar solution uses itertools.izip (Python 2 only):

>>> from itertools import izip
>>> map(''.join, izip(data, data[1:]))
['fo', 'oo', 'ob', 'ba', 'ar']

If you are using Python 3.x, where map returns an iterator and itertools.izip no longer exists, use a list comprehension with the built-in zip:

>>> [''.join(e) for e in zip(data, data[1:])]
['fo', 'oo', 'ob', 'ba', 'ar']

As @Duncan mentioned, these substrings overlap. If you want non-overlapping substrings, refer to @Duncan's answer or @Duncan's comment, or use the grouper recipe from the itertools documentation:

>>> from itertools import izip_longest
>>> [''.join(e) for e in izip_longest(*[iter(data)] * 2, fillvalue='')]
['fo', 'ob', 'ar']
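
For Python 3, where `izip_longest` is named `zip_longest`, a rough equivalent might look like the sketch below. It also shows plain slicing in steps of two, which is not in the original answer but gives the same non-overlapping chunks; the variable names are illustrative only.

```python
from itertools import zip_longest

data = "foobar"

# Slice in steps of two: data[0:2], data[2:4], data[4:6], ...
by_slicing = [data[i:i + 2] for i in range(0, len(data), 2)]

# Grouper recipe: the same iterator object is passed twice, so each
# output tuple consumes two consecutive characters. fillvalue=''
# pads the last tuple when the length is odd.
by_grouper = ["".join(pair)
              for pair in zip_longest(*[iter(data)] * 2, fillvalue="")]

print(by_slicing)  # ['fo', 'ob', 'ar']
print(by_grouper)  # ['fo', 'ob', 'ar']
```

Both variants keep a trailing single character for odd-length input, for example 'abc' becomes ['ab', 'c'].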

You can easily join the resulting list into a single string:

>>> ' '.join(''.join(e) for e in izip(data, data[1:]))
'fo oo ob ba ar'
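
The question asks for non-overlapping pairs read from a .txt file and written back out, so an end-to-end version could be sketched as follows in Python 3. The function name `write_pairs` and the file names `input.txt` and `parsed.txt` are assumptions for illustration, as is the choice to drop all whitespace (including newlines) before pairing.

```python
import os
import tempfile

def write_pairs(src_path, dst_path):
    """Read src_path, drop all whitespace, and write the text to
    dst_path as space-separated two-character chunks."""
    with open(src_path) as src:
        text = "".join(src.read().split())  # assumption: whitespace is not data
    pairs = [text[i:i + 2] for i in range(0, len(text), 2)]
    with open(dst_path, "w") as dst:
        dst.write(" ".join(pairs) + "\n")
    return pairs

# Demo using temporary files so the sketch runs anywhere.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "input.txt")    # hypothetical file name
dst_path = os.path.join(workdir, "parsed.txt")   # hypothetical file name
with open(src_path, "w") as f:
    f.write("08AB78UF\n")

write_pairs(src_path, dst_path)
with open(dst_path) as f:
    result = f.read()
print(result, end="")  # 08 AB 78 UF
```

Writing the joined string, rather than printing the list itself, is what avoids brackets and quotes in the output file.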

edited Jan 18 '13 at 16:34

answered Jan 18 '13 at 15:12


That does the overlapping sliding window, for completeness you should add that if the OP wants to not overlap then use `zip(data[::2],data[1::2])` – Duncan Jan 18 '13 at 15:14
How can I send the result of map(''.join, zip(data, data[1:])) to an output .txt file? – PythonNewbie Jan 18 '13 at 16:10
When I save that map result to a variable, `a` for example, and run parse.py > parsed.txt, I get ['fo', 'oo', 'ob', 'ba', 'ar'] instead of fo oo ob ba ar – PythonNewbie Jan 18 '13 at 16:22
Hm, OK, thanks a lot. Can you just explain why izip receives data[1:]? Why not 1:2 or something, is that a default? – PythonNewbie Jan 18 '13 at 17:49
@PythonNewbie: The notation used is called `slice notation`. To learn more about it, refer to the [SO Post](http://stackoverflow.com/questions/509211/the-python-slice-notation). In case this answer helped you, don't forget to accept it. – Abhijit Jan 18 '13 at 17:53
