split python string every nth character (string and nth character are in lists)

Question

I need something like this for my code: Split python string every nth character?

In my case, however, the values of n are numbers within nested lists, and the strings I want to split are also within nested lists.

myList = [["'hello''my name'"],["'is Michael'"],["'and'", "'I like''apples'"]]

nList = [[7,9],[12],[5,8,8]]

I want to get something like this:

myNewList = [["'hello'","'my name'"],["'is Michael'"],["'and'", "'I like'","'apples'"]]

i.e., I want to split each string into pieces whose lengths correspond to the numbers in nList.

I tried using a similar solution to the link I posted above:

My attempt:

myNewList = [myList[sum(nList[:i]):sum(nList[:i+1])] for i in range(len(nList))]

but it doesn't really match my case.
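
For reference, a minimal sketch of why the attempt above fails, assuming the nested nList shown earlier. The slice nList[:i+1] is a list of lists, so sum() tries to add an integer to a list and raises TypeError before any slicing happens. The attempt also slices the outer myList rather than a string inside it. The name error_name is only illustrative.

```python
nList = [[7, 9], [12], [5, 8, 8]]

# sum() starts from the integer 0, so summing a list of lists
# attempts 0 + [7, 9], which is not defined for int and list.
try:
    sum(nList[:1])
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__

print(error_name)
```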

EDIT:

Note: I would rather not split on the quotes, although it is acceptable to offer that as a solution. The numbers vary, and this is a simplified scenario that stands in for my real situation with XML data handling/writing.

can you please explain the meaning of the values in `nList` ? Are they correct for the given example? — Pynchia, Feb 04 '16 at 22:56
What is the higher purpose behind this? Since all of the desired phrases are already delimited by single quotes, I don't see the purpose in having the lengths in another list. Just use **split("''")** on the original entries. — Prune, Feb 04 '16 at 22:58
@Pynchia The values for nList are correct for the example. I double checked. — Mike Issa, Feb 04 '16 at 23:02
@Prune This is simplified scenario, of course. But if you must know, I am extracting text from elements in an XML document and appending them into nested lists (each nest represents each 'step', or block in the XML). The text must be separated by a certain number of characters, and these numbers are extracted from another element in the same XML block (therefore the numbers are nested as well). — Mike Issa, Feb 04 '16 at 23:04
@Prune splitting on `''` will give you elements alternating `'` as a prefix and suffix on successive elements. A regex may well be a better solution instead of the list of lengths — Pynchia, Feb 04 '16 at 23:05
I need a clarification. The last element of **myList** has two elements instead of one. However, the last element of nList has a simple sequence of three integers. Is the structure not directly applicable? — Prune, Feb 04 '16 at 23:16
exactly, the elements in `myList` are not homogeneous/coherent. — Pynchia, Feb 04 '16 at 23:21
@Prune Yes, that's correct. The structure is not directly applicable. — Mike Issa, Feb 04 '16 at 23:21
I'd suggest using lxml `import lxml` for extracting elements from xml in a clean way — Ramast, Feb 04 '16 at 23:23
@MikeIssa I have a solution in case the elements in `myList` are lists containing a single string, as in the first two elements — Pynchia, Feb 04 '16 at 23:26
@Ramast Please try and offer a solution to the problem posted, not my situation at hand. — Mike Issa, Feb 04 '16 at 23:26

score 0 · Answer 1 · answered Feb 04 '16 at 23:27

I have a solution for the case where the structures are compatible. Part of your original problem was a missing subscript: each element of myList is a sub-list of strings, not a string itself. I've concatenated the final list and inserted the [0] subscript, now redundant.

Is this close enough to get you moving? If not, I can add the necessary ''.join to finish the job, but it's even uglier than this.

I, too, recommend that you employ an xml parsing tool and regular expressions. This has been a lovely exercise, but it's not particularly maintainable.

myList = [["'hello''my name'"], ["'is Michael'"], ["'and''I like''apples'"]]
nList = [[7, 9], [12], [5, 8, 8]]
myNewList = [[myList[phrase][0][sum(nList[phrase][:spl]):sum(nList[phrase][:spl+1])]
              for spl in range(len(nList[phrase]))]
              for phrase in range(len(myList))]

print(myNewList)

Never mind; it was a trivial addition to my attempt above:

myList = [["'hello''my name'"], ["'is Michael'"], ["'and'", "'I like''apples'"]]
nList = [[7, 9], [12], [5, 8, 8]]
myNewList = [[''.join(myList[phrase])[sum(nList[phrase][:spl]):sum(nList[phrase][:spl+1])]
              for spl in range(len(nList[phrase]))]
              for phrase in range(len(myList))]

print(myNewList)

Output:

[["'hello'", "'my name'"], ["'is Michael'"], ["'and'", "'I like'", "'apples'"]]
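
The nested comprehension recomputes sum() for every slice. A possibly more readable variant, sketched here for Python 3, uses itertools.accumulate to build the cut points once per string. The helper name split_by_lengths is hypothetical, not something from the question.

```python
from itertools import accumulate

def split_by_lengths(text, lengths):
    """Cut text into consecutive pieces whose sizes are given by lengths."""
    ends = list(accumulate(lengths))   # running totals, e.g. [7, 9] -> [7, 16]
    starts = [0] + ends[:-1]           # each piece starts where the previous ended
    return [text[s:e] for s, e in zip(starts, ends)]

myList = [["'hello''my name'"], ["'is Michael'"], ["'and'", "'I like''apples'"]]
nList = [[7, 9], [12], [5, 8, 8]]

# Join each sub-list first so a sub-list holding several strings is handled too.
myNewList = [split_by_lengths(''.join(strings), lengths)
             for strings, lengths in zip(myList, nList)]
print(myNewList)
```

Pairing the two lists with zip also avoids indexing both by position, at the cost of silently stopping at the shorter list if their lengths differ.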

score 0 · Answer 2 · answered Feb 04 '16 at 23:45


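res = []
for words, nums in zip(myList, nList):
    # A sub-list may hold more than one string, so join it before slicing.
    text = ''.join(words)
    row = []
    curr = 0  # start index of the next piece
    for offset in nums:
        row.append(text[curr:curr+offset])
        curr += offset
    res.append(row)

print(res)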

Untested though.
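
As raised in the comments on the question, when every piece is wrapped in single quotes a regular expression can recover the pieces without nList at all. This is only a sketch for the simplified example: it assumes no piece contains an embedded single quote, and it will not help if the real XML text is not quote-delimited. The name QUOTED is illustrative.

```python
import re

myList = [["'hello''my name'"], ["'is Michael'"], ["'and'", "'I like''apples'"]]

# An opening quote, any run of non-quote characters, then a closing quote.
QUOTED = re.compile(r"'[^']*'")

# Join each sub-list so adjacent strings are scanned as one piece of text.
myNewList = [QUOTED.findall(''.join(strings)) for strings in myList]
print(myNewList)
```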

SoreDakeNoKoto
