-3

I need something like this for my code: Split python string every nth character?

In my case, however, the values of n are numbers inside nested lists, and the strings I want to split are also inside nested lists.

myList = [["'hello''my name'"],["'is Michael'"],["'and'", "'I like''apples'"]]

nList = [[7,9],[12],[5,8,8]]

I want to get something like this:

myNewList = [["'hello'","'my name'"],["'is Michael'"],["'and'", "'I like'","'apples'"]]

That is, I want to split each string into pieces whose lengths are given by the corresponding numbers in nList.

I tried adapting the solution from the question linked above. My attempt:

myNewList = [myList[sum(nList[:i]):sum(nList[:i+1])] for i in range(len(nList))]

but it doesn't really match my case.

EDIT:

Note: I would rather not split on each quote, although a solution that does so is acceptable. The numbers vary, and this is a simplified scenario that stands in for my real situation, which involves handling and writing XML data.

Mike Issa
  • 1
    can you please explain the meaning of the values in `nList` ? Are they correct for the given example? – Pynchia Feb 04 '16 at 22:56
  • Lengths of the pieces to be grabbed. – Prune Feb 04 '16 at 22:57
  • 3
    What is the higher purpose behind this? Since all of the desired phrases are already delimited by single quotes, I don't see the purpose in having the lengths in another list. Just use **split("''")** on the original entries. – Prune Feb 04 '16 at 22:58
  • @Pynchia The values for nList are correct for the example. I double checked. – Mike Issa Feb 04 '16 at 23:02
  • 1
    @Prune This is a simplified scenario, of course. But if you must know, I am extracting text from elements in an XML document and appending them into nested lists (each nest represents one 'step', or block, in the XML). The text must be separated by a certain number of characters, and these numbers are extracted from another element in the same XML block (therefore the numbers are nested as well). – Mike Issa Feb 04 '16 at 23:04
  • 2
    @Prune splitting on `''` will give you elements alternating `'` as a prefix and suffix on successive elements. A regex may well be a better solution instead of the list of lengths – Pynchia Feb 04 '16 at 23:05
  • 1
    I need a clarification. The last element of **myList** has two elements instead of one. However, the last element of nList has a simple sequence of three integers. Is the structure not directly applicable? – Prune Feb 04 '16 at 23:16
  • 1
    exactly, the elements in `myList` are not homogeneous/coherent. – Pynchia Feb 04 '16 at 23:21
  • @Prune Yes, that's correct. The structure is not directly applicable. – Mike Issa Feb 04 '16 at 23:21
  • 1
    I'd suggest using lxml `import lxml` for extracting elements from xml in a clean way – Ramast Feb 04 '16 at 23:23
  • 1
    @MikeIssa I have a solution in case the elements in `myList` are lists containing a single string, as in the first two elements – Pynchia Feb 04 '16 at 23:26
  • @Ramast Please try and offer a solution to the problem posted, not my situation at hand. – Mike Issa Feb 04 '16 at 23:26
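
The two comments above about splitting on quotes can be checked directly. The following is only a sketch, and it assumes that every phrase is wrapped in single quotes and contains no quote of its own. That holds for the example data but may not hold for the real XML text. The variable names are made up for illustration.

```python
import re

s = "'hello''my name'"

# Splitting on the doubled quote leaves each piece with one quote missing.
split_parts = s.split("''")
print(split_parts)   # ["'hello", "my name'"]

# Matching each quoted phrase with a regex keeps both quotes on every piece.
quoted = re.findall(r"'[^']*'", s)
print(quoted)        # ["'hello'", "'my name'"]
```

If the phrases can contain quotes themselves, neither approach is reliable and the list of lengths is the safer source of truth.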

2 Answers

0

I have a solution for the case where the structures are compatible. Part of your original problem was a missing subscript: each element of myList is a sub-list containing strings, not a string itself. In the first version below I've concatenated the strings of the final sub-list into one and added the [0] subscript to reach the string; the second version makes that subscript redundant.

Is this close enough to get you moving? If not, I can add the necessary ''.join to finish the job, but it's even uglier than this.

I, too, recommend that you employ an xml parsing tool and regular expressions. This has been a lovely exercise, but it's not particularly maintainable.

myList = [["'hello''my name'"], ["'is Michael'"], ["'and''I like''apples'"]]
nList = [[7, 9], [12], [5, 8, 8]]
myNewList = [[myList[phrase][0][sum(nList[phrase][:spl]):sum(nList[phrase][:spl+1])]
              for spl in range(len(nList[phrase]))]
              for phrase in range(len(myList))]

print(myNewList)

Never mind; it was a trivial addition to my attempt above:

myList = [["'hello''my name'"], ["'is Michael'"], ["'and'", "'I like''apples'"]]
nList = [[7, 9], [12], [5, 8, 8]]
myNewList = [[''.join(myList[phrase])[sum(nList[phrase][:spl]):sum(nList[phrase][:spl+1])]
              for spl in range(len(nList[phrase]))]
              for phrase in range(len(myList))]

print(myNewList)

Output:

[["'hello'", "'my name'"], ["'is Michael'"], ["'and'", "'I like'", "'apples'"]]
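
The comprehension above recomputes sum() for every slice boundary. One possible way to make the same idea easier to read is to compute the running totals once. This is a sketch only, assuming Python 3 (for itertools.accumulate); the helper name split_by_lengths is hypothetical and not part of the original answer.

```python
from itertools import accumulate

def split_by_lengths(text, lengths):
    """Cut text into consecutive pieces with the given lengths."""
    ends = list(accumulate(lengths))   # running totals, e.g. [7, 16] for [7, 9]
    starts = [0] + ends[:-1]           # each piece starts where the previous one ended
    return [text[a:b] for a, b in zip(starts, ends)]

myList = [["'hello''my name'"], ["'is Michael'"], ["'and'", "'I like''apples'"]]
nList = [[7, 9], [12], [5, 8, 8]]

# Join each sub-list first so the lengths apply to one continuous string.
myNewList = [split_by_lengths(''.join(strings), lengths)
             for strings, lengths in zip(myList, nList)]
print(myNewList)
```

If the lengths add up to less than the joined string, the leftover characters are silently dropped, so a length check may be worth adding for real data.
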
Prune
0
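myList = [["'hello''my name'"], ["'is Michael'"], ["'and'", "'I like''apples'"]]
nList = [[7, 9], [12], [5, 8, 8]]

res = []
for words, nums in zip(myList, nList):
    text = ''.join(words)  # merge the sub-list so the lengths apply to one string
    row = []
    curr = 0
    for length in nums:
        row.append(text[curr:curr+length])
        curr += length
    res.append(row)

print(res)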

Untested though.

SoreDakeNoKoto