I have a string which has characters from multiple languages:

'죄송합니다 how are you doing? My name is Yudhiesh and I am 아니 doing good 저기요'

I am trying to chunk this single string into a list of strings based on the number of words. With a chunk size of 7 (i.e. at most 7 words per string), the result should be:

['죄송합니다 how are you doing? My name', 'is Yudhiesh and I am 아니 doing', 'good 저기요']

My current attempt, which is based on how you would chunk a list, does not work:

>>> s = '죄송합니다 how are you doing? My name is Yudhiesh and I am 아니 doing good 저기요'
>>> parts = [str(s[i:i+7]) for i in range(0, len(s), 7)]
>>> parts
['죄송합니다 h', 'ow are ', 'you doi', 'ng? My ', 'name is', ' Yudhie', 'sh and ', 'I am 아니', ' doing ', 'good 저기', '요']
yudhiesh

5 Answers


First, split the string into a list of words; then chunk that list and join each chunk back into a string.

Here is what you need in a function:

def split_max_num(string, max_words):
    """
    >>> split_max_num('죄송합니다 how are you doing? My name is Yudhiesh and I am 아니 doing good 저기요', 7)
    ['죄송합니다 how are you doing? My name', 'is Yudhiesh and I am 아니 doing', 'good 저기요']
    """
    words = string.split()
    len_words = len(words)

    res = list()
    for index in range(0, len_words, max_words):
        res.append(' '.join(words[index:index+max_words]))
    return res
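
If the words come from a larger source, the same split, chunk, and join idea can be written lazily. The following is only a sketch: `iter_word_chunks` is a hypothetical name, and the `max_words` check is an added assumption rather than part of the function above.

```python
from itertools import islice
from typing import Iterable, Iterator


def iter_word_chunks(words: Iterable[str], max_words: int) -> Iterator[str]:
    """Yield space-joined chunks of at most max_words words each."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    iterator = iter(words)
    while True:
        # Take the next max_words items; an empty list means we are done.
        chunk = list(islice(iterator, max_words))
        if not chunk:
            return
        yield " ".join(chunk)


text = '죄송합니다 how are you doing? My name is Yudhiesh and I am 아니 doing good 저기요'
chunks = list(iter_word_chunks(text.split(), 7))
print(chunks)
```

Because it only consumes an iterator, this version also works when the words are produced one at a time instead of from a list.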
Dorian Turba

How about the following?

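def split_max(words, n):
    # Split on whitespace to get the individual words.
    words = words.split()
    # Group the words into lists of at most n words each.
    words = [words[i:i + n] for i in range(0, len(words), n)]
    # Join each group back into a single string.
    return [' '.join(l) for l in words]


data = '죄송합니다 how are you doing? My name is Yudhiesh and I am 아니 doing good 저기요'
split_max(data, 7)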
Ricardo Alvaro Lohmann

You're converting a list into a string representation of a list.

I think you meant to split the string into words and rejoin each chunk:

s = '죄송합니다 how are you doing? My name is Yudhiesh and I am 아니 doing good 저기요'
s = s.split()
print([" ".join(s[i:i+7]) for i in range(0, len(s), 7)]) 
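
To see why the split matters: slicing a `str` counts characters, while slicing the list returned by `.split()` counts words. A minimal illustration, where the variable names are only for demonstration:

```python
s = '죄송합니다 how are you doing? My name'

# Slicing the string takes the first 7 characters.
char_slice = s[0:7]

# Slicing the list of words takes the first 3 words.
word_slice = s.split()[0:3]

print(repr(char_slice))  # '죄송합니다 h'
print(word_slice)        # ['죄송합니다', 'how', 'are']
```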
OneCricketeer

Use .split() on the string and chunk the resulting list of words:

from typing import List

def chunk_list(lst: List[str], chunk_size: int) -> List[List[str]]:
    # Slice the list into consecutive pieces of at most chunk_size items.
    return [
        lst[i:i + chunk_size]
        for i in range(0, len(lst), chunk_size)
    ]


def chunk_string(string: str, chunk_size: int) -> List[List[str]]:
    # Returns lists of words, not joined strings.
    return chunk_list(string.split(), chunk_size)
yudhiesh
Jonathan Scholbach
  • Thanks for the answer, but it produces `[['죄송합니다', 'how', 'are', 'you', 'doing?', 'My', 'name'], ['is', 'Yudhiesh', 'and', 'I', 'am', '아니', 'doing'], ['good', '저기요']]` – yudhiesh Feb 11 '21 at 15:34
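
The nested lists mentioned in the comment can be turned into the requested strings by joining each chunk. A minimal sketch, assuming the `chunk_list` helper from this answer; `chunk_string_joined` is a hypothetical name introduced only for illustration:

```python
from typing import List


def chunk_list(lst: List[str], chunk_size: int) -> List[List[str]]:
    # Slice the list into consecutive pieces of at most chunk_size items.
    return [lst[i:i + chunk_size] for i in range(0, len(lst), chunk_size)]


def chunk_string_joined(string: str, chunk_size: int) -> List[str]:
    # Hypothetical helper: join each list of words back into one string.
    return [" ".join(chunk) for chunk in chunk_list(string.split(), chunk_size)]


s = '죄송합니다 how are you doing? My name is Yudhiesh and I am 아니 doing good 저기요'
result = chunk_string_joined(s, 7)
print(result)
```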

Are you sure you're not setting s = ['죄송합니다 how are you doing? My name', 'is 도와 주세요 and I am doing', 'good 저기요'] before producing parts in this example?

You may want to split your original string with "somestring".split(" "), i.e. split on spaces, to get a list of all the words. Then you can chunk that list with slicing, as you've tried to do.
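
One detail worth checking when splitting on spaces: `split(" ")` and `split()` behave differently when the text contains repeated spaces or other whitespace. A small sketch with a made-up input string:

```python
text = "good  저기요\n"  # two spaces in the middle and a trailing newline

# split(" ") splits on every single space, so repeated spaces
# produce empty strings and the newline stays attached.
by_space = text.split(" ")

# split() with no argument splits on runs of any whitespace
# and drops leading and trailing whitespace.
by_whitespace = text.split()

print(by_space)       # ['good', '', '저기요\n']
print(by_whitespace)  # ['good', '저기요']
```

For word counting, the argument-free `split()` is usually the safer choice, since empty strings would otherwise count as words in a chunk.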

  • You've tried `somestring = "yoursentence"; wordlist = somestring.split(" "); sevens = [" ".join(wordlist[i:i+7]) for i in range(0, len(wordlist), 7)]` and it doesn't work? – bibblybobbly Feb 11 '21 at 15:42