
Suppose I have a list of lists, where each inner list contains the tokens of one sentence.

For example:

new_list = [['hello', 'folks', 'i', 'am', 'a', 'good', 'boy', '.'], ['python', 'is', 'a', 'language', '.']]

I want to merge each inner list back into a sentence string, so that I end up with a single list of sentences.

How can I achieve this? Is there a shortcut?

Output:

['hello folks i am a good boy.', 'python is a language.']

What I have tried is as follows:

1) new_list_1 = (''.join(str(new_list)))

2) from itertools import chain
   new_list_1 = list(chain(*new_list))

At present I am getting the output (in terms of merged tokens only) as:

new_list_1 = ['hello', 'folks', 'i', 'am', 'a', 'good', 'boy', '.', 'python', 'is', 'a', 'language', '.']
M S

3 Answers

Try this :

new_list = [['hello', 'folks', 'i', 'am', 'a', 'good', 'boy', '.'], ['python', 'is', 'a', 'language', '.']]
new_list = [' '.join(i) for i in new_list]

Output :

['hello folks i am a good boy .', 'python is a language .']

If you want to add the last item without any space before it, try this :

new_list = [' '.join(i[:-1])+i[-1] for i in new_list]

Output :

['hello folks i am a good boy.', 'python is a language.']

Note that in this case no space is added before the final . in either string.
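
The slicing version above assumes every inner list is non-empty and ends in exactly one punctuation token. As a rough sketch of a more tolerant variant, the helper below (join_tokens is a hypothetical name, not something from the question) attaches any punctuation-only token to the word before it. It is only an illustration: it would also glue opening brackets or quotes to the preceding word, which may not be what you want.

```python
import string

def join_tokens(tokens):
    """Join tokens with spaces, gluing punctuation-only tokens to the previous word."""
    parts = []
    for tok in tokens:
        # A token made up entirely of punctuation is appended to the previous part
        if parts and tok and all(ch in string.punctuation for ch in tok):
            parts[-1] += tok
        else:
            parts.append(tok)
    return ' '.join(parts)

new_list = [['hello', 'folks', 'i', 'am', 'a', 'good', 'boy', '.'],
            ['python', 'is', 'a', 'language', '.']]

sentences = [join_tokens(tokens) for tokens in new_list]
print(sentences)
```

Unlike the slicing version, this does not raise an IndexError on an empty inner list; it simply returns an empty string for it.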

Arkistarvh Kltzuonstev

Your first approach converts the entire list to its string representation:

In [7]: ''.join(str(new_list))                                                                                                  
Out[7]: "[['hello', 'folks', 'i', 'am', 'a', 'good', 'boy', '.'], ['python', 'is', 'a', 'language', '.']]"

Whereas your second approach flattens your list:

In [10]: new_list_1 = list(chain(*new_list))                                                                                    

In [11]: new_list_1                                                                                                             
Out[11]: 
['hello',
 'folks',
 'i',
 'am',
 'a',
 'good',
 'boy',
 '.',
 'python',
 'is',
 'a',
 'language',
 '.']

Also, the last element . should ideally be part of the preceding word if you do not want the punctuation mark treated as a separate token, so your list should look like:

new_list = [['hello', 'folks', 'i', 'am', 'a', 'good', 'boy.'], ['python', 'is', 'a', 'language.']]

Instead of either approach, you want to iterate over the list and apply str.join to each sublist:

In [13]: [ ' '.join(item) for item in new_list]                                                                                 
Out[13]: ['hello folks i am a good boy.', 'python is a language.']

You can also use map to apply the str.join on the items of the list

In [14]: list(map(' '.join, new_list))                                                                                          
Out[14]: ['hello folks i am a good boy.', 'python is a language.']

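
To see the three behaviours discussed in this answer side by side, here is a small self-contained sketch. It assumes the list shape with the punctuation already attached to the last word, and it uses chain.from_iterable, which behaves the same as chain(*new_list) for this purpose. The variable names are only illustrative.

```python
from itertools import chain

new_list = [['hello', 'folks', 'i', 'am', 'a', 'good', 'boy.'],
            ['python', 'is', 'a', 'language.']]

# str() produces the printed form of the whole list; joining its
# characters with '' just rebuilds that same string.
as_text = ''.join(str(new_list))

# Flattening removes the sentence boundaries: one list of all tokens.
flat = list(chain.from_iterable(new_list))

# Joining per sublist keeps one string per sentence.
sentences = [' '.join(tokens) for tokens in new_list]

print(as_text)
print(flat)
print(sentences)
```

Only the last form keeps one entry per sentence, which is what the question asks for.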
Devesh Kumar Singh

Try a list comprehension:

new_list = [['hello', 'folks', 'i', 'am', 'a', 'good', 'boy', '.'], ['python', 'is', 'a', 'language', '.']]

res_list = [' '.join(x) for x in new_list]  # join the tokens of each sublist with a single space

print(res_list)

Result: ['hello folks i am a good boy .', 'python is a language .']

rohit prakash