I have a list of strings. Each string contains several items separated by commas.

gf = ['citrus fruit,black bread,margarine,ready soups', 'tropical fruit,yogurt,coffee, margarine', 'whole milk']  # each element is a single string holding several items separated by commas

I need to transform it into:

gf_n = ['citrus fruit', 'black bread', 'margarine', 'ready soups', 'tropical fruit', 'yogurt', 'coffee', 'margarine','whole milk']

The individual items (a word or a group of words) can be repeated. Later, I will need to calculate the frequency of every item (e.g. 'citrus fruit') and the frequency of every two-item combination (e.g. 'black bread' and 'margarine').
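For reference, a minimal sketch of that later counting step. It assumes that a "two-item combination" means two items appearing in the same original string, and that order within a pair does not matter; the names `baskets`, `item_counts`, and `pair_counts` are illustrative only.

```python
from collections import Counter
from itertools import combinations

gf = ['citrus fruit,black bread,margarine,ready soups',
      'tropical fruit,yogurt,coffee, margarine',
      'whole milk']

# One list of cleaned items per original string (grouping is kept for pair counting).
baskets = [[item.strip() for item in s.split(',')] for s in gf]

# Frequency of every single item across all strings.
item_counts = Counter(item for basket in baskets for item in basket)

# Frequency of every unordered pair that occurs within the same string.
# sorted(set(...)) gives each pair one canonical order and ignores duplicates in a string.
pair_counts = Counter(
    pair
    for basket in baskets
    for pair in combinations(sorted(set(basket)), 2)
)

print(item_counts['margarine'])
print(pair_counts[('black bread', 'margarine')])
```

If pairs across different strings should also count, the grouping into `baskets` would have to be dropped; that is a different assumption from the one sketched here.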

Here is my code and the result is not what I need:

gf_list = list(gf.split(","))

gf_item = []

gf_item = [item for sublist in gf_list for item in sublist]

Surprisingly, this is what I get (letters, not words):

['c', 'i', 't', 'r', 'u', 's', ' ', 'f', 'r', 'u'] # first 10 elements

What am I doing wrong?

SOLUTION (after some time I came up with this):

for subl in lst:

    gf_item.append(subl.split(","))
Toly
  • Please add a tag specifying the language. – punund Sep 06 '21 at 21:51
  • Parse and flatten the list (`gf_n = sum((s.split(',') for s in gf), [])`) and if leading and trailing spaces should _not_ be preserved, you need to trim them (`[s.strip() for s in gf_n]`). – Andrej Podzimek Sep 07 '21 at 00:58

2 Answers

1
gf_item = [item.strip() for s in gf for item in s.split(',')]
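As for why the original attempt returned letters, here is a minimal sketch, assuming `gf` is the list from the question. Iterating over a string yields its characters, so flattening without splitting first goes one level too deep. The names `chars` and `items` are illustrative only.

```python
gf = ['citrus fruit,black bread,margarine,ready soups',
      'tropical fruit,yogurt,coffee, margarine',
      'whole milk']

# Each element of gf is a string, so the inner loop walks over characters.
chars = [item for sublist in gf for item in sublist]
print(chars[:10])

# Splitting each string on commas first makes the inner loop walk over items.
items = [item.strip() for s in gf for item in s.split(',')]
print(items)
```

The `strip()` call removes the stray space in `'coffee, margarine'`, so both occurrences of `'margarine'` compare equal.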

Using a generator:

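def flatten(x):
    # A string is split on commas; each piece is stripped of surrounding whitespace.
    if isinstance(x, str):
        for i in x.split(','):
            yield i.strip()
        return
    try:
        # Any other iterable (e.g. a nested list) is flattened recursively.
        for i in x:
            yield from flatten(i)
    except TypeError:
        # Non-iterable values are yielded unchanged.
        yield x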

gf = ['citrus fruit,black bread,margarine,ready soups', 'tropical fruit,yogurt,coffee, margarine', 'whole milk']
gf_new = list(flatten(gf))

lst = [['alice', 'gun'], ['bob', 'tree', ' mot'],['cara']]
lst_new = list(flatten(lst))

print(gf_new)
print(lst_new)

output:

['citrus fruit', 'black bread', 'margarine', 'ready soups', 'tropical fruit', 'yogurt', 'coffee', 'margarine', 'whole milk']
['alice', 'gun', 'bob', 'tree', 'mot', 'cara']
Nothing special
  • This produces something different: ['alice', 'gun', 'bob', 'tree', 'mot', 'cara']. I need: [['alice', 'gun'], ['bob', 'tree', ' mot'], ['cara']]. I only changed the elements for brevity. – Toly Sep 07 '21 at 21:16
  • You didn't mention in your question that you may have a nested list. The example you gave contains a flat list of strings. – Nothing special Sep 08 '21 at 01:15
  • I've edited the answer. It will work now. – Nothing special Sep 08 '21 at 01:32
0

Try this:

gf1 = str(gf).replace("[",'').replace("]",'').replace("'",'')
gf_n = [x.strip() for x in gf1.split(',')]

which returns:

['citrus fruit',
 'black bread',
 'margarine',
 'ready soups',
 'tropical fruit',
 'yogurt',
 'coffee',
 'margarine',
 'whole milk']
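A related sketch, not part of the approach above: going through `str(gf)` depends on how Python prints a list, so an item containing an apostrophe gets altered. Joining the strings with commas and splitting again avoids that. The sample list `gf_apos` and the names `via_repr` and `via_join` are made up for illustration.

```python
# Hypothetical input with an apostrophe inside an item.
gf_apos = ["baker's yeast,whole milk", 'coffee, margarine']

# Approach based on the printed form of the list: quotes and the apostrophe get mangled.
text = str(gf_apos).replace("[", '').replace("]", '').replace("'", '')
via_repr = [x.strip() for x in text.split(',')]

# Joining on commas and splitting again works on the string contents directly.
via_join = [x.strip() for x in ','.join(gf_apos).split(',')]

print(via_repr)
print(via_join)
```

For the list in the question, which has no apostrophes or brackets inside items, both approaches give the same result.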
Dharman
FCastell