
Is there a simple way to break down this string into multiple lists in Python so that I can then create a dataframe with those lists?

1|Mirazur|Menton, France|2|Noma|Copenhagen, Denmark|3|Asador Etxebarri|Axpe, Spain|4|Gaggan|Bangkok, Thailand|5|Geranium|Copenhagen, Denmark|6|Central|Lima, Peru|7|Mugaritz|San Sebastián, Spain|8|Arpège|Paris, France|9|Disfrutar|Barcelona, Spain|10|Maido|Lima, Peru|11|Den|Tokyo, Japan

I want to break it down so that it looks like:

[1, Mirazur, Menton, France]
[2, Noma, Copenhagen, Denmark]
and so on.

I'm really new to all this, so any advice is really appreciated. The simpler the answer, the better; I'd rather avoid 'fancier' ones so that I can understand the basic concepts first!

Vajiha


Piece of cake. The basis is splitting on the | character; this gives you a flat list of all the items. Next, split that list into smaller lists of a fixed size (here 3: rank, name, location); that is a well-researched question with lots of answers. I chose https://stackoverflow.com/a/5711993/2564301 because it does not use any external libraries and returns a useful base for the next step:

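print(zip(*[data.split('|')[i::3] for i in range(3)]))

In Python 3 this only prints something like <zip object at 0x...>, because zip returns a lazy iterator rather than a list. To see what it contains, loop over it:

for item in zip(*[data.split('|')[i::3] for i in range(3)]):
    print(item)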

which comes pretty close:

('1', 'Mirazur', 'Menton, France')
('2', 'Noma', 'Copenhagen, Denmark')
('3', 'Asador Etxebarri', 'Axpe, Spain')
etc.
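Because zip returns an iterator rather than a list, it can only be walked through once. This small sketch (using a shortened copy of the string; the names `pairs`, `first` and `second` are my own) shows that wrapping it in list() gives all the rows at once, and that a second pass over the same zip object comes back empty.

```python
data = '1|Mirazur|Menton, France|2|Noma|Copenhagen, Denmark'

pairs = zip(*[data.split('|')[i::3] for i in range(3)])

# list() pulls every item out of the zip iterator in one go
first = list(pairs)
print(first)   # [('1', 'Mirazur', 'Menton, France'), ('2', 'Noma', 'Copenhagen, Denmark')]

# The iterator is now used up, so a second pass yields nothing
second = list(pairs)
print(second)  # []
```

If you need the rows more than once, store the result of list(...) in a variable, as the final code in this answer does.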

(If you are wondering why zip is needed, print the result of [data.split('|')[i::3] for i in range(3)].)
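To make that concrete, here is a small sketch using only the first three restaurants (a shortened copy of the string, for readability). The slices [0::3], [1::3] and [2::3] each take every third item starting at a different offset, which yields three "column" lists; zip then pairs them up position by position into rows.

```python
data = '1|Mirazur|Menton, France|2|Noma|Copenhagen, Denmark|3|Asador Etxebarri|Axpe, Spain'
fields = data.split('|')

# Every third item, starting at offsets 0, 1 and 2: one list per "column"
columns = [fields[i::3] for i in range(3)]
print(columns)
# [['1', '2', '3'], ['Mirazur', 'Noma', 'Asador Etxebarri'], ['Menton, France', 'Copenhagen, Denmark', 'Axpe, Spain']]

# zip(*columns) is the same as zip(columns[0], columns[1], columns[2]):
# it takes the first item of each column, then the second, and so on
rows = list(zip(*columns))
print(rows)
# [('1', 'Mirazur', 'Menton, France'), ('2', 'Noma', 'Copenhagen, Denmark'), ('3', 'Asador Etxebarri', 'Axpe, Spain')]
```

These column lists are also handy for the dataframe you mentioned: if you use pandas, something like pandas.DataFrame({'Rank': columns[0], 'Restaurant': columns[1], 'Location': columns[2]}) should build the table straight from them (the column names here are my own choice).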

The final step is to convert each tuple into a list of its own.

Putting it together:

import pprint

data = '1|Mirazur|Menton, France|2|Noma|Copenhagen, Denmark|3|Asador Etxebarri|Axpe, Spain|4|Gaggan|Bangkok, Thailand|5|Geranium|Copenhagen, Denmark|6|Central|Lima, Peru|7|Mugaritz|San Sebastián, Spain|8|Arpège|Paris, France|9|Disfrutar|Barcelona, Spain|10|Maido|Lima, Peru|11|Den|Tokyo, Japan'

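# Split on '|', take every 3rd field starting at offsets 0, 1 and 2,
# zip those three lists together into rows, and turn each row tuple into a list
rows = [list(item) for item in zip(*[data.split('|')[i::3] for i in range(3)])]
pprint.pprint(rows)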

Result (nice indentation courtesy of pprint):

[['1', 'Mirazur', 'Menton, France'],
 ['2', 'Noma', 'Copenhagen, Denmark'],
 ['3', 'Asador Etxebarri', 'Axpe, Spain'],
 ['4', 'Gaggan', 'Bangkok, Thailand'],
 ['5', 'Geranium', 'Copenhagen, Denmark'],
 ['6', 'Central', 'Lima, Peru'],
 ['7', 'Mugaritz', 'San Sebastián, Spain'],
 ['8', 'Arpège', 'Paris, France'],
 ['9', 'Disfrutar', 'Barcelona, Spain'],
 ['10', 'Maido', 'Lima, Peru'],
 ['11', 'Den', 'Tokyo, Japan']]
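Since you asked for the simplest possible approach, here is a minimal sketch of the same result written as a plain loop, using a shortened copy of the string. It steps through the flat list three fields at a time with slicing instead of zip, and it also converts the rank to an int so that rows look like the [1, Mirazur, ...] in the question. It assumes every record has exactly three fields and every rank is a whole number; the names `fields` and `rows` are my own.

```python
data = '1|Mirazur|Menton, France|2|Noma|Copenhagen, Denmark|3|Asador Etxebarri|Axpe, Spain'

# Split on '|' to get one flat list of all the fields
fields = data.split('|')

rows = []
# Step through the flat list three items at a time: rank, name, location
for start in range(0, len(fields), 3):
    rank, name, location = fields[start:start + 3]
    # int() turns the text '1' into the number 1
    rows.append([int(rank), name, location])

for row in rows:
    print(row)
```

One difference from the zip version: if the number of fields is not a multiple of 3, zip silently drops the incomplete last record, whereas the unpacking line here raises a ValueError, so a malformed string gets noticed.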
Jongware
  • Thank you SO much. I got as far as the splitting on | but then tried all sorts after and just couldn't get the separate rows. This makes so much sense now that I read it. Thanks a bunch again. – Vajiha Jan 25 '20 at 11:02