From tuples to multiple columns in pandas

Question

How do I convert this dataframe

                                          location  value                       
0                   (Richmond, Virginia, nan, USA)    100                       
1              (New York City, New York, nan, USA)    200

to this:

    city            state       region    country   value
0   Richmond        Virginia    nan       USA       100
1   New York City   New York    nan       USA       200

Note that the location column in the first dataframe contains tuples. I want to create four columns out of the location column.

exp1orer · Accepted Answer · 2014-08-29T05:08:24.720

15

new_col_list = ['city','state','regions','country']
for n,col in enumerate(new_col_list):
    df[col] = df['location'].apply(lambda location: location[n])

df = df.drop('location',axis=1)

edited Aug 29 '14 at 05:08

answered Aug 28 '14 at 23:27

exp1orer

11,481
7
38
51

meloncholy · Answer 2 · 2014-08-29T06:42:06.440

7

If you return a Series of the (split) location, you can merge (join to merge on index) the resulting DF directly with your value column.

addr = ['city', 'state', 'region', 'country']
df[['value']].join(df.location.apply(lambda loc: Series(loc, index=addr)))

   value           city     state  region country
0    100       Richmond  Virginia     NaN     USA
1    200  New York City  New York     NaN     USA

edited Aug 29 '14 at 06:42

answered Aug 28 '14 at 23:39

meloncholy

2,122
18
16

Thanks. @exp1orer's solution worked, so I didn't have to try yours. – ba_ul Aug 29 '14 at 00:23
That is nicely done! If I'm not mistaken, it only works if `df` has index that is `range(len(df))`, right? – exp1orer Aug 29 '14 at 05:10
@exp1orer Not quite. `join` is shorthand for merging on index with both frames, so the indices need only be consistent (which it will be here as the apply and col selection don't affect it). I'll edit the answer. – meloncholy Aug 29 '14 at 06:37
6

Should have thought about this earlier, but worth noting that this is **much** slower than @exp1orer's solution on a largish frame due to all of the Series being created. – meloncholy Aug 29 '14 at 07:11

score 3 · Answer 3 · answered Dec 04 '19 at 19:31

I haven't timed this, but I would suggest this option:

df.loc[:,'city']=df.location.map(lambda x:x[0])
df.loc[:,'state']=df.location.map(lambda x:x[1])
df.loc[:,'regions']=df.location.map(lambda x:x[2])
df.loc[:,'country']=df.location.map(lambda x:x[3])

I'm guessing avoiding explicit for loop might lend itself to a SIMD instruction (certainly numpy looks for that, but perhaps not other libraries)

score 0 · Answer 4 · answered Jun 09 '22 at 16:12

I prefer to use pd.DataFrame.from_records to convert the tuples to Series. Then this can be joined to the previous dataset as described by meloncholy.

df = pd.DataFrame({"location":[("Richmond", "Virginia", pd.NA, "USA"),
                               ("New York City", "New York", pd.NA, "USA")],
                   "value": [100,200]})

loc = pd.DataFrame.from_records(df.location, columns=['city','state','regions','country'])
df.drop("location", axis=1).join(loc)

from_records does assume a sequential index. If this is not the case you should pass the index to the new DataFrame:

loc = pd.DataFrame.from_records(df.location.reset_index(drop=True),
                                columns=['city','state','regions','country'],
                                index=df.index)

From tuples to multiple columns in pandas

4 Answers4

Linked