
I have a dataframe:

a     b
jon   [(age,12), (gender,1), (marital,1)]
sam   [(age,34), (gender,1), (marital,2)]
el    [(age,14), (gender,2), (marital,1)]

I want to split the b column into 3 separate columns, so that I get:

a     b1        b2          b3
jon   (age,12)  (gender,1)  (marital,1)
sam   (age,34)  (gender,1)  (marital,2)
el    (age,14)  (gender,2)  (marital,1)

Is there a pythonic way to do this efficiently?

Shubham R
  • Does this answer your question? [Splitting dictionary/list inside a Pandas Column into Separate Columns](https://stackoverflow.com/questions/38231591/splitting-dictionary-list-inside-a-pandas-column-into-separate-columns) – noah Nov 25 '20 at 20:16
  • Does this answer your question? [Pandas: split column of lists of unequal length into multiple columns](https://stackoverflow.com/questions/44663903/pandas-split-column-of-lists-of-unequal-length-into-multiple-columns) – Joe Ferndz Nov 25 '20 at 20:29

2 Answers


Another (similar) approach that works regardless of the list lengths, returns the updated dataframe, and handles the column names automatically in one step:

col = 'b' # your target column
df.join(pd.DataFrame(df[col].tolist()).rename(columns=lambda x: col+str(x+1))).drop(columns=col)

Output:

     a         b1           b2            b3
0  jon  (age, 12)  (gender, 1)  (marital, 1)
1  san  (age, 34)  (gender, 1)  (marital, 2)
2   el  (age, 14)  (gender, 2)  (marital, 1)
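The one-liner above can be unpacked into separate steps. This is only a minimal sketch, assuming pandas is imported as pd and using the sample data from the question; the names expanded and result are introduced here for illustration and are not part of the original answer. Passing index=df.index is an extra precaution so the join still lines up if the dataframe does not have the default 0..n-1 index.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'a': ['jon', 'sam', 'el'],
    'b': [[('age', 12), ('gender', 1), ('marital', 1)],
          [('age', 34), ('gender', 1), ('marital', 2)],
          [('age', 14), ('gender', 2), ('marital', 1)]],
})

col = 'b'  # target column holding the lists

# Step 1: one row per list, one column per list position (0, 1, 2, ...).
# Shorter lists would be padded with missing values.
expanded = pd.DataFrame(df[col].tolist(), index=df.index)

# Step 2: rename positions 0, 1, 2 to b1, b2, b3.
expanded = expanded.rename(columns=lambda i: f"{col}{i + 1}")

# Step 3: attach the new columns and drop the original list column.
result = df.join(expanded).drop(columns=col)
print(result)
```

The original df is left untouched; assign the result back (df = result) if you want to replace it.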

To apply it to other columns as well, wrap it in a function like this:

def split_columns(df, cols):
    for col in cols:
        df = df.join(pd.DataFrame(df[col].tolist()).rename(columns=lambda x: col+str(x+1))).drop(columns=col)
    return df

Example:

# Original tested data
df = pd.DataFrame({
    'a': ['jon','san','el'],
    'b': [[('age',12), ('gender',1), ('marital',1)],
          [('age',34), ('gender',1), ('marital',2)],
          [('age',14), ('gender',2), ('marital',1)]]
})

# Add further info for testing
df['c'] = df['b'] # create another column for testing
df.iloc[-1, -1] = [('age', 14)] # test with a list of length = 1
print(df.to_markdown())

|    | a   | b                                            | c                                            |
|---:|:----|:---------------------------------------------|:---------------------------------------------|
|  0 | jon | [('age', 12), ('gender', 1), ('marital', 1)] | [('age', 12), ('gender', 1), ('marital', 1)] |
|  1 | san | [('age', 34), ('gender', 1), ('marital', 2)] | [('age', 34), ('gender', 1), ('marital', 2)] |
|  2 | el  | [('age', 14), ('gender', 2), ('marital', 1)] | [('age', 14)]                                |

Then a call such as split_columns(df, ['b','c']) returns:

|    | a   | b1          | b2            | b3             | c1          | c2            | c3             |
|---:|:----|:------------|:--------------|:---------------|:------------|:--------------|:---------------|
|  0 | jon | ('age', 12) | ('gender', 1) | ('marital', 1) | ('age', 12) | ('gender', 1) | ('marital', 1) |
|  1 | san | ('age', 34) | ('gender', 1) | ('marital', 2) | ('age', 34) | ('gender', 1) | ('marital', 2) |
|  2 | el  | ('age', 14) | ('gender', 2) | ('marital', 1) | ('age', 14) |               |                |
Cainã Max Couto-Silva

You can use something as simple as this.

Assuming every list has exactly 3 tuples, you can use:

df[['b1','b2','b3']] = pd.DataFrame(df.b.values.tolist())


>>> df = pd.DataFrame([['jon',[('age',12), ('gender',1), ('marital',1)]]],columns = ['a','b'])
>>> df
     a                                       b
0  jon  [(age, 12), (gender, 1), (marital, 1)]
>>> df[['b1','b2','b3']] = pd.DataFrame(df.b.values.tolist())
>>> df
     a                                       b  ...           b2            b3
0  jon  [(age, 12), (gender, 1), (marital, 1)]  ...  (gender, 1)  (marital, 1)

[1 rows x 5 columns]
>>> df[['a','b1','b2','b3']]
     a         b1           b2            b3
0  jon  (age, 12)  (gender, 1)  (marital, 1)
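The session above uses a dataframe with the default 0..n-1 index. As far as I can tell, assigning a DataFrame to new columns aligns rows by index label, so with any other index the new columns could come out as missing values. A hedged sketch of a variant that builds the expanded frame on the dataframe's own index; the data, the index labels 10 and 20, and the names new_cols and expanded are made up for illustration:

```python
import pandas as pd

# Hypothetical data with a non-default index, to show why index= matters.
df = pd.DataFrame(
    {'a': ['jon', 'sam'],
     'b': [[('age', 12), ('gender', 1), ('marital', 1)],
           [('age', 34), ('gender', 1), ('marital', 2)]]},
    index=[10, 20],
)

new_cols = ['b1', 'b2', 'b3']

# Build the expanded frame on df's own index so rows line up on assignment.
expanded = pd.DataFrame(df['b'].tolist(), index=df.index, columns=new_cols)
df[new_cols] = expanded

# Drop the original list column once it has been split.
df = df.drop(columns='b')
print(df)
```

This still assumes every list has exactly 3 tuples; with a different length the columns= argument would not match the data.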
Joe Ferndz