import pandas as pd
df = pd.DataFrame({'a': [None, 1, 2], 'b': [None, (1, 2), (3, 4)]})


    a   b
0   NaN None
1   1.0 (1, 2)
2   2.0 (3, 4)

I want to split the tuples in column b so that each element gets its own column. However, my approach raises an error:

df[['b1', 'b2']] = pd.DataFrame(df['b'].tolist(), index=df.index)

ValueError: Columns must be same length as key

I tried to fillna with an empty tuple, but fillna won't accept a tuple. How can I make this work?

Ric S
David 54321

4 Answers


You can first drop the NaN values in column b then create a new dataframe from the remaining elements in column b and assign the resulting dataframe to the columns b1 and b2:

b = df['b'].dropna()
df[['b1', 'b2']] = pd.DataFrame(b.tolist(), index=b.index)

>>> df

     a       b   b1   b2
0  NaN    None  NaN  NaN
1  1.0  (1, 2)  1.0  2.0
2  2.0  (3, 4)  3.0  4.0
Shubham Sharma

To my surprise, this solution by piR² works in your case as well:

df["x"], df["y"] = df.b.str

Output:

     a       b    x    y
0  NaN    None  NaN  NaN
1  1.0  (1, 2)  1.0  2.0
2  2.0  (3, 4)  3.0  4.0

That said, it raises a FutureWarning ("Columnar iteration over characters will be deprecated in future releases."), so this is not a long-term solution.
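
If the warning is a concern, one alternative sketch is to index through the .str accessor instead of iterating over it. This relies on .str[i] also working element-wise on tuples stored in an object column, which is the same behavior the unpacking trick depends on; the column names x and y are simply carried over from above.

```python
import pandas as pd

df = pd.DataFrame({'a': [None, 1, 2], 'b': [None, (1, 2), (3, 4)]})

# .str[i] picks the i-th element of each tuple; missing rows stay NaN
df['x'] = df['b'].str[0]
df['y'] = df['b'].str[1]
print(df)
```

The new columns may come back with object dtype, so a cast (for example with pd.to_numeric) could be needed afterwards.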

Mr. T

Transform None to (None, None) as follows before creating the 2 columns:

df['b'] = df['b'].map(lambda x: (None, None) if x is None else x)

Then you can get the desired result with your step:

    df[['b1', 'b2']] = pd.DataFrame(df['b'].tolist(), index=df.index)
    print(df)

Output:
    a              b     b1  b2
0   NaN (None, None)    NaN NaN
1   1.0       (1, 2)    1.0 2.0
2   2.0       (3, 4)    3.0 4.0

If you want the None in column b untouched, you can use:

    padded = df['b'].map(lambda x: (None, None) if x is None else x)
    df[['b1', 'b2']] = pd.DataFrame(padded.tolist(), index=df.index)

    print(df)

Output:
    a         b    b1  b2
0   NaN    None   NaN NaN
1   1.0  (1, 2)   1.0 2.0
2   2.0  (3, 4)   3.0 4.0
SeaBean

A more general solution, useful when the tuples have different numbers of elements, is to write a custom function like the following:

def create_columns_from_tuple(df, tuple_col):
    
    # get max length of tuples
    max_len = df[tuple_col].apply(lambda x: 0 if x is None else len(x)).max()
    
    # select rows where the tuple is not missing
    df_full = df.loc[df[tuple_col].notna()]
    
    # create dataframe with exploded tuples
    df_full_exploded = pd.DataFrame(df_full[tuple_col].tolist(),
                                    index=df_full.index, 
                                    columns=[tuple_col + str(n) for n in range(1, max_len+1)])
    
    # merge the two dataframes by index
    result = df.merge(df_full_exploded, left_index=True, right_index=True, how='left')
    
    return result

The function takes your dataframe and the name of the tuple column, and automatically creates as many columns as the longest tuple has elements.

create_columns_from_tuple(df, tuple_col='b')
#      a       b   b1   b2
# 0  NaN    None  NaN  NaN
# 1  1.0  (1, 2)  1.0  2.0
# 2  2.0  (3, 4)  3.0  4.0

If you have tuples with different numbers of elements:

df = pd.DataFrame({'a':[None,1, 2], 'b':[None, (1,2,42), (3,4)]}) 
create_columns_from_tuple(df, tuple_col='b')
#      a           b   b1   b2    b3
# 0  NaN        None  NaN  NaN   NaN
# 1  1.0  (1, 2, 42)  1.0  2.0  42.0
# 2  2.0      (3, 4)  3.0  4.0   NaN
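
As a possible shorter variant of the same idea, the sketch below lets pd.DataFrame work out the width of the widest tuple itself and then uses DataFrame.join for the index-based left merge. The function name split_tuple_column is made up for this illustration, and the sketch assumes missing entries are None or NaN.

```python
import pandas as pd

def split_tuple_column(df, tuple_col):
    # keep only the rows that actually hold a tuple
    present = df[tuple_col].dropna()

    # one column per tuple position; shorter tuples are padded with missing values
    parts = pd.DataFrame(present.tolist(), index=present.index)
    parts.columns = [f"{tuple_col}{i + 1}" for i in range(parts.shape[1])]

    # join on the index (left join), so rows without a tuple get NaN
    return df.join(parts)

df = pd.DataFrame({'a': [None, 1, 2], 'b': [None, (1, 2, 42), (3, 4)]})
result = split_tuple_column(df, 'b')
print(result)
```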
Ric S