Split strings in tuples into columns, in Pandas

Question

I have the following DataFrame, where Track ID is the row index. How can I split the string in the stats column into 5 columns of numbers?

Track ID    stats
14.0    (-0.00924175824176, 0.41, -0.742016492568, 0.0036830094242, 0.00251748449963)
28.0    (0.0411538461538, 0.318230769231, 0.758717081514, 0.00264000622468, 0.0106535783677)
42.0    (-0.0144351648352, 0.168438461538, -0.80870348637, 0.000816872566404, 0.00316572586742)
56.0    (0.0343461538462, 0.288730769231, 0.950844962874, 6.1608706775e-07, 0.00337262030771)
70.0    (0.00905164835165, 0.151030769231, 0.670257006716, 0.0121790506745, 0.00302182567957)
84.0    (-0.0047967032967, 0.171615384615, -0.552879463981, 0.0500316517755, 0.00217970256969)

You will make things a lot easier for others by posting df.to_dict() of the dataframe instead of its string representation. — chthonicdaemon, Mar 31 '15 at 14:03
The stats column, does it contain a string that looks like a tuple, or does it contain tuples? — joris, Mar 31 '15 at 14:06
possible duplicate of [Convert a pandas "Series of pair arrays" to a "two-column DataFrame"?](http://stackoverflow.com/questions/29346512/convert-a-pandas-series-of-pair-arrays-to-a-two-column-dataframe) — TheBlackCat, Mar 31 '15 at 14:11
Thank you for the input - I'm very much at the bottom of a steep learning curve. I wasn't aware of `df.to_dict()`, but it showed me that the `stats` column in my data are indeed tuples, and not strings. For example, using `[float(x[0]) for x in df['stats']]` gave me the 0th elements. — t_n, Apr 02 '15 at 09:31

joris · Accepted Answer · 2019-02-07T20:00:32.193

59

And for the other case, assuming it are strings that look like tuples:

In [74]: df['stats'].str[1:-1].str.split(',', expand=True).astype(float)
Out[74]:
          0         1         2         3         4
0 -0.009242  0.410000 -0.742016  0.003683  0.002517
1  0.041154  0.318231  0.758717  0.002640  0.010654
2 -0.014435  0.168438 -0.808703  0.000817  0.003166
3  0.034346  0.288731  0.950845  0.000001  0.003373
4  0.009052  0.151031  0.670257  0.012179  0.003022
5 -0.004797  0.171615 -0.552879  0.050032  0.002180

(note: for older versions of pandas (< 0.16.1), you need to use return_type='frame' instead of the expand keyword)

By the way, if it are tuples and not strings, you can simply do the following:

pd.DataFrame(df['stats'].tolist(), index=df.index)

edited Feb 07 '19 at 20:00

answered Mar 31 '15 at 14:13

joris

133,120
36
247
202

Thanks for the input. I'm new to pandas, python, and scripting in general, so am still getting my head around the basics. The data in the `stats` column were indeed tuples. Creating a new DataFrame with `df2 = df['stats'].apply(pd.Series)` has moved me forwards greatly. Thanks again. – t_n Apr 02 '15 at 09:38
6

`return_type` has been deprecated in favor of `expand=True`. – tacaswell May 29 '16 at 19:17
Please don't use `apply(pd.Series)` (or recommend its usage). See [When should I ever use pandas apply() in my code?](https://stackoverflow.com/a/54432584/4909087) A better solution is to listify the column first. – cs95 Jan 30 '19 at 06:41

score 2 · Answer 2 · answered Mar 31 '15 at 14:24

If you have a sequence of tuples and not strings, and you want them as DataFrame columns, this is the simplest approach:

df = pd.concat([df['Track ID'],pd.DataFrame(df['stats'].values.tolist())], axis=1)

If it is actually strings, you can first convert it to lists like so, then apply the above operation:

dfpart = pd.DataFrame(df['stats'].apply(lambda x: x.strip('()').split(', ')).values.tolist()).astype(float)
df = pd.concat([df['Track ID'], dfpart], axis=1)

score 0 · Answer 3 · answered Mar 31 '15 at 14:10

Assuming you have a column which contains tuples (as it appears in your example) rather than strings, this will work:

df = pandas.DataFrame({'Track ID': [14, 28, 42], 'stats': [(1, 2, 3, 4, 5), (1, 2, 3, 4, 5), (1, 2, 3, 4, 5)]}).set_index("Track ID")

from operator import itemgetter
for i in range(5):
    df["Col {}".format(i)] = df.stats.apply(itemgetter(i))

If you actually have strings that look like tuples, you can parse them first and then apply the same pattern as above:

df = df2 = pandas.DataFrame({'Track ID': [14, 28, 42], 'stats': ["(1, 2, 3, 4, 5)", "(1, 2, 3, 4, 5)", "(1, 2, 3, 4, 5)"]}).set_index("Track ID")
df.stats = df2.stats.str.strip("()").str.split(", ")

G-Higgins · Answer 4 · 2021-01-25T23:16:21.070

0

Since your stats column contains tuples of length 5, this does the job fairly succinctly

pd.concat([df['Track ID'], pd.DataFrame(df['stats'].tolist(), index = 
df.index, columns = [f'stats_{i}' for i in range(1, 6)])], axis = 1)

edited Jan 25 '21 at 23:16

answered Jan 25 '21 at 23:01

G-Higgins

161
1
3

Split strings in tuples into columns, in Pandas

4 Answers4

Linked

Related