Pandas: How to split a column of string of multiple tuples to multiple columns of individual string of tuple

Question

I need advice, with an explanation, on how to split a string.

I have this DataFrame column:

data
(0,1), (1,2)

I would like to split it into this form:

1	2
(0,1)	(1,2)

How do I split this string correctly?

When I use this:

.str.split(',', expand=True)

it also splits on the commas inside the parentheses, which I don't want. How can I do this correctly (with an explanation, please)?

Added explanation on the solution using `str.split()`. This solution is a tweak on your code, to ensure only split on the comma between tuples instead of within a tuple. — SeaBean, Aug 02 '21 at 07:28
@SeaBean Yes you're right. For the solution, I had to tweak my data a bit to make it work. It's a fact that I was inquiring about a dataframe. I modified the solution label. — Cesc, Aug 02 '21 at 07:28

SeaBean · Accepted Answer · 2021-08-02T07:26:16.357

You can use str.extract() with regex, as follows:

df['data'].str.extract(r'(\(\d+,\s*\d+\))\s*,\s*(\(\d+,\s*\d+\))')

or use str.split(), as follows:

df['data'].str.split(r'(?<=\))\s*,\s*', expand=True)

Here we use a regex positive lookbehind, (?<=\)), which requires a closing parenthesis ) immediately before the comma for the comma to match. Hence we split only on the comma between tuples and not on the commas within tuples.

Result:

       0      1
0  (0,1)  (1,2)
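
To make the str.split() variant above concrete, here is a minimal runnable sketch. The second row (no space after the comma) and the column names first and second are assumptions added for illustration; they are not part of the question's data.

```python
import pandas as pd

df = pd.DataFrame({"data": ["(0,1), (1,2)", "(3,4),(5,6)"]})

# Split only on a comma that directly follows a closing parenthesis,
# swallowing any whitespace around that comma.
parts = df["data"].str.split(r"(?<=\))\s*,\s*", expand=True)

# Hypothetical column names, chosen only for readability.
parts.columns = ["first", "second"]
print(parts)
```

The commas inside each tuple are preceded by a digit rather than ), so the lookbehind rejects them and each tuple string stays whole.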

score 1 · Answer 2 · answered Aug 02 '21 at 06:37

You can use eval.

tuple_str = "(0,1), (1,2)"
my_tuple = eval(tuple_str)
print(my_tuple)
# Output: ((0, 1), (1, 2))

Read more about eval here.
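
If real tuples are wanted rather than strings, ast.literal_eval from the standard library is a safer stand-in for eval, since it accepts only Python literals. This is a swapped-in alternative, not what the answer above uses. A minimal sketch:

```python
import ast

tuple_str = "(0,1), (1,2)"

# literal_eval parses literals only (tuples, numbers, strings, ...).
pairs = ast.literal_eval(tuple_str)
print(pairs)  # ((0, 1), (1, 2))

# Anything that is not a plain literal, such as a function call, is rejected.
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```

Note that this yields tuple objects, not the tuple strings shown in the question's desired output.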

Sajad

score 1 · Answer 3 · answered Aug 02 '21 at 06:50

You can try this:

import pandas as pd

df=pd.DataFrame({"data":['(0,1), (1,2)']})

new_df=pd.DataFrame(df.data.str.split(", ").tolist())
print(new_df)
"""
           data
0  (0,1), (1,2)

       0      1
0  (0,1)  (1,2)
"""

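We split the "data" column on ", " (a comma followed by a space), convert the result to a list of lists, and build a new DataFrame from it. This works here because the commas inside the tuples are not followed by a space.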
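
One caveat worth checking against your own data: the ", " split relies on the tuples having no space after their inner comma. A small sketch with plain Python strings (the spaced input is a hypothetical variation, not from the question) shows the difference, next to the lookbehind pattern from the accepted answer:

```python
import re

compact = "(0,1), (1,2)"    # format from the question
spaced = "(0, 1), (1, 2)"   # hypothetical variation with inner spaces

print(compact.split(", "))  # ['(0,1)', '(1,2)']
print(spaced.split(", "))   # ['(0', '1)', '(1', '2)']

# Requiring a ")" just before the comma keeps each tuple intact.
between_tuples = r"(?<=\))\s*,\s*"
print(re.split(between_tuples, spaced))  # ['(0, 1)', '(1, 2)']
```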

score 1 · Answer 4 · answered Aug 02 '21 at 06:50

This also uses a regex, like the other answers, but with re.split:

import re

text = '(0,1), (1,2),(3,4)'  # avoid shadowing the built-in str
re.split(r'(?<=\)) *, *(?=\()', text)  # ['(0,1)', '(1,2)', '(3,4)']

Like str.split, re.split splits a string, but it uses a regex as the delimiter. The re.split documentation can be found here: https://docs.python.org/3/library/re.html#re.split

The regex I use comes from this answer: Regular Expression to find a string included between two characters while EXCLUDING the delimiters

score 1 · Answer 5 · answered Aug 02 '21 at 07:02

Use the regex \(\d+,\s*\d+\) to match two comma-separated numbers enclosed in parentheses, pass it to str.findall, then apply pd.Series. This creates new columns from the values that match the pattern.

df['data'].str.findall(r'\(\d+,\s*\d+\)').apply(pd.Series)
       0      1
0  (0,1)  (1,2)
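
When rows hold different numbers of tuples, the same findall pattern still works. The sketch below builds the frame from the list of matches instead of using .apply(pd.Series); that substitution is an assumption on my side (it avoids creating one Series per row), and the second row is made-up sample data.

```python
import pandas as pd

df = pd.DataFrame({"data": ["(0,1), (1,2)", "(3,4)"]})

# One list of matched tuple strings per row.
matches = df["data"].str.findall(r"\(\d+,\s*\d+\)")

# Build the wide frame from the lists; shorter rows are padded
# with missing values.
wide = pd.DataFrame(matches.tolist(), index=df.index)
print(wide)
```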

score 0 · Answer 6 · answered Aug 02 '21 at 06:34

You may try regexes:

import re
r=re.findall(r'\(\d+,\d+\)','(0,1),(1,2)')
print(r) # ['(0,1)', '(1,2)']

re.findall means finding all strings matching the regex (first argument) within the haystack (second argument).

The regex matches a pair of parentheses () containing two numbers (\d+) separated by a comma.

Or, if you want a more extensible version, swap out the second line with
r=re.findall(r'\(.*?\)','(0,1),(1,2)')

The .*? matches any number of characters, but as few as possible.
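
To see why the ? matters, here is a short comparison of the greedy and lazy forms on the same input (the variable names are only illustrative):

```python
import re

text = "(0,1),(1,2)"

greedy = re.findall(r"\(.*\)", text)   # .* runs to the last ")"
lazy = re.findall(r"\(.*?\)", text)    # .*? stops at the first ")"

print(greedy)  # ['(0,1),(1,2)']
print(lazy)    # ['(0,1)', '(1,2)']
```

The greedy version swallows both tuples as a single match, which is why the lazy quantifier is needed here.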

score 0 · Answer 7 · answered Feb 11 '22 at 11:52

You can use the following regex with Series.str.split:

import pandas as pd
df = pd.DataFrame({'data': ['(0,1), (1,2)']})
df2 = df['data'].str.split(r'\s*,\s*(?![^()]*\))', expand=True)

Output of df2:

       0      1
0  (0,1)  (1,2)

See the regex demo. Details:

\s*,\s* - a comma surrounded by zero or more whitespace characters
(?![^()]*\)) - a negative lookahead that fails the match if, immediately to the right of the current location, there are zero or more chars other than ( and ) and then a ) char. In other words, a comma inside a pair of parentheses is skipped.
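
To see which commas the lookahead lets through, here is a small standard-library sketch that applies the same pattern with re.finditer and re.split rather than through pandas (that swap is only for illustration):

```python
import re

pattern = r"\s*,\s*(?![^()]*\))"
text = "(0,1), (1,2)"

# Only the comma at index 5 (between the tuples) passes the lookahead;
# the commas at indexes 2 and 9 are followed by a digit and then ")".
split_points = [m.start() for m in re.finditer(pattern, text)]
print(split_points)  # [5]

print(re.split(pattern, text))  # ['(0,1)', '(1,2)']
```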
