I have a pandas DataFrame in which one column, resources, holds a list of tuples in each row. For example, take the following DataFrame:
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3],
                   "resources": [[(1, 3), (1, 1), (2, 9)],
                                 [(3, 1), (3, 1), (3, 4)],
                                 [(9, 0), (2, 6), (5, 5)]]
                   })
Now I want to add the following columns to my DataFrame:
- A column first containing a list with the unique first elements of the tuples in resources (so basically a set of all the first elements)
- A column second containing a list with the unique second elements of the tuples in resources (so basically a set of all the second elements)
- A column same containing the number of tuples in resources having the same first and second element
- A column different containing the number of tuples in resources having different first and second elements
The desired output columns would look like this:
first: [[1, 2], [3], [9, 2, 5]]
second: [[1, 3, 9], [1, 4], [0, 6, 5]]
same: [1, 0, 1]
different: [2, 3, 2]
How can I achieve this in the least time-consuming way? I first thought of using Series.str, but could not find enough functionality there to achieve my goal.
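For reference, a plain-Python baseline that produces the four columns might look like the sketch below. It is only one possible approach and has not been benchmarked against alternatives. The helper unique_in_order is a hypothetical name introduced here, and the sketch assumes the unique elements should be kept in order of first appearance, so the second column of the first row comes out as [3, 1, 9] rather than [1, 3, 9].

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3],
    "resources": [[(1, 3), (1, 1), (2, 9)],
                  [(3, 1), (3, 1), (3, 4)],
                  [(9, 0), (2, 6), (5, 5)]],
})


def unique_in_order(values):
    # Hypothetical helper: dict.fromkeys drops duplicates while
    # keeping the order in which values first appear.
    return list(dict.fromkeys(values))


res = df["resources"]

# Unique first / second tuple elements per row.
df["first"] = [unique_in_order(a for a, _ in tuples) for tuples in res]
df["second"] = [unique_in_order(b for _, b in tuples) for tuples in res]

# Number of tuples whose two elements are equal, and the remainder.
df["same"] = [sum(a == b for a, b in tuples) for tuples in res]
df["different"] = [len(tuples) - same for tuples, same in zip(res, df["same"])]
```

List comprehensions over the column avoid the per-row overhead of DataFrame.apply, which is why they are used here. If sorted output is preferred, sorted(set(...)) could replace the helper.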