
I have a dataset of patient surgeries. Many of the patients have had multiple operations, and the value_counts of their lists of operation codes (there are 4 distinct codes) is shown below:

['O011']                                    2785
['O012']                                    1813
['O011', 'O011']                             811
['O013']                                     532
['O012', 'O012']                             522
['O014']                                     131
['O013', 'O013']                             125
['O014', 'O014']                              26
['O012', 'O011']                              24
['O011', 'O012']                              20
['O011', 'O011', 'O011']                      14
['O011', 'O013']                              12
['O012', 'O012', 'O011']                       6
['O011', 'O012', 'O012']                       6
['O011', 'O011', 'O011', 'O011']               5
['O013', 'O013', 'O013']                       5
['O013', 'O011']                               4
['O012', 'O012', 'O012']                       4
['O012', 'O013']                               3
['O013', 'O014']                               3
['O011', 'O013', 'O013']                       3
['O012', 'O014']                               3
['O011', 'O012', 'O011']                       2
['O012', 'O013', 'O013']                       2
['O011', 'O014']                               2
['O013', 'O012', 'O012']                       2
['O014', 'O014', 'O014']                       2
['O013', 'O012']                               1
['O012', 'O012', 'O013', 'O013', 'O013']       1
['O012', 'O011', 'O012']                       1
['O011', 'O011', 'O012']                       1
['O013', 'O013', 'O011']                       1
['O011', 'O011', 'O012', 'O012']               1
['O014', 'O013', 'O013']                       1
['O013', 'O013', 'O012']                       1
['O012', 'O011', 'O011']                       1
['O011', 'O012', 'O013']                       1
['O013', 'O011', 'O011']                       1
['O012', 'O012', 'O012', 'O012']               1
['O013', 'O013', 'O012', 'O012']               1
['O014', 'O013', 'O011', 'O011']               1
['O012', 'O011', 'O011', 'O011']               1
['O013', 'O011', 'O012']                       1

This shows each sequence of operations and the number of patients who had it, so 2785 patients have had just the one procedure, O011. I want to create a new column holding a boolean, 'Are all the operations the same?'. There is an itertools recipe, all_equal, in the itertools documentation for checking whether all the values in a list are equal. I am a surgeon and my Python skills are not up to applying it to the series: how do I create a new column using this function?

The series is OPERTN_01_list. I tried:

from itertools import groupby

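def all_equal(iterable):
    # groupby collapses runs of equal consecutive values into groups;
    # all values are equal if there is at most one group
    g = groupby(iterable)
    return next(g, True) and not next(g, False)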
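A quick way to see what the recipe does is to call it on individual lists of codes. This is a minimal, self-contained sketch; the example lists are made up for illustration rather than taken from the dataset.

```python
from itertools import groupby

def all_equal(iterable):
    # groupby collapses runs of equal consecutive values into groups;
    # all values are equal exactly when there is at most one group
    g = groupby(iterable)
    return next(g, True) and not next(g, False)

print(all_equal(['O011', 'O011', 'O011']))  # True
print(all_equal(['O012', 'O011']))          # False
print(all_equal(['O011']))                  # True
print(all_equal([]))                        # True (an empty list counts as "all equal")
```

Because equality is transitive, comparing only neighbouring values is enough to conclude that every value in the list is the same.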

My dataset is mo (multiple operations), so I tried applying the function all_equal to the series:

mo['eq'] = all_equal(mo['OPERTN_01_list'])

but the new column mo['eq'] had all false values.

I am not sure what the best way to implement the function is.

capnahab

1 Answer


When you execute your function here

all_equal(mo['OPERTN_01_list'])

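This returns a single value, because the function sees the whole mo['OPERTN_01_list'] column as the iterable rather than looking inside each row. groupby therefore compares each row's entire list with the next row's list, stopping at the first pair of consecutive rows that differ, something like this:

does row0 == row1? --> False

does row1 == row2? --> False

...

does rowN-1 == rowN? --> False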

Because the overall result is a single False, assigning it to mo['eq'] repeats (broadcasts) that one value to every row. See this question/answer.
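The problem can be reproduced on a tiny frame. The data below is a made-up, hypothetical stand-in for mo, not the real dataset:

```python
from itertools import groupby
import pandas as pd

def all_equal(iterable):
    g = groupby(iterable)
    return next(g, True) and not next(g, False)

# Hypothetical stand-in for the real `mo` DataFrame
mo = pd.DataFrame({'OPERTN_01_list': [['O011', 'O011'], ['O012'], ['O011', 'O012']]})

# Passing the whole column: groupby iterates over the rows (whole lists),
# finds that row 0 != row 1, and returns one scalar False
result = all_equal(mo['OPERTN_01_list'])
print(result)  # False

# Assigning a scalar to a column broadcasts it to every row
mo['eq'] = result
print(mo['eq'].tolist())  # [False, False, False]
```

Note that row 0 really does contain identical codes, yet it still gets False, because the check never looked inside any individual list.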

There are at least three different approaches to getting what I think you want.

Use .apply

Apply the function to the contents of each row in the mo["OPERTN_01_list"] series.

mo["OPERTN_01_list"].apply(all_equal)

Out

# showing a sample of 5 rows for brevity
26     True
16    False
9     False
11    False
1      True
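Putting it together, here is a minimal sketch of creating the new column with .apply. It assumes each cell of OPERTN_01_list holds an actual Python list, and the frame below is a hypothetical stand-in for mo:

```python
from itertools import groupby
import pandas as pd

def all_equal(iterable):
    g = groupby(iterable)
    return next(g, True) and not next(g, False)

# Hypothetical stand-in for the real `mo` DataFrame
mo = pd.DataFrame({'OPERTN_01_list': [
    ['O011', 'O011'],
    ['O012', 'O011'],
    ['O013'],
    ['O014', 'O013', 'O011', 'O011'],
]})

# .apply calls all_equal once per row, passing that row's list
mo['eq'] = mo['OPERTN_01_list'].apply(all_equal)
print(mo['eq'].tolist())  # [True, False, True, False]
```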

Use .transform

Pretty much the same as .apply, but especially useful in .groupby operations.

mo["OPERTN_01_list"].transform(all_equal)

Out

# showing a sample of 5 rows for brevity
1      True
11    False
2      True
8     False
42    False
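Outside of a groupby, .transform with this function gives the same result as .apply. A small check on a made-up, hypothetical frame:

```python
from itertools import groupby
import pandas as pd

def all_equal(iterable):
    g = groupby(iterable)
    return next(g, True) and not next(g, False)

# Hypothetical stand-in for the real `mo` DataFrame
mo = pd.DataFrame({'OPERTN_01_list': [['O011', 'O011'], ['O012', 'O011'], ['O013']]})

via_apply = mo['OPERTN_01_list'].apply(all_equal)
via_transform = mo['OPERTN_01_list'].transform(all_equal)

# Both return a boolean Series aligned on mo's index
print(via_transform.equals(via_apply))  # True
mo['eq'] = via_transform
```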

Vectorize the function with np.vectorize and pass mo["OPERTN_01_list"] as the input

This lets you keep your code the same, with one minor change:

import numpy as np

# vectorize the function
all_equal = np.vectorize(all_equal)

all_equal(mo["OPERTN_01_list"])

Out

# note that this returns a `np.array` instead of a `pandas.Series`
[ True  True  True  True  True  True  True  True False False  True False
 False False  True  True False  True False False False False False False
 False False  True False False False False False False False False False
 False False  True False False False False]
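A self-contained version of the np.vectorize route, on a made-up frame. As a variation, it keeps the original function under its own name (all_equal_vec is a hypothetical name) and passes otypes=[bool] so NumPy does not have to infer the output type from a trial call; without otypes, np.vectorize raises an error on empty input.

```python
from itertools import groupby
import numpy as np
import pandas as pd

def all_equal(iterable):
    g = groupby(iterable)
    return next(g, True) and not next(g, False)

# Hypothetical stand-in for the real `mo` DataFrame
mo = pd.DataFrame({'OPERTN_01_list': [['O011', 'O011'], ['O012', 'O011'], ['O013']]})

# Keep the original function and give the vectorized version its own name
all_equal_vec = np.vectorize(all_equal, otypes=[bool])

# Each element of the column (one list per row) is passed to all_equal
eq = all_equal_vec(mo['OPERTN_01_list'])
print(type(eq), eq)  # a NumPy array, not a pandas Series

mo['eq'] = eq  # a plain array is assigned to the rows by position
```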

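Any of these should give you the result you want, but I would suggest one of the first two in case your index changes: .apply and .transform return a pandas Series that is matched to mo by index label when you assign it, whereas np.vectorize returns a plain array that is assigned by position.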
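To illustrate the index point with a made-up example: when a result is assigned to a column, a Series is matched by index label and a NumPy array by position. If the result ends up in a different order from mo (simulated here by reversing it), only the Series still lands on the right rows. The index values 10, 20, 30 are hypothetical.

```python
from itertools import groupby
import pandas as pd

def all_equal(iterable):
    g = groupby(iterable)
    return next(g, True) and not next(g, False)

mo = pd.DataFrame(
    {'OPERTN_01_list': [['O011', 'O011'], ['O013'], ['O012', 'O011']]},
    index=[10, 20, 30],  # hypothetical non-default index
)

# Compute the flags, then reverse their order to simulate a reordering
flags = mo['OPERTN_01_list'].apply(all_equal).iloc[::-1]

mo['eq_series'] = flags             # aligned by index label: correct rows
mo['eq_array'] = flags.to_numpy()   # aligned by position: rows mismatched
print(mo[['eq_series', 'eq_array']])
```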

Ian Thompson