
I have this code (which works): a set of nested conditional statements that sets the value in the 'paragenesis1' row of a dataframe (myOxides['cpx']), depending on the values in various other rows of the frame.

I'm very new to Python and to programming in general. I think I should write a function to do this, but how would I then apply that function elementwise? The nested np.where calls below are the only way I have found to avoid the 'truth value of a series is ambiguous' error.

Any help greatly appreciated!

myOxides['cpx'].loc['paragenesis1'] = np.where(
            ((cpxCrOx>=0.5) & (cpxAlOx<=4)),
            "GtPeridA", 
            np.where(
                    ((cpxCrOx>=2.25) & (cpxAlOx<=5)), 
                    "GtPeridB", 
                    np.where(
                            ((cpxCrOx>=0.5)&
                             (cpxCrOx<=2.25)) &
                             ((cpxAlOx>=4) & (cpxAlOx<=6)),
                             "SpLhzA",
                             np.where(
                                     ((cpxCrOx>=0.5) &
                                      (cpxCrOx<=(5.53125 - 
                                                 0.546875 * cpxAlOx))) &
                                      ((cpxAlOx>=4) & 
                                       (cpxAlOx <= ((cpxCrOx - 
                                                     5.53125)/ -0.546875))),
                             "SpLhzB",
                             "Eclogite, Megacryst, Cognate"))))

or, in general form:

df.loc['a'] = np.where(
            (some_condition),
            "value", 
            np.where(
                    ((condition_1) & (condition_2)), 
                    "some_value", 
                    np.where(
                            ((condition_3) & (condition_4)),
                             "some_other_value",
                              np.where(
                                      (condition_5),
                                        "another_value",
                                        "other_value"))))
Ivo
K. Mather

1 Answer


One possible solution is to use numpy.select:

m1 = (cpxCrOx>=0.5) & (cpxAlOx<=4)
m2 = (cpxCrOx>=2.25) & (cpxAlOx<=5)
m3 = ((cpxCrOx>=0.5) & (cpxCrOx<=2.25)) & ((cpxAlOx>=4) & (cpxAlOx<=6))
m4 = ((cpxCrOx>=0.5) & (cpxCrOx<=(5.53125 - 0.546875 * cpxAlOx))) & \
     ((cpxAlOx>=4) & (cpxAlOx <= ((cpxCrOx - 5.53125) / -0.546875)))

vals = [ "GtPeridA", "GtPeridB", "SpLhzA", "SpLhzB"]
default = 'Eclogite, Megacryst, Cognate'

myOxides['paragenesis1'] = np.select([m1,m2,m3,m4], vals, default=default)
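np.select checks the conditions in list order and, for each element, takes the choice for the first condition that is True, so it should give the same result as the nested np.where. A minimal sketch to check this, assuming cpxCrOx and cpxAlOx are pandas Series (the sample values below are made up for illustration and only the first three conditions are used):

```python
import numpy as np
import pandas as pd

# Hypothetical sample values; the real cpxCrOx / cpxAlOx come from the asker's data.
cpxCrOx = pd.Series([1.0, 3.0, 1.0, 0.1, 3.0])
cpxAlOx = pd.Series([3.0, 4.5, 5.0, 7.0, 3.0])

m1 = (cpxCrOx >= 0.5) & (cpxAlOx <= 4)
m2 = (cpxCrOx >= 2.25) & (cpxAlOx <= 5)
m3 = (cpxCrOx >= 0.5) & (cpxCrOx <= 2.25) & (cpxAlOx >= 4) & (cpxAlOx <= 6)

default = "Eclogite, Megacryst, Cognate"

# The first True condition wins for each element; rows matching nothing get the default.
labels = np.select([m1, m2, m3], ["GtPeridA", "GtPeridB", "SpLhzA"], default=default)

# The equivalent nested np.where, for comparison.
nested = np.where(m1, "GtPeridA",
                  np.where(m2, "GtPeridB",
                           np.where(m3, "SpLhzA", default)))

print(labels)
print((labels == nested).all())
```

The last sample row satisfies both m1 and m2 and is labelled "GtPeridA", because m1 comes first in the list.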
jezrael
  • That works amazingly! - I clearly have a lot to learn. Thank you so very much. – K. Mather Mar 13 '18 at 10:53
  • An alternative is to use a row-wise function with if-elif statements and pandas `apply` as described in https://stackoverflow.com/a/18194448/1936114, but your solution is way faster, @jezrael! Thank you. – raninjan Nov 05 '18 at 20:15
  • This is 200 times the speed of nested `np.where` in my case. See here for `string` type column operation. – Jia Gao Sep 20 '21 at 01:23
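For comparison, the row-wise `apply` alternative mentioned in the comments could look roughly like the sketch below. The DataFrame layout and the column names `CrOx` and `AlOx` are assumptions made for illustration, not the asker's actual structure. Plain `if`/`elif` works here because each row supplies scalar values, which avoids the 'truth value of a series is ambiguous' error, but it is typically much slower than np.select on large frames.

```python
import pandas as pd

def classify(row):
    """Return the paragenesis label for one row (hypothetical column names)."""
    cr, al = row["CrOx"], row["AlOx"]
    if cr >= 0.5 and al <= 4:
        return "GtPeridA"
    elif cr >= 2.25 and al <= 5:
        return "GtPeridB"
    elif 0.5 <= cr <= 2.25 and 4 <= al <= 6:
        return "SpLhzA"
    elif (0.5 <= cr <= 5.53125 - 0.546875 * al
          and 4 <= al <= (cr - 5.53125) / -0.546875):
        return "SpLhzB"
    return "Eclogite, Megacryst, Cognate"

# Made-up sample data, one row per analysis.
df = pd.DataFrame({
    "CrOx": [1.0, 3.0, 1.0, 1.0, 0.1],
    "AlOx": [3.0, 4.5, 5.0, 7.0, 7.0],
})

# axis=1 passes each row to classify as a Series of scalars.
df["paragenesis1"] = df.apply(classify, axis=1)
print(df)
```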