1

I have a dataframe and I want to convert one column into conditions.

For example, here is a sample dataframe:

df=pd.DataFrame({'a':['>28','27','<26'],'b':['1','2','3']})

df
     a  b
0  >28  1
1   27  2
2  <26  3

I want to generate a series of if statements to get ideal b value:

if a > 28:
    b=1
elif a=27:
    b=2
elif a < 26:
    b=3

How can I do that? In my dataframe, all of elements are stored as string.

I was trying to use the iloc() function to select, but it can't deal with range conditions (>)

If there an elegant way to do this? or do I have to manually type all the conditions?

scotscotmcc
  • 2,719
  • 1
  • 6
  • 29
jjk
  • 111
  • 9
  • 2
    it isn't completely clear, at least to me, what you are trying to do. It kind of sounds like you may actually have a different problem and this is the solution you think may help solve that problem, rather than this being the main thing to solve. If you take a step back, why are you trying to do this in the first place? And if your concern is that you want to avoid manually typing all of the conditions - how are you getting this dataframe made in the first place? – scotscotmcc May 24 '23 at 19:04
  • I want to convert one column of a dataframe into conditions, and select value from another column base on conditions. I post it because simply don't know how to achieve this and I want to know how can it be done. @scotscotmcc – jjk May 24 '23 at 19:07
  • 1
    to be more clear, what is the source of the data and do you control it? if they have an extremely standard structure, you could text format 'em and evaluate it or simply act on them initially .. problems such as this are frequently [XY Problems](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem); what are you trying to do overall? – ti7 May 24 '23 at 19:13
  • You have a very good point. Yes, I do agree it looks like a XY problem. But I don't actually have a solution. I just want to use the code to clarify what I am trying to get. I don't have a source of data. I am trying to learn Pandas and this problem just came to me. I have done some research but I got nothing. @ti7 – jjk May 24 '23 at 19:20
  • 1
    @jjk, you may need to *re-think* the way you're approaching your actual problem! If you're **hardcoding** the values in your DataFrame `df` (which seems to be the case), then why not simply implement them directly into a bloc of `if.. else` statements ? – Timeless May 24 '23 at 19:35
  • I agree and you are absolutely right. But what if the dataframe is quite big and it takes sometime to build the if statements bloc? I know I might sound like I want to troll you guys, but I really want to know how can this be achieved. @Timeless – jjk May 24 '23 at 19:43
  • No worries! Although I still believe there is a pythonic/simple way to approach your actual problem, I posted an answer below ;) – Timeless May 24 '23 at 20:27
  • https://github.com/Droogans/unmaintainable-code#wrap-wrap-wrap – ti7 May 24 '23 at 20:28

3 Answers3

2

If you modify slightly your column 'a', you can use numexpr:

from numexpr import evaluate

a = 10
expr = 'a' + df['a'].mask(df['a'].str.isdigit(), other='==' + df['a'])
mask = expr.apply(evaluate, global_dict=globals(), local_dict=locals())

b = pd.to_numeric(df.loc[mask, 'b'], errors='coerce').head(1).squeeze() if any(mask) else None

Output:

>>> b
3

>>> expr
0     a>28
1    a==27
2     a<26
Name: a, dtype: object

>>> mask
0    False
1    False
2     True
Name: a, dtype: bool
Corralien
  • 109,409
  • 8
  • 28
  • 52
1

This iterates over the rows in the DataFrame and performs an if condition based on column a, you can replace the print statement at the bottom with your desired goal, which was unclear to me. import pandas as pd

df = pd.DataFrame({'a': ['>28', '27', '<26'], 'b': ['1', '2', '3']})
#Iterate over rows in dataframe
for index, row in df.iterrows():
    a_value = row['a']

    #Seperating operators from integers for conditions
    if '<' or '>' in a_value:
        operator = a_value[0]
        num = a_value[1:]
    else:
        #adding equal sign for equal conditions
        operator = '='
        num = a_value

    #evaluating condition
    condition = f'a {operator} {num}'
    if eval('condition'):
        #Do something here (just printing b to show the condition is being met
        print(condition, row['b'])
kylelare
  • 11
  • 3
  • As it’s currently written, your answer is unclear. Please [edit] to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community May 25 '23 at 00:51
1

For fun, you can try this :

In [1]: import pandas as pd # added ------v

In [2]: df = pd.DataFrame({'a': ['>28', '==27', '<26'], 'b': ['1', '2', '3']})

In [3]: print(df)
      a  b
0   >28  1
1  ==27  2
2   <26  3

In [4]: s = "\n".join([
   ...:     f"if a{a}:\n\tb = {b}" if idx == 0 else f"elif a{a}:\n\tb = {b}"
   ...:     for idx, (a, b) in enumerate(zip(df["a"], df["b"]))
   ...: ]) # s is a multi-line string

In [5]: print(s)
if a>28:
        b = 1
elif a==27:
        b = 2
elif a<26:
        b = 3

In [6]: a = 10

In [7]: exec(s)

In [8]: print(b)
3
Timeless
  • 22,580
  • 4
  • 12
  • 30
  • 1
    **Disclaimer**: [Why should exec() and eval() be avoided?](https://stackoverflow.com/questions/1933451/why-should-exec-and-eval-be-avoided) – Timeless May 24 '23 at 20:30