Splitting dataframe into two and using tilde ~ as variable

Question

I want to do two similar operations with Pandas in Python 3: one with the tilde and one without.

1 - df = df[~(df.teste.isin(["Place"]))] 
2 - df = df[(df.teste.isin(["Place"]))]

I tried to store the tilde in a variable, so I could write just one line and then decide whether to apply it with or without the tilde. But it doesn't work:

tilde = ["~", ""]
df = df[tilde[0](df.teste.isin(["Place"]))]

Is it possible to do something that would reduce my code? I am writing many identical lines, just swapping the tilde...

Thanks!

Why I want the tilde as a variable:

def server_latam(df):
    df.rename(columns={'Computer:OSI':'OSI'}, inplace=True) 
    df = df[~(df.teste.isin(["Place"]))]

    df1 = df.loc[df.model != 'Virtual Platform', 'model'].count()
    print("LATAM")
    print("Physical Servers: ",df1)
    df2 = df.loc[df.model == 'Virtual Platform', 'model'].count()
    print("Virtual Servers: ",df2)
    df3 = df.groupby('platformName').size().reset_index(name='by OS: ')
    print(df3)

def server_latam_without_tilde(df):
    df.rename(columns={'Computer:OSI':'OSI'}, inplace=True) 
    df = df[(df.teste.isin(["Place"]))]

    df1 = df.loc[df.model != 'Virtual Platform', 'model'].count()
    print("LATAM")
    print("Physical Servers: ",df1)
    df2 = df.loc[df.model == 'Virtual Platform', 'model'].count()
    print("Virtual Servers: ",df2)
    df3 = df.groupby('platformName').size().reset_index(name='by OS: ')
    print(df3)

The second line of each function is the only difference: one has the tilde, the other does not.

Possible duplicate of [The tilde operator in Python](https://stackoverflow.com/questions/8305199/the-tilde-operator-in-python) — deadvoid, Oct 23 '18 at 11:00
I don't see the benefit: the tilde is one character, which usually corresponds to one byte. You want to store it in a list, which means you need at least three key-strokes and bytes to index it, plus another XY key-strokes/bytes to reference the variable. What is it you want to optimize exactly? Typing? Size? "Cleanness"? Because this approach, sadly, accomplishes none of those. — Oliver Baumann, Oct 23 '18 at 11:01
Hi Oliver, I edited my question. Now you can see why I want to optimize. — William Rodriguez, Oct 23 '18 at 11:12
This seems an XY problem. The *underlying issue* is having lots of variable names unnecessarily. It's a valid problem in my opinion. The answer is to use `list` or `dict`. — jpp, Oct 23 '18 at 11:23

jpp · Answer 1 · 2018-10-23T11:45:23.343

For your limited use case, there is limited benefit in what you are requesting.

GroupBy

Your real problem, however, is the number of variables you have to create. You can halve them via GroupBy and a calculated grouper:

import pandas as pd

df = pd.DataFrame({'teste': ['Place', 'Null', 'Something', 'Place'],
                   'value': [1, 2, 3, 4]})

dfs = dict(tuple(df.groupby(df['teste'] == 'Place')))

{False:        teste  value
        1       Null      2
        2  Something      3,

 True:         teste  value
            0  Place      1
            3  Place      4}

Then access your dataframes via dfs[0] and dfs[1], since False == 0 and True == 1. The benefit of this approach is that you no longer need to create new variables unnecessarily, and your dataframes stay organized in a single dictionary.
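As a minimal runnable sketch of the lookup described above, reusing the sample data from this answer (the names `places` and `others` are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'teste': ['Place', 'Null', 'Something', 'Place'],
                   'value': [1, 2, 3, 4]})

# Group on a boolean key: True where teste == 'Place', False otherwise.
dfs = dict(tuple(df.groupby(df['teste'] == 'Place')))

places = dfs[True]    # same frame as dfs[1]
others = dfs[False]   # same frame as dfs[0]

print(places)
print(others)
```

Note that a key only exists if at least one row falls into that group, so on real data `dfs.get(True)` may be safer than `dfs[True]`.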

Function dispatching

Your precise requirement can be met via the operator module and an identity function:

from operator import invert

tilde = [invert, lambda x: x]

mask = df.teste == 'Place'  # don't repeat mask calculations unnecessarily

df1 = df[tilde[0](mask)]
df2 = df[tilde[1](mask)]
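Applied to the two functions in the question, the same idea could look like the sketch below. This is an assumption about what the asker needs, not a drop-in replacement: the name `server_summary`, the `transform` parameter, and the sample data are all hypothetical, and the function returns its counts instead of printing them.

```python
from operator import invert

import pandas as pd


def identity(mask):
    """Return the mask unchanged (the 'no tilde' case)."""
    return mask


def server_summary(df, transform=identity):
    """Filter df on teste == 'Place' (optionally inverted) and count servers.

    `transform` is a hypothetical parameter: pass operator.invert to get
    the behaviour of `~mask`, or keep the default to use the mask as is.
    """
    mask = df['teste'].isin(['Place'])
    subset = df[transform(mask)]
    physical = subset.loc[subset['model'] != 'Virtual Platform', 'model'].count()
    virtual = subset.loc[subset['model'] == 'Virtual Platform', 'model'].count()
    by_os = subset.groupby('platformName').size().reset_index(name='by OS: ')
    return physical, virtual, by_os


# Made-up sample data; only the column names come from the question.
sample = pd.DataFrame({
    'teste': ['Place', 'Other', 'Place', 'Other', 'Other'],
    'model': ['Virtual Platform', 'Rack Server', 'Rack Server',
              'Rack Server', 'Virtual Platform'],
    'platformName': ['Linux', 'Windows', 'Linux', 'Linux', 'Windows'],
})

with_tilde = server_summary(sample, invert)   # rows where teste != 'Place'
without_tilde = server_summary(sample)        # rows where teste == 'Place'
```

Passing the function as an argument keeps a single copy of the body, so the two near-identical functions collapse into one call with or without `invert`.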

Sequence unpacking

If your intention is to use one line, use sequence unpacking:

df1, df2 = (df[func(mask)] for func in tilde)

Note you can replicate the GroupBy result via:

dfs = dict(enumerate(df[func(mask)] for func in tilde))

But this is verbose and convoluted. Stick with the GroupBy solution.

score 0 · Answer 2 · answered Oct 23 '18 at 11:07

You could possibly condense your code a little by defining your tests and then iterating over those. Let me illustrate:

tests = ["Place", "Foo", "Bar"]

for t in tests:
    # not sure what you are doing exactly, so this just keeps both halves
    mask = df.teste.isin([t])
    df_without = df[~mask]  # rows where teste is not t
    df_with = df[mask]      # rows where teste is t

That way, you only have two lines doing the actual work, and simply adding another test to the list saves you duplicating code. No idea if this is what you want, though.
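One way to read this suggestion, sketched under assumptions: compute each mask once and keep both halves in a dictionary keyed by the test value, so later iterations do not overwrite earlier results. The name `splits` and the sample frame are hypothetical, not from the question.

```python
import pandas as pd

# Made-up sample data; only the column name `teste` comes from the question.
df = pd.DataFrame({'teste': ['Place', 'Foo', 'Bar', 'Place'],
                   'value': [1, 2, 3, 4]})

tests = ["Place", "Foo", "Bar"]

splits = {}
for t in tests:
    mask = df['teste'].isin([t])
    # Store (rows that do not match t, rows that match t) for each test.
    splits[t] = (df[~mask], df[mask])

without_place, with_place = splits["Place"]
print(with_place)
```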
