0

I have a very long list of strings. These strings are those columns I would like to keep from a dataframe. However, there are a couple of strings in this list that don't match a column header, instead it matches an operation between two or more column headers I would like to performs.

Example:

   df =  @A @B @C @D @E @F
         1   2  3  4  5  6
         9   8  7  6  5  4
         1   3  5  7  9  11

list = ["@A", "@C", "@D / @F"]

I would like this to output:

@A @C @D/@F
1   3  0.667
9   7  1.5
1   5  0.63

However, I can't for the life of me figure out what to do. To make things a bit more complicated, sometimes in my list you might see a simple division like in my example, sometimes you might see something more complicated like "(C + D) / (A + F)". How would I do this??

I honestly can't figure out where to start. I thought I could just do something like df[list] and pandas would just "know" what to do, but I think I might be asking a bit too much!

Any help would be appreciated! Thanks.

Edit: After trying to use the eval() method, I've bumped into another issue. I didn't though this could be a problem, but most of my strings begin with the character "@" which throws up an error when passed into the eval function. I'll do some googling on how to work around this, but any additional help is also welcome.

Edit 2:

I figured out a work around for the @ issue. I simply used the replace() method.

lst = [word.replace("@", "") for word in lst]

Then, I do the same this for all of my columns as well.

Jr Analyst
  • 58
  • 6

2 Answers2

1

Usually, what would be necessary is to write a simple parser to make the operation. But Python gives a method that could solve your problem: eval(). From this other answer, we can deduce that:

>>> var = {'a':np.array([1,2,2]),'b':np.array([2,1,3]),'c':np.array([3])}
>>> formula = '2*a*(b/c)**2'
>>> eval(formula, var)
array([ 0.8889,  0.4444,  4.    ])

So, if you defined at first every column inside a well-formed dict, it should work as expected.

For example:

var = {'A': df[0], 'B':df[1], ...}
eval(list[0], var)

Be careful when you are using eval() though. It can be quite dangerous if you are not the one giving the variable to the method.

Hope it helps!

Miionu
  • 52
  • 1
  • 2
  • This is fantastic. I didn't know this method existed. I'll give it a try and get back to you. Also, thank you for the link to that other Q&A. – Jr Analyst Jul 16 '19 at 13:42
0

Use DataFrame.eval:

note: never use list as a listname, its a reserved word in Python. I changed it to lst

lst = ["A", "C", "D / F"]

df = pd.DataFrame(df.eval(lst).T, columns=lst)

Output

   A  C     D / F
0  1  3  0.666667
1  9  7       1.5
2  1  5  0.636364

If we extend your lst with more complex calculation, this method still holds:

lst = ["A", "C", "D / F", "(C + D) / (A + F)"]

df = pd.DataFrame(df.eval(lst).T, columns=lst)

Output

   A  C     D / F (C + D) / (A + F)
0  1  3  0.666667                 1
1  9  7       1.5                 1
2  1  5  0.636364                 1
Erfan
  • 40,971
  • 8
  • 66
  • 78
  • This is even simpler. Thank you. One issue I'm facing now though is that the first character in most of the strings with my lst is the character "@". This is making python quite angry. – Jr Analyst Jul 16 '19 at 13:57
  • Edit your question accordingly @JrAnalyst – Erfan Jul 16 '19 at 13:58
  • I ended up using your solution. However, I don't put ".T" at the end. And instead of calling the whole list in eval() and columns=, I instead loop through my list with an index. I seem to be getting the right results though. Thanks for the help! – Jr Analyst Jul 16 '19 at 19:17