I'm having an issue that I can simplify to the following situation: I create a dataframe, make a selection in a loop, and build a new dataframe containing the subset of the old one that satisfies the conditions:

import pandas as pd
import itertools
g = ['M', 'M', 'F', 'F']
a = [20, 33, 20, 50]
Zip = [21202, 21018, 21202, 22222] 
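d = [0, -3, 8, 0]  # fourth value is a placeholder: the original list had only three entries, and all columns must have the same length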

parameters = (g, a)
names = ['gender', 'age']

df = pd.DataFrame({'age':a, 'gender':g, 'd':d, 'Zip':Zip})

for values in itertools.product(*parameters):
    thesevalues = ((df[names[0]] == values[0]) & (df[names[1]] == values[1]))
    subdf = df[thesevalues]

This works fine, but what if I also want to include the zip codes in the parameters and the names? I would then have to add a third selection criterion to "thesevalues" by hand. I am probably overlooking some functionality that makes the conditions in that criterion adapt to the list of parameters. A loop seems like a bad option... Is there another way? Thanks!

Marcel
  • Your code does not really make sense, since you will subset ...all your dataframe by iterating on all combinations ... – Colonel Beauvel Aug 16 '16 at 09:45
  • Your code does not work. – ayhan Aug 16 '16 at 09:47
  • Yeah the code works and this is just a small sample as an example, in reality, many lines will be selected with one combination of parameters and I then need to add up one column that is not part of the selection. – Marcel Aug 16 '16 at 15:47
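
If the end goal is the one described in the last comment (adding up a column that is not part of the selection, once per combination of parameters), a `groupby` may avoid the explicit loop altogether. The following is only a sketch under assumptions: the extra row and the values in `d` are made up for illustration, and unlike `itertools.product` it only visits combinations that actually occur in the data.

```python
import pandas as pd

# Sample data; the 'd' values and the fifth row are made up for illustration.
df = pd.DataFrame({
    'gender': ['M', 'M', 'F', 'F', 'M'],
    'age': [20, 33, 20, 50, 20],
    'Zip': [21202, 21018, 21202, 22222, 21202],
    'd': [1, 2, 3, 4, 5],
})

# Adding a selection column only means extending this list.
names = ['gender', 'age', 'Zip']

# One group per existing combination of values; sum 'd' within each group.
totals = df.groupby(names)['d'].sum()
print(totals)
```

The result is a Series indexed by the combinations, so a single total can be looked up with `totals.loc[('M', 20, 21202)]`.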

1 Answer

IIUC you need numpy.logical_and.reduce, which ANDs together any number of conditions:

import numpy as np
from itertools import product

# g, a, Zip, d and pd are defined as in the question
parameters = (g, a, Zip)
names = ['gender', 'age', 'Zip']

df = pd.DataFrame({'age':a, 'gender':g, 'd':d, 'Zip':Zip})
print(df)

for values in product(*parameters):
    # build one equality mask per column and AND them all together, see
    # http://stackoverflow.com/a/20528566/2901002
    thesevalues = np.logical_and.reduce([df[names[x]] == values[x] for x in range(len(parameters))])
    subdf = df[thesevalues]
    print(subdf)
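
For reference, the same idea can be wrapped in a small helper so that the number of conditions follows the length of `names`. This is a self-contained sketch, not part of the original code: `select_rows` is a hypothetical name, the `d` values are made up, and the parameter lists are taken from the unique values of each column (an assumption) so that no combination is visited twice.

```python
import itertools

import numpy as np
import pandas as pd


def select_rows(df, names, values):
    """Return the rows of df where df[name] == value for every (name, value) pair."""
    # One boolean mask per column; reduce ANDs them element-wise.
    masks = [df[name] == value for name, value in zip(names, values)]
    return df[np.logical_and.reduce(masks)]


# Sample data; the 'd' values are made up for illustration.
df = pd.DataFrame({
    'gender': ['M', 'M', 'F', 'F'],
    'age': [20, 33, 20, 50],
    'Zip': [21202, 21018, 21202, 22222],
    'd': [1, 2, 3, 4],
})

names = ['gender', 'age', 'Zip']
# Assumption: iterate over the unique values of each column.
parameters = [df[name].unique() for name in names]

for values in itertools.product(*parameters):
    subdf = select_rows(df, names, values)
    if not subdf.empty:
        # Add up a column that is not part of the selection.
        print(values, subdf['d'].sum())
```

Adding or removing a selection column then only requires changing `names`; the mask is built from however many entries it contains.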
jezrael