
I have seven dataframes tbl1851, tbl1861, tbl1871, tbl1881, tbl1891, tbl1901, tbl1911.

Each dataframe has the same fields 'Sex', 'Age', 'Num'.

I want to select a subset of each dataframe by first creating a boolean Series for it.

My code looks like this:

AM1851 = ((tbl1851.Sex=="M") & (tbl1851.Age>=15) & (tbl1851.Age<999))
AM1861 = ((tbl1861.Sex=="M") & (tbl1861.Age>=15) & (tbl1861.Age<999))
AM1871 = ((tbl1871.Sex=="M") & (tbl1871.Age>=15) & (tbl1871.Age<999))
AM1881 = ((tbl1881.Sex=="M") & (tbl1881.Age>=15) & (tbl1881.Age<999))
AM1891 = ((tbl1891.Sex=="M") & (tbl1891.Age>=15) & (tbl1891.Age<999))
AM1901 = ((tbl1901.Sex=="M") & (tbl1901.Age>=15) & (tbl1901.Age<999))
AM1911 = ((tbl1911.Sex=="M") & (tbl1911.Age>=15) & (tbl1911.Age<999))

Is there a loop that can achieve the same result as the code listed above?

There are many different selection combinations, so I don't really want to copy, paste, and search-and-replace many times.

user2960592
  • Possible duplicate of [How do I create a variable number of variables?](https://stackoverflow.com/questions/1373164/how-do-i-create-a-variable-number-of-variables) – G. Anderson May 10 '19 at 15:35
  • Your best bet may be to create a function that takes in a dataframe and returns the subset of that dataframe as desired, then apply it to all your DFs according to the link above – G. Anderson May 10 '19 at 15:37

3 Answers


Instead of having each dataframe as a separate variable, put them in a list:

frames = [
    # dataframe 1,
    # dataframe 2,
    # etc.
]

Then you can easily loop through them to create another list:

AMs = []
for frame in frames:
    AMs.append((frame.Sex=="M") & (frame.Age>=15) & (frame.Age<999))
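As a minimal sketch of how the resulting masks might then be used, the example below builds the list of boolean Series and indexes each dataframe with its own mask. The two small tables are made-up stand-ins for the real census data, and the names `subsets` and `totals` are hypothetical.

```python
import pandas as pd

# Toy stand-ins for two of the census tables (made-up data, for illustration only).
tbl1851 = pd.DataFrame({"Sex": ["M", "F", "M"], "Age": [10, 30, 40], "Num": [5, 7, 9]})
tbl1861 = pd.DataFrame({"Sex": ["M", "M", "F"], "Age": [20, 999, 50], "Num": [3, 4, 6]})

frames = [tbl1851, tbl1861]

# One boolean Series per dataframe, in the same order as `frames`.
AMs = [(f.Sex == "M") & (f.Age >= 15) & (f.Age < 999) for f in frames]

# Index each dataframe with its own mask to keep only the selected rows.
subsets = [f[mask] for f, mask in zip(frames, AMs)]

# Example aggregate: total Num among the selected rows of each table.
totals = [int(s.Num.sum()) for s in subsets]
```

Because `AMs` and `frames` share the same order, `zip` keeps each mask paired with the dataframe it was built from.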
John Gordon

A function would handle this, since each line uses the same tblxxxx object three times. I would try something like:

def build_my_data_set(input_data_frame):
    return ((input_data_frame.Sex=="M") & (input_data_frame.Age>=15) & (input_data_frame.Age<999))

# Fill the inner list with every dataframe you want to include.
my_data_frames = [build_my_data_set(data_item) for data_item in [tbl1851, tbl1861, tbl1871]]

The resulting my_data_frames is a list containing all the AMxxxx objects you defined, condensed into a single variable that you index to find the appropriate item. If you need to look items up by the xxxx part (the year), use a dictionary instead, with the year as the key.
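A sketch of that dictionary variant follows, assuming the dataframes are stored under their years. The helper `make_mask` and its parameters are hypothetical names introduced here so that other selection combinations reuse the same code; the tables are made-up sample data.

```python
import pandas as pd

def make_mask(df, sex, min_age, max_age=999):
    """Return a boolean Series: rows with the given sex and min_age <= Age < max_age."""
    return (df.Sex == sex) & (df.Age >= min_age) & (df.Age < max_age)

# Dataframes keyed by census year (made-up sample data, for illustration only).
tables = {
    1851: pd.DataFrame({"Sex": ["M", "F", "M"], "Age": [10, 30, 40], "Num": [5, 7, 9]}),
    1861: pd.DataFrame({"Sex": ["M", "M", "F"], "Age": [20, 999, 50], "Num": [3, 4, 6]}),
}

# One dictionary of masks per selection, each keyed by year.
AM = {year: make_mask(df, "M", 15) for year, df in tables.items()}
AF = {year: make_mask(df, "F", 15) for year, df in tables.items()}
```

With this layout, `AM[1851]` plays the role of the original `AM1851`, and `tables[1851][AM[1851]]` gives the selected rows.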

Reedinationer

You could group them into a list and loop through them:

tbls = [tbl1851, tbl1861, tbl1871, tbl1881, tbl1891, tbl1901, tbl1911]
my_func = lambda x : ((x.Sex=="M") & (x.Age>=15) & (x.Age<999))
AMs=[]
for df in tbls:
   # Call my_func on the whole dataframe; df.apply would pass it one column at a time
   AMs.append(my_func(df))

And if you want to access the elements by name, instead of creating a list you could create a dictionary with the variable names as keys:

AM_names=["AM1851","AM1861","AM1871","AM1881","AM1891","AM1901","AM1911"]
tbls = [tbl1851, tbl1861, tbl1871, tbl1881, tbl1891, tbl1901, tbl1911]
my_func = lambda x : ((x.Sex=="M") & (x.Age>=15) & (x.Age<999))
AMs={}
for idx, df in enumerate(tbls):
   # Key each mask by its name; call my_func on the whole dataframe
   AMs[AM_names[idx]]=my_func(df)