
I currently have a list of dataframes that I run through a list comprehension. The result contains every dataframe, whether or not any of its rows satisfy the condition in the function applied by the comprehension, so some of the results are empty. I would like to print only the non-empty dataframes. Is that possible? In addition, would it be possible to print the names of the dataframes as well?

Example:

import numpy as np
import pandas as pd

N = 5

np.random.seed(0)

df1 = pd.DataFrame(
    {'X':np.random.uniform(0,5,N),
     'Y':np.random.uniform(0,5,N),
     'Z':np.random.uniform(0,5,N),
    })

df2 = pd.DataFrame(
    {'X':np.random.uniform(-5,0,N),
     'Y':np.random.uniform(-5,0,N),
     'Z':np.random.uniform(-5,0,N),
    })

def func_sel(df):
    return df[df['X'] > 0]

dfs_list = [df1, df2]

dfs_sel = [func_sel(x) for x in dfs_list]

dfs_sel

Out[14]:
[          X         Y         Z
 0  2.744068  3.229471  3.958625
 1  3.575947  2.187936  2.644475
 2  3.013817  4.458865  2.840223
 3  2.724416  4.818314  4.627983
 4  2.118274  1.917208  0.355180, Empty DataFrame
 Columns: [X, Y, Z]
 Index: []]

EDIT: What I need here is for only df1 to be shown, with 'df1' as a label of some sort.

Zanshin

3 Answers


I'd use a dictionary instead of list in this case.

Demo:

In [110]: dfs_dict = {'df1':df1, 'df2':df2}

In [111]: dfs_sel = {name:func_sel(df) for name, df in dfs_dict.items()}

In [112]: dfs_sel
Out[112]:
{'df1':           X         Y         Z
 0  2.744068  3.229471  3.958625
 1  3.575947  2.187936  2.644475
 2  3.013817  4.458865  2.840223
 3  2.724416  4.818314  4.627983
 4  2.118274  1.917208  0.355180, 'df2': Empty DataFrame
 Columns: [X, Y, Z]
 Index: []}

In [113]: [df if len(df) else name for name, df in dfs_sel.items()]
Out[113]:
['df2',           X         Y         Z
 0  2.744068  3.229471  3.958625
 1  3.575947  2.187936  2.644475
 2  3.013817  4.458865  2.840223
 3  2.724416  4.818314  4.627983
 4  2.118274  1.917208  0.355180]
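To get what the question's EDIT asks for (only the non-empty frames, each with its label), the dictionary above can be filtered on `DataFrame.empty` and then printed. This is a minimal sketch; the name `non_empty` is introduced here for illustration, and the setup simply repeats the question's example so the block runs on its own.

```python
import numpy as np
import pandas as pd

N = 5
np.random.seed(0)

# Same example data as in the question
df1 = pd.DataFrame({c: np.random.uniform(0, 5, N) for c in 'XYZ'})
df2 = pd.DataFrame({c: np.random.uniform(-5, 0, N) for c in 'XYZ'})

def func_sel(df):
    return df[df['X'] > 0]

dfs_dict = {'df1': df1, 'df2': df2}
dfs_sel = {name: func_sel(df) for name, df in dfs_dict.items()}

# Keep only the selections that still contain rows (name is illustrative)
non_empty = {name: df for name, df in dfs_sel.items() if not df.empty}

# Print each surviving frame under its label
for name, df in non_empty.items():
    print(name)
    print(df)
```

With the example data, only `df1` survives the filter, because every `X` value in `df2` is negative.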
MaxU - stand with Ukraine
  • I agree with @MaxU: if you want the names of your dfs, you need to embed the name in the data structure. I'll use a list of tuples for this purpose.
  • I'll use the `empty` attribute to filter the list.

dfs_list = [('df1', df1), ('df2', df2)]
dfs_sel = [
    (n, df) for n, df in [(n, func_sel(x)) for n, x in dfs_list] if not df.empty]

dfs_sel

[('df1',           X         Y         Z
  0  2.744068  3.229471  3.958625
  1  3.575947  2.187936  2.644475
  2  3.013817  4.458865  2.840223
  3  2.724416  4.818314  4.627983
  4  2.118274  1.917208  0.355180)]
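If a single labeled result is more convenient than a list of tuples, one possible follow-up is to pass the surviving frames to `pd.concat` with the names as `keys`, so that each row carries its source label in the outer index level. This is only a sketch with small made-up frames (the values and the name `combined` are not from the question); note that `pd.concat` raises a `ValueError` when given no frames, hence the guard.

```python
import pandas as pd

def func_sel(df):
    return df[df['X'] > 0]

# Small illustrative frames, not the question's random data
df1 = pd.DataFrame({'X': [1.0, 2.0], 'Y': [3.0, 4.0]})
df2 = pd.DataFrame({'X': [-1.0, -2.0], 'Y': [-3.0, -4.0]})

dfs_list = [('df1', df1), ('df2', df2)]
dfs_sel = [
    (n, sel) for n, sel in ((n, func_sel(x)) for n, x in dfs_list)
    if not sel.empty]

# Guard against the case where every selection is empty
if dfs_sel:
    names, frames = zip(*dfs_sel)
    # The names become the outer level of a MultiIndex
    combined = pd.concat(frames, keys=names)
    print(combined)
```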
piRSquared

How about this:

EDIT: This version supports both manually naming the DataFrames and automatic enumeration.

import pandas as pd
import numpy as np

N = 5

np.random.seed(0)

df1 = pd.DataFrame(
    {'X':np.random.uniform(0,5,N),
     'Y':np.random.uniform(0,5,N),
     'Z':np.random.uniform(0,5,N),
    })


df2 = pd.DataFrame(
    {'X':np.random.uniform(-5,0,N),
     'Y':np.random.uniform(-5,0,N),
     'Z':np.random.uniform(-5,0,N),
    })

# OPTIONAL: manually assign names
df1.name = 'df1'
df2.name = 'df2'

def func_sel(df, name=None):
    rdf = df[df['X'] > 0]
    try:
        rdf.name = df.name
    except AttributeError:  # df has no manually assigned name
        rdf.name = name
    rdf.columns = ['%s %s' % (rdf.name or '', c) for c in rdf.columns]
    return rdf

dfs_list = [df1, df2]

# Run the selection once per frame, then keep only the non-empty results
dfs_sel = [rdf for rdf in (func_sel(df, 'df%d' % (x+1)) for x, df in enumerate(dfs_list)) if not rdf.empty]

dfs_sel

dfs_sel outputs:

[      df1 X     df1 Y     df1 Z
0  2.744068  3.229471  3.958625
1  3.575947  2.187936  2.644475
2  3.013817  4.458865  2.840223
3  2.724416  4.818314  4.627983
4  2.118274  1.917208  0.355180]

Each column name is prefixed with the name of the DataFrame. If no names are manually assigned, enumeration is used.

AArias
  • Be forewarned, however, that pandas.DataFrame objects do not preserve arbitrary metadata attributes, such as this `name` attribute, if you perform any operations on the DataFrame before passing it to your `func_sel` function. See [here](http://stackoverflow.com/a/14688398). – u55 Jan 28 '17 at 12:39
  • This works for the example provided, but noted. Maybe names should be stored somewhere else or provided to `func_sel` depending on OP's needs, if they need to perform further operations on the DataFrame. – AArias Jan 28 '17 at 12:44
  • @AArias, thanks. One issue though, I would have to add this name attribute to all the df's I have, 81 to be precise. I imagine this would have to be done manually? – Zanshin Jan 28 '17 at 12:46
  • @Zanshin no, just updated the answer so that you don't have to assign names manually. – AArias Jan 28 '17 at 12:53
  • @Zanshin I just updated the answer again. This final version supports both manually assigning names to dfs and automatic enumeration. – AArias Jan 28 '17 at 13:15
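Following up on the two concerns raised in these comments (attributes such as `name` may not survive DataFrame operations, and naming 81 frames by hand is tedious), one option is to generate the labels with `enumerate` and keep them as dictionary keys rather than as attributes on the frames. The sketch below assumes the frames are already collected in a list; `frames`, `named`, `selected`, and `non_empty` are hypothetical names, and the four tiny frames stand in for the real 81.

```python
import numpy as np
import pandas as pd

def func_sel(df):
    return df[df['X'] > 0]

# Hypothetical stand-in for the real frames: X is positive in every
# other frame and negative in the rest
frames = [
    pd.DataFrame({'X': (1 if i % 2 == 0 else -1) * np.arange(1.0, 4.0)})
    for i in range(4)
]

# Labels live in the dict keys, so they do not depend on DataFrame attributes
named = {'df%d' % i: df for i, df in enumerate(frames, start=1)}

selected = {name: func_sel(df) for name, df in named.items()}
non_empty = {name: df for name, df in selected.items() if not df.empty}

for name, df in non_empty.items():
    print(name)
    print(df)
```

Because the label is stored next to the frame instead of on it, any further processing of the frames leaves the labels intact.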