How to select a particular dataframe from a list of dataframes in Python equivalent to R?

Question

I have a list of dataframes in R, from which I select a particular dataframe as follows:
x = listOfdf$df1$df2$df3
Now I'm trying to find an equivalent way to do this in Python. What is the syntax for selecting a particular DataFrame from a list of DataFrames in pandas?

Do you want a list of DataFrames? Wouldn't a dictionary of DataFrames be better? – jezrael, Aug 11 '17 at 06:48
Actually, I'm in the process of converting R code to Python. I came across code that selects a particular df from a list of dfs in R (as in the example in the question), and I'm trying to do the same in Python. If this is possible in Python, I need an equivalent approach. In Python, a particular column of a df is selected with `df['colname']` (in R, `df$colname`); how can the same be done for a dataframe in a list? – user12345, Aug 11 '17 at 06:57

score 1 · Answer 1 · answered Aug 11 '17 at 10:32

Found a solution to select a particular dataframe/dataframe_column from a list of dataframes.
In R: x = listOfdf$df1$df2$df3
In Python: x = listOfdf['df1']['df2']['df3']
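
For the Python line to work, listOfdf has to be a dictionary-like container whose keys match the R names, not a plain Python list. Here is a minimal sketch of that idea; the names df1, df2, col_a and col_b and the data are hypothetical and chosen only for illustration:

```python
import pandas as pd

# Hypothetical data: a dict of DataFrames standing in for the R named list.
listOfdf = {
    'df1': pd.DataFrame({'col_a': [1, 2, 3], 'col_b': [4, 5, 6]}),
    'df2': pd.DataFrame({'col_a': [7, 8, 9]}),
}

# R: listOfdf$df1        -> Python: listOfdf['df1']
selected = listOfdf['df1']

# R: listOfdf$df1$col_a  -> Python: listOfdf['df1']['col_a']
column = listOfdf['df1']['col_a']

print(type(selected).__name__)
print(column.tolist())
```

Each pair of square brackets goes one level deeper: first into the dictionary, then into the columns of the selected dataframe.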

Thank you :)

user12345

Actually in R you can do the same with single or double brackets. – Parfait Aug 11 '17 at 18:46

vestland · Answer 2 · 2019-12-20T11:51:29.010

I see you've already answered your own question, and that's cool. However, as jezrael hints in his comment, you should really consider using a dictionary. That might sound a bit scary coming from R (been there myself, now I prefer Python in most ways), but it will be worth your effort.

First of all, a dictionary is a way of mapping a key (like a name) to a value or variable. You use curly brackets { } to build the dictionary and square brackets [ ] to index it.
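
As a quick illustration of the two kinds of brackets, here is a small sketch using plain numbers instead of dataframes. The keys and values are made up for the example:

```python
# Hypothetical keys and values, for illustration only.
ages = {'alice': 31, 'bob': 27}   # curly brackets build the dictionary

print(ages['alice'])              # square brackets look up a value by key
ages['carol'] = 45                # assigning to a new key adds an entry
print(list(ages))                 # keys keep their insertion order
```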

Let's say that you have two dataframes like this:

import numpy as np
import pandas as pd

np.random.seed(123)
# Reproducible input - Dataframe 1
rows = 10
df_1 = pd.DataFrame(np.random.randint(90,110,size=(rows, 2)), columns=list('AB'))
datelist = pd.date_range('2017-01-01', periods=rows).tolist()
df_1['dates'] = datelist
df_1 = df_1.set_index(['dates'])
df_1.index = pd.to_datetime(df_1.index)

##%%

# Reproducible input - Dataframe 2
rows = 10
df_2 = pd.DataFrame(np.random.randint(10,20,size=(rows, 2)), columns=list('CD'))
datelist = pd.date_range('2017-01-01', periods=rows).tolist()
df_2['dates'] = datelist
df_2 = df_2.set_index(['dates'])
df_2.index = pd.to_datetime(df_2.index)

With a limited number of dataframes you can easily organize them in a dictionary this way:

myFrames = {'df_1': df_1,
            'df_2': df_2}

Now you have a reference to your dataframes, as well as your own defined names or keys. You'll find a more elaborate explanation here.

Here's how you use it:

print(myFrames['df_1'])
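
Beyond a plain lookup, a dictionary also lets you check whether a name exists and walk through all names and dataframes together. The following is a minimal sketch; the small frames dictionary is a hypothetical stand-in for the dataframes built above:

```python
import pandas as pd

# Hypothetical stand-ins for the dataframes built above.
frames = {
    'df_1': pd.DataFrame({'A': [1, 2], 'B': [3, 4]}),
    'df_2': pd.DataFrame({'C': [5, 6], 'D': [7, 8]}),
}

# Check that a name exists before using it.
has_df_1 = 'df_1' in frames

# .get() returns None (or a default you supply) instead of raising KeyError.
missing = frames.get('df_9')

# .items() yields each name together with its dataframe.
shapes = {name: df.shape for name, df in frames.items()}
print(has_df_1, missing, shapes)
```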

You can also use that reference to make changes to one of your dataframes, and add that to your dictionary:

df_3 = myFrames['df_1']
df_3 = df_3*10
myFrames.update({'df_3': df_3})
print(myFrames)
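
One detail worth checking for yourself: looking up a dataframe gives you the same object that sits in the dictionary, while an expression such as multiplying by 10 produces a new dataframe and leaves the original untouched. Here is a small sketch with made-up data (the names frames, alias and scaled are hypothetical):

```python
import pandas as pd

frames = {'df_1': pd.DataFrame({'A': [1, 2, 3]})}

alias = frames['df_1']          # same object as the one in the dictionary
same_object = alias is frames['df_1']

scaled = alias * 10             # arithmetic returns a new dataframe
frames['df_3'] = scaled         # equivalent to frames.update({'df_3': scaled})

print(same_object)
print(frames['df_1']['A'].tolist())
print(frames['df_3']['A'].tolist())
```

Plain key assignment and update() do the same job here; assignment is just the shorter spelling when you add a single entry.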

Now let's say that you have a whole bunch of dataframes that you'd like to organize the same way. You can make a list of the names of all available dataframes as described below. Be aware, however, that eval() is often discouraged, mainly because it executes whatever string it is given as code.

Anyway, here we go: First you get a list of strings of all dataframe names like this:

alldfs = [var for var in dir() if isinstance(eval(var), pd.core.frame.DataFrame)]

It's more than likely that you won't be interested in ALL of them if you've got a lot going on at the same time. So let's say that the names of all your dataframes of particular interest start with 'df_'. You can isolate them like this:

dfNames = []
for elem in alldfs:
    if elem.startswith('df_'):
        dfNames.append(elem)

Now you can use that list in combination with eval() to make a dictionary:

myFrames2 = {}
for dfName in dfNames:
    myFrames2[dfName] = eval(dfName)
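
If you would rather avoid eval(), the same dictionary can be built from a namespace mapping in a single pass. This is only a sketch under stated assumptions: collect_frames is a hypothetical helper, not a pandas function, and the example namespace is made up so the snippet runs on its own.

```python
import pandas as pd


def collect_frames(namespace, prefix='df_'):
    """Return {name: DataFrame} for the DataFrames in `namespace` whose name starts with `prefix`."""
    return {
        name: obj
        for name, obj in namespace.items()
        if name.startswith(prefix) and isinstance(obj, pd.DataFrame)
    }


# Hypothetical namespace; in a script or notebook you could pass globals() instead.
namespace = {
    'df_1': pd.DataFrame({'A': [1]}),
    'df_2': pd.DataFrame({'C': [2]}),
    'other': pd.DataFrame({'X': [3]}),   # a dataframe, but with the wrong prefix
    'df_note': 'not a dataframe',        # right prefix, but the wrong type
}

picked = collect_frames(namespace)
print(sorted(picked))
```

Because the objects are read straight from the mapping, no string is ever executed as code.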

Now you can loop through that dictionary and do something with each of them. You could, as an example, take the last column of each dataframe and collect those columns in a new dataframe:

j = 1
for key in myFrames2.keys():

    # Build new column names for your brand new df
    colName = []
    colName.append('column_' + str(j))

    if j == 1:
        # First, make a new df by referencing the dictionary
        df_new = myFrames2[key]

        # Subset the last column and make sure it doesn't
        # turn into a pandas series instead of a dataframe in the process
        df_new = df_new.iloc[:,-1].to_frame()

        # Set new column names
        df_new.columns = colName[:]
    else:
        # df_new already exists, so you can add
        # new columns and names for the rest of the columns
        df_new[colName] = myFrames2[key].iloc[:,-1].to_frame()
    j = j + 1

print(df_new)
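
The same result can probably be reached more compactly with pd.concat, which accepts a dictionary of series and uses its keys as column names when concatenating along axis=1. Below is a minimal sketch, assuming a small hypothetical frames dictionary in place of the one built above:

```python
import pandas as pd

# Hypothetical stand-ins for the dictionary of dataframes.
frames = {
    'df_1': pd.DataFrame({'A': [1, 2], 'B': [3, 4]}),
    'df_2': pd.DataFrame({'C': [5, 6], 'D': [7, 8]}),
}

# Take the last column of each dataframe and name it column_1, column_2, ...
last_columns = {
    f'column_{j}': df.iloc[:, -1]
    for j, df in enumerate(frames.values(), start=1)
}

# Concatenating along axis=1 aligns the series on their index;
# the dictionary keys become the column names.
combined = pd.concat(last_columns, axis=1)
print(combined)
```

This avoids the counter variable and the special case for the first dataframe.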

Hope you'll find this useful!

And by the way... For your next question, please provide some reproducible code as well as a few words about what solutions you have tried yourself. You can read more about how to ask an excellent question here.

And here is the whole thing for an easy copy&paste:

#%%

# Imports
import pandas as pd
import numpy as np

np.random.seed(123)

# Reproducible input - Dataframe 1
rows = 10
df_1 = pd.DataFrame(np.random.randint(90,110,size=(rows, 2)), columns=list('AB'))
datelist = pd.date_range('2017-01-01', periods=rows).tolist()
df_1['dates'] = datelist
df_1 = df_1.set_index(['dates'])
df_1.index = pd.to_datetime(df_1.index)

##%%

# Reproducible input - Dataframe 2
rows = 10
df_2 = pd.DataFrame(np.random.randint(10,20,size=(rows, 2)), columns=list('CD'))
datelist = pd.date_range('2017-01-01', periods=rows).tolist()
df_2['dates'] = datelist
df_2 = df_2.set_index(['dates'])
df_2.index = pd.to_datetime(df_2.index)

print(df_1)
print(df_2)
##%%


# If you don't have that many dataframes, you can organize them in a dictionary like this:
myFrames = {'df_1': df_1,
            'df_2': df_2}  


# Now you can reference df_1 in that collection by using:
print(myFrames['df_1'])

# You can also use that reference to make changes to one of your dataframes,
# and add that to your dictionary
df_3 = myFrames['df_1']
df_3 = df_3*10
myFrames.update({'df_3': df_3})

# And now you have a happy little family of dataframes:
print(myFrames)
##%%

# Now let's say that you have a whole bunch of dataframes that you'd like to organize the same way.
# You can make a list of the names of all available dataframes like this:
alldfs = [var for var in dir() if isinstance(eval(var), pd.core.frame.DataFrame)]

##%%
# It's likely that you won't be interested in all of them if you've got a lot going on.
# Let's say that all your dataframes of interest start with 'df_'.
# You get them like this:
dfNames = []
for elem in alldfs:
    if elem.startswith('df_'):
        dfNames.append(elem)

##%%
# Now you can use that list in combination with eval() to make a dictionary:
myFrames2 = {}
for dfName in dfNames:
    myFrames2[dfName] = eval(dfName)

##%%
# And now you can reference each dataframe by name in that new dictionary:
myFrames2['df_1']

##%%
#Loop through that dictionary and do something with each of them.

j = 1
for key in myFrames2.keys():

    # Build new column names for your brand new df
    colName = []
    colName.append('column_' + str(j))

    if j == 1:
        # First, make a new df by referencing the dictionary
        df_new = myFrames2[key]

        # Subset the last column and make sure it doesn't
        # turn into a pandas series instead of a dataframe in the process
        df_new = df_new.iloc[:,-1].to_frame()

        # Set new column names
        df_new.columns = colName[:]
    else:
        # df_new already exists, so you can add
        # new columns and names for the rest of the columns
        df_new[colName] = myFrames2[key].iloc[:,-1].to_frame()
    j = j + 1

print(df_new)
