
I have a lot of different files that I'm trying to load into pandas in a Pythonic way, while also putting each one in its own cell to make this look easy. I actually have 36 different variables, but to keep things simple, I'll show you an example with three different dataframes.

[Screenshot: the three example dataframes]

Let's say I'm loading CSV files like these into dataframes, but I want each one to go into its own automatically generated cell:

file_list = ['df1.csv', 'df2.csv', 'df3.csv']
name_list = ['df1', 'df2', 'df3']

I could easily create three different cells and type, for example:

df1 = pd.read_csv('df1.csv')

But there are dozens of different CSVs, and I want to do similar things to each of them, like deleting columns, so there has to be an easier way.

I've tried something like this:

import pandas as pd

var_list = []

# Read each file and keep (file, name, dataframe) together
for file, name in zip(file_list, name_list):
    var_file = pd.read_csv(file)
    var_list.append((file, name, var_file))

print(var_list)

But this all occurs in the same cell.

I looked through the IPython docs, since I believe that's the package that handles this, but I couldn't find anything. I appreciate your help.

E_Sarousi
  • Maybe check out https://stackoverflow.com/questions/54987129/how-to-programmatically-create-several-new-cells-in-a-jupyter-notebook – Clej Mar 19 '22 at 18:03
    What do you mean by "add new cells"? – Psidom Mar 19 '22 at 18:09
    It's not really clear why you want to programmatically add new cells to the notebook. Why not read in all your dataframes at once? What is the benefit of loading them each into their own separate cell? Is it because you want to display the dataframes? Because for that you can just print them... – ddejohn Mar 19 '22 at 18:28
  • Also, your dataframes don't look correct: you are not loading the data into the constructors properly. You should be doing `pd.DataFrame({"name": name_list, "age": age_list, "height": height_list})`. Right now, for example, your name column in DF3 is `Sarah, 53, 56`, which is obviously not correct. – ddejohn Mar 19 '22 at 18:30
  • Furthermore, if all your dataframes have the same structure, why not load them all into a single dataframe? – ddejohn Mar 19 '22 at 18:31
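A minimal sketch of ddejohn's two suggestions, assuming the CSV files have a header row: load every file into a dict keyed by name (so each dataframe stays reachable as `dfs['df1']` in any later cell, with a shared cleanup step applied to all of them), and, if they share the same columns, stack them into one dataframe with `pd.concat`. The helper `load_all`, its `drop_columns` parameter, the `Notes` column, and the sample rows are all hypothetical; the `io.StringIO` objects only stand in for real files, since `pd.read_csv` accepts a path or a file-like object.

```python
import io

import pandas as pd


def load_all(file_list, name_list, drop_columns=()):
    """Hypothetical helper: read each CSV into a dict keyed by name."""
    dfs = {}
    for file, name in zip(file_list, name_list):
        df = pd.read_csv(file)
        # Same cleanup for every file; errors="ignore" skips missing columns
        dfs[name] = df.drop(columns=list(drop_columns), errors="ignore")
    return dfs


# Hypothetical in-memory stand-ins for df1.csv and df2.csv
file_list = [
    io.StringIO("Name,Age,Height,Notes\nAnna,31,165,x\nBen,27,180,y\n"),
    io.StringIO("Name,Age,Height,Notes\nSarah,53,170,z\n"),
]
name_list = ["df1", "df2"]

dfs = load_all(file_list, name_list, drop_columns=["Notes"])
print(dfs["df1"])  # any later cell can use dfs["df1"], dfs["df2"], ...

# If every file has the same columns, stack them into one dataframe;
# the outer index level records which file each row came from
combined = pd.concat(dfs, names=["source", "row"])
print(combined)
```

With a dict there is no need to create 36 separate variables: each cell can still work on a single dataframe by name, and a common step only has to be written once.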

1 Answer


From what I understand, you need to load the contents of several .csv files into several pandas dataframes and run the same repeatable process on each of them. You're not sure they will all load correctly, but you still want to get the most out of them, so you want to run each process in its own Jupyter cell.

As ddejohn pointed out, this may not be the best option, but I think it's a cool question. The following code generates several cells, each with a common structure but different variables (in my example, I simply sort the loaded dataframe by age). It is based on "How to programmatically create several new cells in a Jupyter notebook page", which should get the credit if this is indeed what you were looking for:

from IPython.core.getipython import get_ipython
import pandas as pd

def create_new_cell(contents):
    """Ask the notebook front end to insert a new cell containing `contents`."""
    shell = get_ipython()  # None when not running inside IPython/Jupyter
    payload = dict(
        source='set_next_input',
        text=contents,
        replace=False,  # add a new cell instead of overwriting the current one
    )
    shell.payload_manager.write_payload(payload, single=False)

def get_df(file_name, df_name):
    # Code for one generated cell: load the CSV, sort by age, then display it
    content = (
        "{df} = pd.read_csv('{file}', names=['Name', 'Age', 'Height'])\n"
        "{df}.sort_values(by='Age', inplace=True)\n"
        "{df}"
    ).format(df=df_name, file=file_name)
    create_new_cell(content)

file_list = ['filename_1.csv', 'filename_2.csv']
name_list = ['df1', 'df2']
for file, name in zip(file_list, name_list):
    get_df(file, name)
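A possible variant, sketched under a few assumptions: the generated cell's source is built by a separate pure function, so it can be checked without a running notebook, and a shared list of columns to drop (as in the question) is written into every cell. `build_cell_source` and `drop_columns` are hypothetical names; the generated code assumes pandas is already imported as `pd` in the notebook and that the CSVs have a header row (unlike the `names=` call above). The `!r` conversion inserts each path as a properly quoted Python string literal, so paths containing quotes or Windows backslashes don't break the generated code.

```python
def build_cell_source(file_name, df_name, drop_columns=()):
    """Hypothetical helper: return the code for one generated cell."""
    # !r writes the path as a valid Python string literal
    lines = [f"{df_name} = pd.read_csv({file_name!r})"]
    if drop_columns:
        lines.append(f"{df_name} = {df_name}.drop(columns={list(drop_columns)!r})")
    lines.append(df_name)  # bare name on the last line so Jupyter displays it
    return "\n".join(lines)


file_list = ["filename_1.csv", "filename_2.csv"]
name_list = ["df1", "df2"]
for file, name in zip(file_list, name_list):
    source = build_cell_source(file, name, drop_columns=["Height"])
    print(source)
    print("---")
    # Inside Jupyter, pass `source` to create_new_cell() instead of printing it
```

Keeping the string-building separate from `create_new_cell` makes it easy to print and inspect the generated code before actually inserting 36 cells into the notebook.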
Clej