I have two dataframes, test1 and test2. My program logic is as follows:

def write_file():
    test1.to_csv('test1.csv', index=None)

def process_file():
    test2 = pd.read_csv('test1.csv', low_memory=False)

def write_processed_file():
    test2.to_csv('test2.csv', index=None)

I call these functions as follows:

write_file()
process_file()
write_processed_file()

As you can see, I have two write functions only because I want each dataframe written to a differently named .csv file. If I instead pass the dataframe as an argument, as below, so that there is just one write function, every file gets the same name. How do I get the dataframe's name?

def write_file(df_name):
   df_name.to_csv(('common_file_name.csv'),index=None)

I expect two CSV files, test1.csv and test2.csv, without needing two write functions.

Basically, I have 400-500 lines of code, of which 15-18 lines write dataframes to CSV files. I would like a single write function that accepts a dataframe as input and uses the dataframe's name as the CSV file name.

Is there any way to get the dataframe name and save the file under that name in an elegant and efficient manner?

The Great

1 Answer

Using the names of variables in code is considered to be bad style. While it is possible in Python, I would recommend simply passing two arguments:

def write_file(df, filename):
    df.to_csv(filename, index=None)

You would use this in your code as:

write_file(test1, 'test1.csv')
write_file(test2, 'test2.csv')

Now, what if you have many dataframes which all follow a predictable naming pattern like above? In this case, it might be better to keep the dataframes in a list.

test = [test1, test2, test3, ..., test100]

You can then iterate over this list, writing the files in a loop:

for i, df in enumerate(test, 1):
    write_file(df, f'test{i}.csv')

But what if you have many dataframes and the names are not in a predictable numeric pattern? Then I would rather use a dictionary:

dfs = {'test1': test1, 
       'test2': test2,
       'other_df': other_df,
       'inline_df': process_df()  # you can store them straight from a function
       }

for name, df in dfs.items():
    write_file(df, f'{name}.csv')
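For a self-contained illustration of the dictionary approach, here is a minimal sketch. The sample data, the `write_all` helper, its `out_dir` parameter, and the temporary directory are all assumptions made for the example; none of them come from the question.

```python
import tempfile
from pathlib import Path

import pandas as pd


def write_file(df, filename):
    # index=False leaves the row index out of the CSV
    df.to_csv(filename, index=False)


def write_all(dfs, out_dir):
    """Write each dataframe in `dfs` to <out_dir>/<name>.csv and return the paths."""
    out_dir = Path(out_dir)
    paths = []
    for name, df in dfs.items():
        path = out_dir / f"{name}.csv"
        write_file(df, path)
        paths.append(path)
    return paths


# Made-up sample data standing in for the real dataframes
dfs = {
    "test1": pd.DataFrame({"a": [1, 2], "b": [3, 4]}),
    "test2": pd.DataFrame({"a": [5, 6], "b": [7, 8]}),
}

with tempfile.TemporaryDirectory() as tmp:
    written = write_all(dfs, tmp)
    written_names = sorted(p.name for p in written)
    # Read one file back while the temporary directory still exists
    round_trip = pd.read_csv(written[0])
```

The file name lives next to the dataframe as the dictionary key, so there is still only one write function and no need to look up variable names.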
chthonicdaemon
  • But do you have any idea how I can get the dataframe name without a dictionary? I mean something like `df.name`. I understand this might refer to a column called `name`, but how can I get the name of the df itself? Is there any pythonic way to do that? – The Great Aug 22 '19 at 08:05
  • Like I mentioned, it's not considered good practice to use the name of a variable in code. It might be weird to understand, but dataframes don't actually "have" names as a property, rather names are associated with the dataframe in memory. So while [ways may exist](https://stackoverflow.com/questions/592746/how-can-you-print-a-variable-name-in-python), you shouldn't be trying to do this. See this article on [how names work in Python](https://nedbatchelder.com/text/names.html). – chthonicdaemon Aug 22 '19 at 08:10
  • got it. Thank you – The Great Aug 22 '19 at 08:10
  • Can you help me with this? https://stackoverflow.com/questions/57667919/how-to-build-custom-operation-as-a-calculator-to-handle-million-rows-efficiently – The Great Aug 27 '19 at 06:10
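To illustrate the point from the comments that an object does not carry its own name, here is a small sketch. It uses a plain list as a stand-in for a dataframe, and the `names_of` helper and the explicit `namespace` dictionary are hypothetical, introduced only for this example (in a real script the namespace would be something like `globals()`).

```python
def names_of(obj, namespace):
    """Return every name in `namespace` that is bound to `obj`."""
    return sorted(name for name, value in namespace.items() if value is obj)


test1 = [1, 2, 3]  # stand-in for a dataframe
alias = test1      # a second name bound to the very same object

# Hypothetical explicit namespace; a real lookup would search globals()
namespace = {"test1": test1, "alias": alias}

found = names_of(test1, namespace)
```

Because the lookup returns both `alias` and `test1`, there is no single "name of the dataframe" to recover, which is why passing the file name explicitly is the more robust choice.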