I have a loop that creates data frames, and each of them needs a unique name. The loop also creates unique strings, each associated with one data frame. For example, the loop below creates 10 unique strings and dataframes. How do I assign the string as the dataframe's name, and make the dataframe accessible outside of the function?

for i in range(10):
    string = "name" + str(i)
    df = pandas.DataFrame("some data")
    #INSERT some code that changes df's name to "namei"
    return namei

Thanks!

user6472523
  • What's "df's name"? – Quang Hoang May 08 '19 at 14:21
  • I would recommend storing the dfs in a dictionary, with the df name as the key and the df as the value. – anky May 08 '19 at 14:22
  • https://stackoverflow.com/a/31727504/4879688 – abukaj May 08 '19 at 14:22
  • NO, do not do what that link suggests. Setting a DataFrame attribute with `df.name =` is a horrible idea and will break the inherent `df.col_name` functionality that exists. It becomes ambiguous (and broken) when you also have a column named `'name'`. – ALollz May 08 '19 at 14:23
  • @ALollz it works for me (pandas 0.23.4, Python 3.6.7) but I am eager to learn what the "df's name" is if the linked answer is invalid. – abukaj May 08 '19 at 14:28
  • @Quang Hoang, I want the df's name to be equal to string – user6472523 May 08 '19 at 14:28
  • `df = pandas.DataFrame("some data")` results in `ValueError: DataFrame constructor not properly called!` – abukaj May 08 '19 at 14:31
  • @abukaj DataFrames do not have a name by default. Sure, you can set one with `df.name = `, but you shouldn't (if you want to know why, perhaps ask a question). I.e. `pd.DataFrame().name` will give you an error. On the other hand, Indices do have a name attribute: `pd.Index([]).name` – ALollz May 08 '19 at 14:32
  • Pandas uses the `name` attribute on dataframes that result from `groupby`. If you look at `[d.name for _, d in df.groupby(level=0)]` you'll see a list of all the unique index values of the first level. The `name` is also the same as the unique group key. In the context of the comprehension, `d.name == _`. Outside a `groupby` context a dataframe wouldn't have a `name` attribute unless you gave it one. It shouldn't hurt anything but I don't know why you would want to. – piRSquared May 08 '19 at 14:33
  • @ALollz then I guess the answer to the question is "you can not rename nameless". – abukaj May 08 '19 at 14:33
  • In addition, when you assign something to an attribute of a dataframe, pandas won't do anything to preserve that attribute when you perform an operation on the dataframe. You may lose it somewhere on the journey of data manipulation. By naming it, you hint at keeping it around for some purpose. I'd follow @ALollz's advice and ask your question about that. – piRSquared May 08 '19 at 14:36
  • Is there no way to set `string = "something"` where it doesn't overwrite string but instead overwrites namei? – user6472523 May 08 '19 at 14:37
  • Use a dictionary. `my_dict_of_df = {f'name{i}': pd.DataFrame([[1]]) for i in range(10)}` – piRSquared May 08 '19 at 14:40
  • Then if you want that in a dataframe `my_df_of_df = pd.concat(my_dict_of_df)`. – piRSquared May 08 '19 at 14:42
  • Ok thanks I will look into that. Never used it before. – user6472523 May 08 '19 at 14:42
  • Actually, I think pandas stopped populating the `name` attribute in the way I suggested above... so just use a dictionary and forget naming the dataframes with the name attribute. – piRSquared May 08 '19 at 14:45
  • I think the easiest solution for me is to save the dataframes as pickles. Then, when I want to reference a data frame, I'll just pull in the pickle, since I can name the pickle using my "string". – user6472523 May 08 '19 at 14:50
  • @user6472523 the big question that you haven't answered is "why you want to do this". If you need to reference `DataFrame` objects by name, a dict is the correct tool (as @piRSquared mentioned). Anything manipulating the DataFrame itself is a nasty hack that will make things very brittle for you. If there's another reason for your question, please add it. – bsplosion May 08 '19 at 14:59
  • @bsplosion I don't know how to use dicts, so that will take some time. Also, the reason I want to do this is to organize my data. I imagine being able to quickly pull data and run it through this analysis I'm doing. As opposed to having one gigantic dataframe, I can have many smaller ones and just pull in the ones I need. – user6472523 May 08 '19 at 15:12
  • @user6472523 I'd highly recommend getting familiar with dicts before diving into pickling - using a dict to organize your data would be a far simpler solution and be instantly recognizable to anyone who's familiar with python. – bsplosion May 08 '19 at 15:17
  • @bsplosion ok, just read it, doesn't look that hard. This will be great, as now I can refer to index values in my smaller data frames as opposed to column and row names, since all my small dataframes have the same size. – user6472523 May 08 '19 at 15:19
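
For reference, the dictionary approach suggested in the comments might look roughly like the sketch below. The function name `build_frames`, the `value` column, and the placeholder data are assumptions made up for illustration; they are not part of the original question.

```python
import pandas as pd


def build_frames(n=10):
    """Return a dict mapping 'name0', 'name1', ... to DataFrames."""
    frames = {}
    for i in range(n):
        key = f"name{i}"  # the unique string built inside the loop
        # Placeholder data (assumption); real code would load or compute it.
        frames[key] = pd.DataFrame({"value": [i, i + 1, i + 2]})
    return frames


frames = build_frames()

# Look up one frame by its string name, outside the function.
df3 = frames["name3"]

# Optionally stack everything into one frame; the dict keys become
# the outer level of the resulting index.
combined = pd.concat(frames)
name3_again = combined.loc["name3"]
```

Because the dictionary is returned from the function, the frames stay accessible outside it, and each one is found by its string key instead of by a dynamically created variable name.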

0 Answers