I want to split the data in a dataframe into chunks. To do so, I need to give each dictionary a dynamically generated name.

I would like to do something like:

dict_{}.format(VARIABLE_NAME) = {}

The line above is not valid Python. How can I define a new dictionary name every time I need to create one? This happens in a for loop, so I need dynamic dict names. Let me know if there is anything else I need to provide.

Here is a snippet of the dataframe:

   REFERENCE_CODE                                        TRANSLATION
0      ladder_now                                                NaN
1               0                                              xyzwu
2               1                                              yxzuv
3               2                                            asdfasd
4               3                                             sdfsdh
5               4                                             hghffg
6               5                                            agfdhsj
7               6                                            dfgasgf
8               7                                             jfhkgj
9               8                                           djfgjfhk
10              9                                            dsfasys
11             10                                            kghkfdy
12             98                                          dsfhsuert
13             99                                           wsdfadjs
14  country_satis  Sa pangkagab’san, aoogma po ba kamo o dai naoo...
15              1                                            Naoogma
16              2                                        Dai naoogma
17              8                           Dai aram (HUWAG BASAHIN)
18              9                           Huminabo (HUWAG BASAHIN)
19            NaN                                                NaN

I am trying to take chunks of data: take ladder_now and all the values associated with it, then find country_satis and its values, and put each group in a separate dictionary. Here is the logic I have so far; only the dynamically created dict is missing:

for index, row in df.iterrows():
    j = 0
    if isinstance(row['REFERENCE_CODE'], str):
        if j == 0:
            # fix dynamically changing dict here
            trend_dict = {}
            trend_dict[row['REFERENCE_CODE']] = row['TRANSLATION']
        else:
            j = 0
            # create new dynamically named dictionary
            next_dict = {}
            next_dict[row['REFERENCE_CODE']] = row['TRANSLATION']
    else:
        trend_dict[row['REFERENCE_CODE']] = row['TRANSLATION']
        j += 1

So essentially, I would like dict_ladder_now to be one dictionary containing all key-value pairs of everything below it until country_satis is reached, and dict_country_satis to be another.

sgerbhctim
  • You are essentially asking how to create variable names dynamically. It's never a good idea to do that. You can instead use a list or dictionary that contains your sub-dictionary names as keys. – Paritosh Singh Feb 08 '19 at 19:00
  • @ParitoshSingh can you elaborate on what you mean? – sgerbhctim Feb 08 '19 at 19:01
  • Clarification: you have not overlooked [DataFrame.to_dict](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_dict.html)? – Alex Yu Feb 08 '19 at 19:03
  • @AlexYu yes, I am aware of that. Not applicable in my case. – sgerbhctim Feb 08 '19 at 19:04
  • @ParitoshSingh I totally get what you are saying now, attempting. – sgerbhctim Feb 08 '19 at 19:08
  • Can you post `df.head().to_dict('records')` as an example of the data? – Alex Yu Feb 08 '19 at 19:12
  • "so I need to use dynamic dict names" No, you do not. Do not use dynamic variable names. Just use a *container*, like a list or a dict. – juanpa.arrivillaga Feb 08 '19 at 19:33
  • Dynamic variable names are *possible*, but they lead to pain and madness. See https://stackoverflow.com/questions/1373164/how-do-i-create-a-variable-number-of-variables for alternatives, and http://stupidpythonideas.blogspot.com/2013/05/why-you-dont-want-to-dynamically-create.html for more info about why it's a bad idea. – PM 2Ring Feb 08 '19 at 19:37

1 Answer

Instead of trying to generate a dynamic number of variable names on the fly, you should opt for a higher-level data structure to store your objects, such as a dictionary or a list.

import pandas as pd
REFERENCE_CODE = ["ladder_now", 0, 1, 5, 15, "country_satis", 20, 50, 100, "test3", 10, 50, 90]
TRANSLATION = list(range(len(REFERENCE_CODE)))
df = pd.DataFrame({"REFERENCE_CODE": REFERENCE_CODE,
                   "TRANSLATION": TRANSLATION
                   })
print(df)
#Output: Dummy data prepared for reference
   REFERENCE_CODE  TRANSLATION
0      ladder_now            0
1               0            1
2               1            2
3               5            3
4              15            4
5   country_satis            5
6              20            6
7              50            7
8             100            8
9           test3            9
10             10           10
11             50           11
12             90           12

Using a list: this keeps the logic written in the original question and stores each new dictionary in a list.

result = [] #container list that grows dynamically
for index, row in df.iterrows():
    j = 0
    if isinstance(row['REFERENCE_CODE'], str):
        if j == 0:
            # fix dynamically changing dict here
            result.append({}) #new dictionary in container
            result[-1][row['REFERENCE_CODE']] = row['TRANSLATION']
        else:
            j = 0
            # create new dynamically named dictionary
            result.append({}) #new dictionary in container
            result[-1][row['REFERENCE_CODE']] = row['TRANSLATION']
    else:
        result[-1][row['REFERENCE_CODE']] = row['TRANSLATION']
        j += 1

Note that, the way the logic is written, j is reset to 0 at the start of every iteration, so the check j == 0 always passes and the else branch never runs. The same line of code also appears in almost every block. The loop therefore simplifies to:

result = []
for index, row in df.iterrows():
    if isinstance(row['REFERENCE_CODE'], str):
        result.append({})
    result[-1][row['REFERENCE_CODE']] = row['TRANSLATION']

print(result)
#Output:
[{'ladder_now': 0, 0: 1, 1: 2, 5: 3, 15: 4},
 {'country_satis': 5, 20: 6, 50: 7, 100: 8},
 {'test3': 9, 10: 10, 50: 11, 90: 12}]

Using a dict: A dictionary container may be nicer here, as you can refer to the sub dictionaries by name.

result_dict = {}
for index, row in df.iterrows():
    if isinstance(row['REFERENCE_CODE'], str):
        key = row['REFERENCE_CODE']
        result_dict[key] = {}
    result_dict[key][row['REFERENCE_CODE']] = row['TRANSLATION']
print(result_dict)
#Output:
{'ladder_now': {'ladder_now': 0, 0: 1, 1: 2, 5: 3, 15: 4},
 'country_satis': {'country_satis': 5, 20: 6, 50: 7, 100: 8},
 'test3': {'test3': 9, 10: 10, 50: 11, 90: 12}}
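
To connect this back to the question: what would have been a variable called dict_ladder_now becomes a lookup in the container. The sketch below is only a minimal illustration under a few assumptions. It rebuilds the dummy data as plain lists so that it runs without pandas, and the names `reference_code`, `translation`, and `wanted` are hypothetical. With the DataFrame, the same loop could iterate over zip(df['REFERENCE_CODE'], df['TRANSLATION']).

```python
# Same dummy data as above, kept as plain lists (assumption: no pandas needed here).
reference_code = ["ladder_now", 0, 1, 5, 15, "country_satis", 20, 50, 100,
                  "test3", 10, 50, 90]
translation = list(range(len(reference_code)))

result_dict = {}
for code, text in zip(reference_code, translation):
    if isinstance(code, str):
        key = code              # a string row starts a new chunk
        result_dict[key] = {}
    result_dict[key][code] = text

# Instead of a variable named dict_ladder_now, look the chunk up by its name.
wanted = "ladder_now"           # hypothetical: any name computed at runtime
print(result_dict[wanted])

# Looping over every chunk needs no knowledge of the names in advance.
for name, chunk in result_dict.items():
    print(name, len(chunk))
```

Because the name is an ordinary string key, it can be built, stored, or passed around like any other value, which is what the dynamic variable name was meant to achieve.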

Note that you may want to modify the logic of the if block further, especially since I am not sure you want the string keys to reappear inside the sub-dictionaries. However, this should give you an idea of how to create a dynamic number of items.
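
One possible variation, sketched here under stated assumptions rather than as the definitive fix: the string row only names the chunk and is not stored inside it, blank (NaN) rows are skipped, and a numeric row that appears before any string row raises an error instead of failing on an undefined key. The function name `chunk_rows` and the `rows` list are hypothetical. The sample values are taken from the snippet in the question and written as plain tuples so that the example runs without pandas.

```python
import math


def chunk_rows(rows):
    """Group (code, translation) pairs into one dict per string header row."""
    chunks = {}
    current = None  # name of the chunk being filled; None until a header is seen
    for code, text in rows:
        if isinstance(code, float) and math.isnan(code):
            continue  # assumption: rows with a missing code carry no data
        if isinstance(code, str):
            current = code
            chunks[current] = {}  # assumption: the header only names the chunk
            continue
        if current is None:
            raise ValueError(f"row with code {code!r} appears before any header")
        chunks[current][code] = text
    return chunks


# A few rows copied from the question's dataframe snippet.
rows = [
    ("ladder_now", float("nan")),
    (0, "xyzwu"),
    (1, "yxzuv"),
    ("country_satis", "Sa pangkagab’san, aoogma po ba kamo o dai naoo..."),
    (1, "Naoogma"),
    (2, "Dai naoogma"),
    (float("nan"), float("nan")),
]
print(chunk_rows(rows))
```

If the header's own translation is needed as well, it could be kept under a separate key instead of being dropped; that choice depends on how the chunks are used later.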

Paritosh Singh