To automate my workflow, I would like to use an f-string to build variable names and then refer to those variables in a loop. More precisely, let one of the variables be:

structure_acronym = ['Ocx', 'M1C-S1C', 'AMY', 'MGE', 'STC', 'MGE', 'URL', 'CGE', 'DTH', 'MFC', 'DFC']

For each of these acronyms I have to build a data frame out of a larger data frame. The column names of the larger data frame contain the structure acronyms, and each acronym appears in several column names. Each smaller data frame that I want to build will contain the subset of columns whose names contain a single acronym. Each of these data frames will go into a pipeline that leads to k-means clustering. Variables such as f'new_column_name_cluster_{structure}' and f'{structure}_df' will be used in the pipeline. Here is the loop I am building; it raises an "invalid syntax" exception pointing to the f-string.

names=larger_dataframe.columns
for structure in structure_acronym:
    f'new_column_name_cluster_{structure}' = ['ensembl_gene_id','gene_symbol']+[name for name in names if structure in name]
    f'{structure}_df' = larger_dataframe[f'new_column_name_cluster_{structure}']

Can somebody help me make this code run? Thanks.

user249018
  • I do not find an answer to my question in that post. – user249018 Apr 18 '22 at 19:50
  • There is an answer, though, you could just edit `globals()` dictionary. And your usage of f-strings seems odd, as it should contain the whole string, e.g. `f'new_column_name_cluster_{structure}'`. – evtn Apr 18 '22 at 20:04
  • But if I were you I'd use a dictionary for that, it suits better than editing globals. – evtn Apr 18 '22 at 20:05
  • Thanks. I cannot understand the other post. I am new to Python. By applying the f-string to the whole string, following your suggestion, I get the exception: SyntaxError: cannot assign to f-string expression. – user249018 Apr 18 '22 at 20:12
  • yes, that's an invalid syntax, you should (actually shouldn't, but still) edit globals(): `globals()[f'new_column_name_cluster_{structure}'] = # your value here` – evtn Apr 18 '22 at 20:14
  • I suggest you create a dictionary for your values and use it instead of creating variables. – evtn Apr 18 '22 at 20:15
  • As the linked duplicate suggests, you really shouldn't be using variable names as data. If you want to use strings to distinguish between different data items, use a dictionary with the strings as keys. – Blckknght Apr 18 '22 at 20:15
  • Thanks for your effort but I do not see how to handle this. – user249018 Apr 18 '22 at 20:29
  • Beside the point, but `M1C-S1C` is not a valid variable name. – wjandrea Apr 18 '22 at 20:41
  • @user249018 We're talking about something like `dfs = {}; for structure in structure_acronym: new_column_name_cluster = ...; dfs[structure] = larger_dataframe[new_column_name_cluster]`. I'm assuming your `f'{structure}_df' = ...` line is supposed to be indented and you don't need to keep `new_column_name_cluster`. If it's still not clear, please [edit] the question to clarify why, like "[this solution] was suggested to me but I don't get it because [x]". – wjandrea Apr 18 '22 at 20:47
  • This is not an issue. One can in place of that variable name type in `M1C_S1C`. – user249018 Apr 18 '22 at 20:48
  • The last line of the code above generates a data frame based on the larger data frame and the subset of columns containing a single structure acronym. I also edited the post a little. – user249018 Apr 18 '22 at 20:57
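
The dictionary approach suggested in the comments could look roughly like the sketch below. It is only an illustration under stated assumptions: the small `larger_dataframe`, its column names (`Ocx_a`, `AMY_a`, and so on) and its values are hypothetical stand-ins for the real data, and `cluster_columns`, `dfs` and `id_columns` are names invented here. The column selection itself is the same list comprehension used in the question.

```python
import pandas as pd

# Hypothetical stand-in for the real larger_dataframe (invented columns and values).
larger_dataframe = pd.DataFrame({
    "ensembl_gene_id": ["gene_1", "gene_2"],
    "gene_symbol": ["A", "B"],
    "Ocx_a": [1.0, 2.0],
    "Ocx_b": [3.0, 4.0],
    "AMY_a": [5.0, 6.0],
    "M1C-S1C_a": [7.0, 8.0],
})
structure_acronym = ["Ocx", "M1C-S1C", "AMY"]  # shortened list for the example

id_columns = ["ensembl_gene_id", "gene_symbol"]
names = larger_dataframe.columns

# Dictionaries keyed by acronym replace the dynamically named variables.
cluster_columns = {}  # plays the role of new_column_name_cluster_{structure}
dfs = {}              # plays the role of {structure}_df

for structure in structure_acronym:
    # Same selection as in the question: ID columns plus every column
    # whose name contains the acronym.
    cluster_columns[structure] = id_columns + [name for name in names if structure in name]
    dfs[structure] = larger_dataframe[cluster_columns[structure]]

print(dfs["Ocx"].columns.tolist())
# ['ensembl_gene_id', 'gene_symbol', 'Ocx_a', 'Ocx_b']
```

Because dictionary keys can be arbitrary strings, `'M1C-S1C'` works as a key without being renamed to `M1C_S1C`. Each data frame can then be passed to the clustering pipeline with a loop such as `for structure, df in dfs.items(): ...`.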

0 Answers