I have a small df that consists of two columns, a description and a value:
+--------------+--------------------+
|   description|               value|
+--------------+--------------------+
|   PED_tobacco|                 0.4|
|PED_nontobacco|                1.49|
|           GMI|    17590.8855333196|
|       CMO_NGP|             53389.0|
|             A|                80.3|
|         SC_TT|              -0.146|
|        SC_THP|              -0.056|
|       SC_ENDS|              -0.007|
|      SC_CF_PD|              -0.002|
|      SC_CF_FF|              -0.031|
|      CO2_comb|             1.23E-6|
|   CO2_lighter|2.083000000000000...|
|   Carbon_Cost|               114.0|
|     PR_SDG12A|               -0.05|
|     PR_SDG12B|               -0.01|
|       PR_SDG3|                 0.0|
|      PR_SDG14|               -0.27|
|EDEVICE_SDG12A|               -0.01|
|EDEVICE_SDG12B|               -0.05|
|  EDEVICE_SDG3|               -0.01|
+--------------+--------------------+
I have been trying to find a way to convert each row into an independently defined variable, so that I can reference it directly. For example, I want to be able to say PED_tobacco * 10 and get back 4.
I tried converting it into a list of dictionaries (at least that's how I can explain it with my Python background), using:
ass_dict = df_assumptions \
    .rdd \
    .map(lambda row: {row[0]: row[1]}) \
    .collect()
# Which prints:
[{'PED_tobacco': 0.4}, {'PED_nontobacco': 1.49}, {'GMI': 17590.8855333196}, {'CMO_NGP': 53389.0}, {'A': 80.3}, {'SC_TT': -0.146}, {'SC_THP': -0.056}, {'SC_ENDS': -0.007}, {'SC_CF_PD': -0.002}, {'SC_CF_FF': -0.031}, {'CO2_comb': 1.23e-06}, {'CO2_lighter': 2.0830000000000002e-08}, {'Carbon_Cost': 114.0}, {'PR_SDG12A': -0.05}, {'PR_SDG12B': -0.01}, {'PR_SDG3': 0.0}, {'PR_SDG14': -0.27}, {'EDEVICE_SDG12A': -0.01}, {'EDEVICE_SDG12B': -0.05}, {'EDEVICE_SDG3': -0.01}, {'EDEVICE_SDG14': 0.0}, {'TL_GL': 1.0}, {'TL_GR': 0.0}, {'EW_GL': 0.83}]
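To make the problem with that result concrete, here is a minimal plain-Python sketch using a shortened copy of the output above. Each element is its own one-key dictionary, so a name cannot be looked up on the list directly. The names `lookup_failed` and `merged` are only for illustration and are not part of my actual code.

```python
# Shortened copy of the collected output above: a list of one-key dictionaries.
ass_dict = [{'PED_tobacco': 0.4}, {'PED_nontobacco': 1.49}, {'Carbon_Cost': 114.0}]

# A list is indexed by position, so looking up a description by name fails.
try:
    ass_dict['PED_tobacco']
    lookup_failed = False
except TypeError:
    lookup_failed = True

# Merging the one-key dictionaries gives a single mapping keyed by description.
merged = {}
for single in ass_dict:
    merged.update(single)

print(lookup_failed)
print(merged['PED_tobacco'] * 10)
```

With a single merged dictionary the values are at least reachable by name, although still as `merged['PED_tobacco']` rather than as a bare variable.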
But I still can't access each variable independently. In Python (with pandas) I do this using:
def convert_to_var(df):
    desc = []
    val = []
    for i, row in df.iterrows():
        desc.append(i)
        val.append(row)
    return dict(val)

val_dict = convert_to_var(IA)
globals().update(val_dict)
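Stripped of pandas, what that snippet boils down to can be sketched as follows: build one dictionary from (description, value) pairs and push it into the module namespace. The helper `rows_to_dict` and the `rows` list are hypothetical stand-ins with values copied from the table above. I assume the rows returned by `df_assumptions.collect()` in PySpark are tuple-like and could be passed in the same way, but I have not confirmed that.

```python
def rows_to_dict(rows):
    """Build {description: value} from an iterable of two-item rows."""
    return {description: value for description, value in rows}


# Hypothetical stand-in for the collected rows (values copied from the table).
# Assumption: in PySpark this could be the result of df_assumptions.collect().
rows = [("PED_tobacco", 0.4), ("PED_nontobacco", 1.49), ("Carbon_Cost", 114.0)]

val_dict = rows_to_dict(rows)

# Inject every description into the module namespace as a variable.
globals().update(val_dict)

print(Carbon_Cost * 10)
```

This only works when every description is a valid Python identifier, and it silently overwrites any existing global with the same name.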
Is there a way to do the same in Spark? How can I get each description with its value as a separate variable that I can call directly? Thanks in advance.