I want to rewrite the R for loop below in PySpark.
for (i in unique(fix_map[!is.na(area)][order(area), area])) {
  # select all contact records from the currently processed area, and also those without an area assigned
  m_f_0 <- unique(con_melt[area == i | area == "Unknown"])
}
con_melt also contains rows whose area is "Unknown".
So I want to select the records common to fix_map and con_melt based on "area", AND additionally keep the con_melt records whose 'area' column is "Unknown".
I tried using a join in PySpark, but then I lose the "Unknown" rows.
Please suggest how to handle this.
fix_map:
id value area type
1: 227149 385911000059 510 mob
2: 122270 385911000661 110 fix
con_melt:
id area type
1: 227149 510 mob
2: 122270 100 fix
3: 122350 Unknown fix
Output should be:
value area type
1: 385911000059 510 mob
2: 122350 Unknown fix