There are two CSV files having same following schema, generated a month apart FILE20221105.csv and FILE20221205.csv.
Key:Int
Val1:String
Val2:String
Val3:String
Val4:String
Some of the Key will have Val column with string “N/A”. I want to find percentage of “N/A” in each column. I can do that the following way after reading CSV file in the dataframe say df
((df.drop([‘Key’], axis=1))==”N/A”).mean()
I will get say something like this (Example)
Val1:0.33
Val2:0.25
Val3:0.1
Val4:0.2
But real ask has one more condition that I am trying to do without a loop The requirement is to find percentage of “N/A” in each Val column in current month but not having “N/A” last month. Both CSV are joined based on column Key (Inner join). Column Key has unique value.
**Example**
FILE20221105.csv (All row with N/A in this month will not be considered)
Key Val1 Val2 Val3 Val4
101 A1 B1 C1 N/A
102 N/A B2 C2 D2
103 A3 N/A C3 D3
104 A4 B4 N/A D4
105 A5 B1 C1 N/A
106 A6 B2 C2 D2
107 A7 N/A C3 D3
108 A8 B4 C4 D4
FILE20221205.csv (All row which previously is Non N/A but N/A this month will be considered)
Key Val1 Val2 Val3 Val4
101 A1 B1 C1 D1
102 A2 B2 C2 D2
103 A3 B3 C3 D3
104 A4 N/A C4 D4
105 A5 N/A C1 D1
106 A6 N/A C2 N/A
107 N/A B3 C3 N/A
108 N/A B4 N/A D4
Output
Val1:5/7 (7 in denominator as 1 value in previous month row was N/A, in current month 2 out of remaining 7 are N/A therefore 5 in numerator)
Val2:3/6 (6 in denominator as 2 value in previous month row was N/A, in current month 3 out of remaining 6 are N/A therefore 3 in numerator)
Val2:6/7 (7 in denominator as 1 value in previous month row was N/A, in current month 1 out of remaining 7 are N/A therefore 5 in numerator)
Val2:4/6 (6 in denominator as 2 value in previous month row was N/A, in current month 2 out of remaining 6 are N/A therefore 4 in numerator)