0

There are two CSV files having same following schema, generated a month apart FILE20221105.csv and FILE20221205.csv.

Key:Int
Val1:String
Val2:String
Val3:String
Val4:String

Some of the Key will have Val column with string “N/A”. I want to find percentage of “N/A” in each column. I can do that the following way after reading CSV file in the dataframe say df

((df.drop([‘Key’], axis=1))==”N/A”).mean()

I will get say something like this (Example)

Val1:0.33
Val2:0.25
Val3:0.1
Val4:0.2

But real ask has one more condition that I am trying to do without a loop The requirement is to find percentage of “N/A” in each Val column in current month but not having “N/A” last month. Both CSV are joined based on column Key (Inner join). Column Key has unique value.

**Example**

FILE20221105.csv (All row with N/A in this month will not be considered)

Key    Val1    Val2    Val3    Val4
101    A1      B1      C1      N/A
102    N/A     B2      C2      D2
103    A3      N/A     C3      D3
104    A4      B4      N/A     D4
105    A5      B1      C1      N/A
106    A6      B2      C2      D2
107    A7      N/A     C3      D3
108    A8      B4      C4      D4


FILE20221205.csv (All row which previously is Non N/A but N/A this month will be considered)

Key    Val1    Val2    Val3    Val4
101    A1      B1      C1      D1
102    A2      B2      C2      D2
103    A3      B3      C3      D3
104    A4      N/A     C4      D4
105    A5      N/A     C1      D1
106    A6      N/A     C2      N/A
107    N/A     B3      C3      N/A
108    N/A     B4      N/A     D4

Output

Val1:5/7 (7 in denominator as 1 value in previous month row was N/A, in current month 2 out of remaining 7 are N/A therefore 5 in numerator)
Val2:3/6 (6 in denominator as 2 value in previous month row was N/A, in current month 3 out of remaining 6 are N/A therefore 3 in numerator)
Val2:6/7 (7 in denominator as 1 value in previous month row was N/A, in current month 1 out of remaining 7 are N/A therefore 5 in numerator)
Val2:4/6 (6 in denominator as 2 value in previous month row was N/A, in current month 2 out of remaining 6 are N/A therefore 4 in numerator)
  • please give a dummy example for the two files, and the matching expected output, the question in its current state is too ambiguous – mozway Jan 05 '23 at 12:19
  • Your question needs a minimal reproducible example consisting of sample input, expected output, actual output, and only the relevant code necessary to reproduce the problem. See [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) for best practices related to Pandas questions. – itprorh66 Jan 05 '23 at 14:17

0 Answers0