I am looking at some sample data such as this:
Data:
ID Name ParValue Coupon Maturity Issuer Moodys S&P_Fitch Grade Risk
37833100 Apple_Inc. 1049 95 2030 Apple_Inc. Aaa AAA Investment Highest_Quality
02079K107 Alphabet_Inc. 1055 99 2030 Alphabet_Inc. Aa AA Investment High_Quality
11659109 Alaska_Air_Group 996 98 2030 Alaska_Air_Group A A Investment Strong
931142103 Walmart_Stores,_Inc. 1195 99 2030 Walmart_Stores,_Inc. Baa BBB Investment Medium_Grade
495734523 Corp._Takeover 1108 97 2021 Corp._Takeover Ba,_B BB,_B Junk Speculative
193467211 Toys_R_Us 1109 105 2021 Toys_R_Us Caa/Ca/C CCC/CC/C Junk Highly_Speculative
576300972 Enron 1062 102 2021 Enron C D Junk In_Default
983457823 Economic_Consultants_Inc. Economic_Consultants_Inc. Baa BBB Investment Medium_Grade
894652378 Forecast_Backtesters_Corp. Forecast_Backtesters_Corp. Aaa AAA Investment Highest_Quality
Image:
For example, Walmart_Stores,_Inc. has Baa, BBB, Investment, and Medium_Grade (for Moodys, S&P_Fitch, Grade, and Risk). Economic_Consultants_Inc. has the same attributes, so I want to infer that it also has 1195, 99, and 2030 (for ParValue, Coupon, and Maturity), even though those values are missing.
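To make the matching rule concrete, here is a minimal sketch in plain Python. It assumes a missing value is represented as `None`, uses only a few of the rows above, and copies values from the first complete row whose four rating attributes match exactly. The names `fill_from_matches`, `KEY`, and `TARGETS` are made up for illustration.

```python
COLUMNS = ["Name", "ParValue", "Coupon", "Maturity",
           "Moodys", "SP_Fitch", "Grade", "Risk"]
KEY = ("Moodys", "SP_Fitch", "Grade", "Risk")   # attributes used for matching
TARGETS = ("ParValue", "Coupon", "Maturity")    # attributes to fill in

# A subset of the sample rows; None marks a missing value (an assumption).
raw = [
    ("Apple_Inc.", 1049, 95, 2030, "Aaa", "AAA", "Investment", "Highest_Quality"),
    ("Walmart_Stores,_Inc.", 1195, 99, 2030, "Baa", "BBB", "Investment", "Medium_Grade"),
    ("Enron", 1062, 102, 2021, "C", "D", "Junk", "In_Default"),
    ("Economic_Consultants_Inc.", None, None, None, "Baa", "BBB", "Investment", "Medium_Grade"),
    ("Forecast_Backtesters_Corp.", None, None, None, "Aaa", "AAA", "Investment", "Highest_Quality"),
]
rows = [dict(zip(COLUMNS, r)) for r in raw]


def fill_from_matches(rows):
    """Return copies of rows with missing targets copied from a matching complete row."""
    # Index the first complete row seen for each combination of key attributes.
    donors = {}
    for row in rows:
        if all(row[t] is not None for t in TARGETS):
            donors.setdefault(tuple(row[k] for k in KEY), row)

    filled = []
    for row in rows:
        new_row = dict(row)
        donor = donors.get(tuple(row[k] for k in KEY))
        if donor is not None:
            for t in TARGETS:
                if new_row[t] is None:
                    new_row[t] = donor[t]
        filled.append(new_row)
    return filled


filled = fill_from_matches(rows)
for row in filled:
    print(row["Name"], row["ParValue"], row["Coupon"], row["Maturity"])
```

Exact matching only works when a complete row with identical ratings exists; a row with no such match keeps its `None` values, which is where a nearest-neighbour approach would come in.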
This looks like a KNN problem, but I think K-Means could be useful too. I'm trying to figure out how to fill in missing data points (ParValue, Coupon, and Maturity), like the blank entries in the last two rows (colored pink in the image above), based on similar attributes. Then I want to group similar items together (a K-Means problem). Has anyone come across a good online example of how to do this? The examples I found online today all use randomly generated numbers, but my data sets will NOT contain randomly generated numbers. I would appreciate any insight into how to solve this problem.
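For reference, here is a rough sketch of the two-step idea (KNN to fill in the gaps, then K-Means to group) using pandas and scikit-learn on the sample rows. Several things are assumptions rather than anything the data dictates: the Risk label is turned into a number through a hand-written ordering (`RISK_ORDER`), a single nearest neighbour is used, and the number of clusters is set to 2 arbitrarily.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.impute import KNNImputer
from sklearn.preprocessing import StandardScaler

# Assumed ordinal encoding of the Risk column (0 = best credit quality).
RISK_ORDER = ["Highest_Quality", "High_Quality", "Strong", "Medium_Grade",
              "Speculative", "Highly_Speculative", "In_Default"]

df = pd.DataFrame({
    "Name": ["Apple_Inc.", "Alphabet_Inc.", "Alaska_Air_Group",
             "Walmart_Stores,_Inc.", "Corp._Takeover", "Toys_R_Us", "Enron",
             "Economic_Consultants_Inc.", "Forecast_Backtesters_Corp."],
    "ParValue": [1049, 1055, 996, 1195, 1108, 1109, 1062, np.nan, np.nan],
    "Coupon": [95, 99, 98, 99, 97, 105, 102, np.nan, np.nan],
    "Maturity": [2030, 2030, 2030, 2030, 2021, 2021, 2021, np.nan, np.nan],
    "Risk": ["Highest_Quality", "High_Quality", "Strong", "Medium_Grade",
             "Speculative", "Highly_Speculative", "In_Default",
             "Medium_Grade", "Highest_Quality"],
})
df["RiskRank"] = df["Risk"].map({r: i for i, r in enumerate(RISK_ORDER)})

# Step 1: KNN imputation. Distances are computed only on features that both
# rows have, so the incomplete rows are matched on RiskRank alone.
features = df[["RiskRank", "ParValue", "Coupon", "Maturity"]]
imputer = KNNImputer(n_neighbors=1)
filled = pd.DataFrame(imputer.fit_transform(features),
                      columns=features.columns, index=df.index)

# Step 2: K-Means on the completed, standardized features.
scaled = StandardScaler().fit_transform(filled)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
df["Cluster"] = kmeans.fit_predict(scaled)

print(pd.concat([df[["Name", "Cluster"]],
                 filled[["ParValue", "Coupon", "Maturity"]]], axis=1))
```

With `n_neighbors=1` each incomplete row simply takes the values of the complete row closest in `RiskRank`; a larger `n_neighbors` would average several neighbours instead. Which rows end up together in step 2 depends on the scaling and the chosen number of clusters, so the grouping should be checked rather than taken for granted.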