This is what my transaction matrix (a DataFrame) looks like, shown as a dict for the first two rows:

{'Avg. Winter temp 0-10C': {0: 1.0, 1: 1.0},
 'Avg. Winter temp < 0C': {0: 0.0, 1: 0.0},
 'Avg. Winter temp > 10C': {0: 0.0, 1: 0.0},
 'Avg. summer temp 11-20C': {0: 0.0, 1: 1.0},
 'Avg. summer temp 20-25C': {0: 1.0, 1: 0.0},
 'Avg. summer temp > 25C': {0: 0.0, 1: 0.0},
 'GENDER_DESC:F': {0: 0.0, 1: 1.0},
 'GENDER_DESC:M': {0: 1.0, 1: 0.0},
 'MODEL_TYPE:FED EMP': {0: 0.0, 1: 0.0},
 'MODEL_TYPE:HCPROV': {0: 0.0, 1: 0.0},
 'MODEL_TYPE:IPA': {0: 0.0, 1: 0.0},
 'MODEL_TYPE:MED A': {0: 0.0, 1: 0.0},
 'MODEL_TYPE:MED ADVG': {0: 0.0, 1: 0.0},
 'MODEL_TYPE:MED B': {0: 1.0, 1: 0.0},
 'MODEL_TYPE:MED SNPG': {0: 0.0, 1: 0.0},
 'MODEL_TYPE:MED UNSP': {0: 0.0, 1: 1.0},
 'MODEL_TYPE:MEDICAID': {0: 0.0, 1: 0.0},
 'MODEL_TYPE:MEDICARE': {0: 0.0, 1: 0.0},
 'MODEL_TYPE:PPO': {0: 0.0, 1: 0.0},
 'MODEL_TYPE:TPA': {0: 0.0, 1: 0.0},
 'MODEL_TYPE:UNSPEC': {0: 0.0, 1: 0.0},
 'MODEL_TYPE:WORK COMP': {0: 0.0, 1: 0.0},
 'Multiple_Cancer_Flag:No': {0: 1.0, 1: 1.0},
 'Multiple_Cancer_Flag:Yes': {0: 0.0, 1: 0.0},
 'PATIENT_AGE_GROUP 30-65': {0: 0.0, 1: 0.0},
 'PATIENT_AGE_GROUP 65-69': {0: 0.0, 1: 0.0},
 'PATIENT_AGE_GROUP 69-71': {0: 1.0, 1: 0.0},
 'PATIENT_AGE_GROUP 71-77': {0: 0.0, 1: 0.0},
 'PATIENT_AGE_GROUP 77-85': {0: 0.0, 1: 1.0},
 'PATIENT_LOCATION:ARIZONA': {0: 0.0, 1: 0.0},
 'PATIENT_LOCATION:CALIFORNIA': {0: 0.0, 1: 0.0},
 'PATIENT_LOCATION:CONNECTICUT': {0: 0.0, 1: 0.0},
 'PATIENT_LOCATION:DELAWARE': {0: 0.0, 1: 0.0},
 'PATIENT_LOCATION:FLORIDA': {0: 0.0, 1: 0.0},
 'PATIENT_LOCATION:GEORGIA': {0: 0.0, 1: 0.0},
 'PATIENT_LOCATION:IOWA': {0: 0.0, 1: 0.0},
 'PATIENT_LOCATION:KANSAS': {0: 0.0, 1: 0.0},
 'PATIENT_LOCATION:KENTUCKY': {0: 0.0, 1: 0.0},
 'PATIENT_LOCATION:LOUISIANA': {0: 0.0, 1: 0.0},
 'PATIENT_LOCATION:MARYLAND': {0: 0.0, 1: 0.0},
 'PATIENT_LOCATION:MASSACHUSETTS': {0: 0.0, 1: 0.0},
 'PATIENT_LOCATION:MICHIGAN': {0: 0.0, 1: 0.0},
 'PATIENT_LOCATION:MINNESOTA': {0: 0.0, 1: 0.0},
 'PATIENT_LOCATION:MISSISSIPPI': {0: 0.0, 1: 0.0},
 'PATIENT_LOCATION:MISSOURI': {0: 0.0, 1: 0.0},
 'PATIENT_LOCATION:NEBRASKA': {0: 0.0, 1: 0.0},
 'PATIENT_LOCATION:NEW JERSEY': {0: 0.0, 1: 0.0},
 'PATIENT_LOCATION:NEW MEXICO': {0: 1.0, 1: 0.0},
 'PATIENT_LOCATION:NEW YORK': {0: 0.0, 1: 0.0},
 'PATIENT_LOCATION:OKLAHOMA': {0: 0.0, 1: 0.0},
 'PATIENT_LOCATION:OREGON': {0: 0.0, 1: 0.0},
 'PATIENT_LOCATION:PENNSYLVANIA': {0: 0.0, 1: 0.0},
 'PATIENT_LOCATION:SOUTH CAROLINA': {0: 0.0, 1: 0.0},
 'PATIENT_LOCATION:TENNESSEE': {0: 0.0, 1: 0.0},
 'PATIENT_LOCATION:TEXAS': {0: 0.0, 1: 0.0},
 'PATIENT_LOCATION:VIRGINIA': {0: 0.0, 1: 0.0},
 'PATIENT_LOCATION:WASHINGTON': {0: 0.0, 1: 1.0},
 'PAYER_TYPE:Commercial': {0: 0.0, 1: 0.0},
 'PAYER_TYPE:Managed Medicaid': {0: 0.0, 1: 0.0},
 'PAYER_TYPE:Medicare': {0: 1.0, 1: 0.0},
 'PAYER_TYPE:Medicare D': {0: 0.0, 1: 1.0},
 'PLAN_NAME:ALL OTHER THIRD PARTY': {0: 0.0, 1: 0.0},
 'PLAN_NAME:BCBS FL UNSPECIFIED': {0: 0.0, 1: 0.0},
 'PLAN_NAME:BCBS MI MEDICARE D GENERAL (MI)': {0: 0.0, 1: 0.0},
 'PLAN_NAME:BCBS TEXAS GENERAL (TX)': {0: 0.0, 1: 0.0},
 'PLAN_NAME:BLUE CARE (MS)': {0: 0.0, 1: 0.0},
 'PLAN_NAME:BLUE PREFERRED PPO (AZ)': {0: 0.0, 1: 0.0},
 'PLAN_NAME:CMMNWLTH CRE MED SNP GENERAL(MA)': {0: 0.0, 1: 0.0},
 'PLAN_NAME:DEPT OF VETERANS AFFAIRS': {0: 0.0, 1: 0.0},
 'PLAN_NAME:EMBLEMHEALTH/HIP/GHI UNSPEC': {0: 0.0, 1: 0.0},
 'PLAN_NAME:ESSENCE MED ADV GENERAL (MO)': {0: 0.0, 1: 0.0},
 'PLAN_NAME:HEALTH NET MED D GENERAL (OR)': {0: 0.0, 1: 0.0},
 'PLAN_NAME:HIGHMARK UNSPECIFIED': {0: 0.0, 1: 0.0},
 'PLAN_NAME:HUMANA MED D GENERAL(MN)': {0: 0.0, 1: 0.0},
 'PLAN_NAME:HUMANA-UNSPECIFIED': {0: 0.0, 1: 0.0},
 'PLAN_NAME:KEYSTONE FIRST (PA)': {0: 0.0, 1: 0.0},
 'PLAN_NAME:MEDICARE A': {0: 0.0, 1: 0.0},
 'PLAN_NAME:MEDICARE A KENTUCKY (KY)': {0: 0.0, 1: 0.0},
 'PLAN_NAME:MEDICARE A MINNESOTA (MN)': {0: 0.0, 1: 0.0},
 'PLAN_NAME:MEDICARE B': {0: 0.0, 1: 0.0},
 'PLAN_NAME:MEDICARE B ARIZONA (AZ)': {0: 0.0, 1: 0.0},
 'PLAN_NAME:MEDICARE B IOWA (IA)': {0: 0.0, 1: 0.0},
 'PLAN_NAME:MEDICARE B KANSAS (KS)': {0: 0.0, 1: 0.0},
 'PLAN_NAME:MEDICARE B NEW MEXICO (NM)': {0: 1.0, 1: 0.0},
 'PLAN_NAME:MEDICARE B PENNSYLVANIA (PA)': {0: 0.0, 1: 0.0},
 'PLAN_NAME:MEDICARE B TEXAS (TX)': {0: 0.0, 1: 0.0},
 'PLAN_NAME:MEDICARE B VIRGINIA (VA)': {0: 0.0, 1: 0.0},
 'PLAN_NAME:MEDICARE UNSP': {0: 0.0, 1: 0.0},
 'PLAN_NAME:MOLINA HEALTHCARE (FL)': {0: 0.0, 1: 0.0},
 'PLAN_NAME:OPTUMHEALTH PHYSICAL HEALTH': {0: 0.0, 1: 0.0},
 'PLAN_NAME:PACIFICSOURCE HP MED ADV GNRL': {0: 0.0, 1: 0.0},
 'PLAN_NAME:PAI PLANNED ADMIN INC (SC)': {0: 0.0, 1: 0.0},
 'PLAN_NAME:PEOPLES HLTH NETWORK': {0: 0.0, 1: 0.0},
 'PLAN_NAME:THE COVENTRY CORP UNSPECIFIED': {0: 0.0, 1: 0.0},
 'PLAN_NAME:UHC/PAC/AARP MED D GENERAL (FL)': {0: 0.0, 1: 0.0},
 'PLAN_NAME:UHC/PAC/AARP MED D GENERAL (MD)': {0: 0.0, 1: 0.0},
 'PLAN_NAME:UHC/PAC/AARP MED D GENERAL (NY)': {0: 0.0, 1: 0.0},
 'PLAN_NAME:UHC/PAC/AARP MED D GENERAL (TX)': {0: 0.0, 1: 0.0},
 'PLAN_NAME:UHC/PAC/AARP MED D GENERAL (WA)': {0: 0.0, 1: 1.0},
 'PLAN_NAME:UNITED HLTHCARE-(CT) CT PPO': {0: 0.0, 1: 0.0},
 'PLAN_NAME:UNITED HLTHCARE-(NE) MIDLANDS': {0: 0.0, 1: 0.0},
 'PLAN_NAME:UNITED HLTHCARE-UNSPECIFIED': {0: 0.0, 1: 0.0},
 'PLAN_NAME:UNITED MEDICAL RESOURCES/UMR': {0: 0.0, 1: 0.0},
 'PLAN_NAME:WORKERS COMP - EMPLOYER': {0: 0.0, 1: 0.0},
 'PRI_SPECIALTY_DESC:DERMATOLOGY': {0: 0.0, 1: 0.0},
 'PRI_SPECIALTY_DESC:HEMATOLOGY/ONCOLOGY': {0: 1.0, 1: 0.0},
 'PRI_SPECIALTY_DESC:INTERNAL MEDICINE': {0: 0.0, 1: 0.0},
 'PRI_SPECIALTY_DESC:MEDICAL ONCOLOGY': {0: 0.0, 1: 1.0},
 'PRI_SPECIALTY_DESC:NURSE PRACTITIONER': {0: 0.0, 1: 0.0},
 'PRI_SPECIALTY_DESC:OBSTETRICS & GYNECOLOGY': {0: 0.0, 1: 0.0},
 'PROVIDER_LOCATION:ARIZONA': {0: 0.0, 1: 0.0},
 'PROVIDER_LOCATION:CALIFORNIA': {0: 0.0, 1: 0.0},
 'PROVIDER_LOCATION:CONNECTICUT': {0: 0.0, 1: 0.0},
 'PROVIDER_LOCATION:DELAWARE': {0: 0.0, 1: 0.0},
 'PROVIDER_LOCATION:FLORIDA': {0: 0.0, 1: 0.0},
 'PROVIDER_LOCATION:IOWA': {0: 0.0, 1: 0.0},
 'PROVIDER_LOCATION:KANSAS': {0: 0.0, 1: 0.0},
 'PROVIDER_LOCATION:KENTUCKY': {0: 0.0, 1: 0.0},
 'PROVIDER_LOCATION:LOUISIANA': {0: 0.0, 1: 0.0},
 'PROVIDER_LOCATION:MASSACHUSETTS': {0: 0.0, 1: 0.0},
 'PROVIDER_LOCATION:MICHIGAN': {0: 0.0, 1: 0.0},
 'PROVIDER_LOCATION:MINNESOTA': {0: 0.0, 1: 0.0},
 'PROVIDER_LOCATION:MISSISSIPPI': {0: 0.0, 1: 0.0},
 'PROVIDER_LOCATION:MISSOURI': {0: 0.0, 1: 0.0},
 'PROVIDER_LOCATION:NEBRASKA': {0: 0.0, 1: 0.0},
 'PROVIDER_LOCATION:NEW MEXICO': {0: 1.0, 1: 0.0},
 'PROVIDER_LOCATION:NEW YORK': {0: 0.0, 1: 0.0},
 'PROVIDER_LOCATION:OREGON': {0: 0.0, 1: 0.0},
 'PROVIDER_LOCATION:PENNSYLVANIA': {0: 0.0, 1: 0.0},
 'PROVIDER_LOCATION:SOUTH CAROLINA': {0: 0.0, 1: 0.0},
 'PROVIDER_LOCATION:TENNESSEE': {0: 0.0, 1: 0.0},
 'PROVIDER_LOCATION:TEXAS': {0: 0.0, 1: 0.0},
 'PROVIDER_LOCATION:VIRGINIA': {0: 0.0, 1: 0.0},
 'PROVIDER_LOCATION:WASHINGTON': {0: 0.0, 1: 1.0},
 'PROVIDER_TYP_DESC:PROFESSIONAL': {0: 1.0, 1: 1.0},
 'Region:MIDWEST': {0: 0.0, 1: 0.0},
 'Region:NORTHEAST': {0: 0.0, 1: 0.0},
 'Region:SOUTH': {0: 0.0, 1: 0.0},
 'Region:WEST': {0: 1.0, 1: 1.0},
 'Vials Consumption == 1': {0: 0.0, 1: 0.0},
 'Vials_Consumption_GROUP 1-2': {0: 0.0, 1: 0.0},
 'Vials_Consumption_GROUP 12-91': {0: 0.0, 1: 0.0},
 'Vials_Consumption_GROUP 2-3': {0: 0.0, 1: 0.0},
 'Vials_Consumption_GROUP 3-6': {0: 0.0, 1: 1.0},
 'Vials_Consumption_GROUP 6-12': {0: 1.0, 1: 0.0},
 'keytruda_flag:No': {0: 1.0, 1: 1.0},
 'keytruda_flag:Yes': {0: 0.0, 1: 0.0},
 'libtayo_flag:No': {0: 0.0, 1: 0.0},
 'libtayo_flag:Yes': {0: 1.0, 1: 1.0},
 'optivo_flag:No': {0: 1.0, 1: 1.0},
 'optivo_flag:Yes': {0: 0.0, 1: 0.0}}

From this transaction matrix I mine frequent itemsets like this:

from mlxtend.frequent_patterns import apriori

# Keep itemsets that appear in at least 20% of the transactions.
frequent_itemsets = apriori(train_bucket, min_support=0.2, use_colnames=True)
print(frequent_itemsets)

Then I create the rules like this:

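from mlxtend.frequent_patterns import association_rules

# Rules filtered by confidence; computed but not used below.
rules_by_confidence = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.7)
# Rules filtered by lift; these are the ones I want to visualize.
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1.2)
print(len(rules))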
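One way to make a rule set of this size plottable might be to keep only the shortest, strongest rules before drawing anything. The following is a minimal sketch, assuming `rules` has the `antecedents`, `consequents`, `confidence` and `lift` columns that `association_rules` returns. The small DataFrame, the helper `top_rules` and its thresholds are hypothetical and only there for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the `rules` DataFrame; the column names follow
# mlxtend's output, the values are made up for illustration.
rules = pd.DataFrame({
    "antecedents": [frozenset({"Region:WEST"}), frozenset({"GENDER_DESC:M"}),
                    frozenset({"libtayo_flag:Yes"}), frozenset({"keytruda_flag:No"})],
    "consequents": [frozenset({"libtayo_flag:Yes"}), frozenset({"PAYER_TYPE:Medicare"}),
                    frozenset({"optivo_flag:No"}), frozenset({"Region:WEST"})],
    "support": [0.40, 0.25, 0.60, 0.30],
    "confidence": [0.90, 0.75, 0.95, 0.72],
    "lift": [1.8, 1.3, 2.1, 1.25],
})

def top_rules(rules, n=50, min_confidence=0.7, max_len=2):
    """Keep short, confident rules and return the n with the highest lift."""
    # Total number of items on both sides of each rule.
    size = rules["antecedents"].apply(len) + rules["consequents"].apply(len)
    kept = rules[(rules["confidence"] >= min_confidence) & (size <= max_len)]
    return kept.sort_values("lift", ascending=False).head(n)

subset = top_rules(rules, n=2)
print(subset[["antecedents", "consequents", "lift"]])
```

Plotting only `subset` instead of all the rules should give a much smaller graph; the cut-offs would need tuning against the real data.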

This gives 10k rules, which I need to be able to visualize. I tried following this post: https://intelligentonlinetools.com/blog/2018/02/10/how-to-create-data-visualization-for-association-rules-in-data-mining/

I tried the NetworkX example and it gives this:

[Image: NetworkX example]

If I plot all the rules, the graph becomes too cluttered to read.

I thought of applying t-SNE, but it doesn't quite make sense to use it on the initial transaction matrix. I tried it this way:

import seaborn as sns
from matplotlib import pyplot as plt
from sklearn.manifold import TSNE

# Embed the one-hot transaction matrix into two dimensions.
X = train_bucket
X_embedded = TSNE(n_components=2).fit_transform(X)
print(X_embedded.shape)

sns.set(rc={'figure.figsize': (11.7, 8.27)})
# Recent seaborn versions require x and y as keywords; there is no hue
# variable, so no palette or legend is passed.
sns.scatterplot(x=X_embedded[:, 0], y=X_embedded[:, 1])
plt.show()

I have no idea how to interpret the resulting plot. What other options can I explore?

  • A quick thing to look into is changing the layout function you're using to build the `pos` dict to arrange the nodes in the plot. Instead of `nx.spring_layout` you could try something like the `nx.bipartite_layout` to align your rule nodes on one side and items on the other side. You may have to rearrange the nodes within the sides to minimize edge crossings. If you can identify which items are commonly linked by rules (clustering?) you can put them closer together to minimize crossings. – cookesd Dec 31 '20 at 14:36
  • @cookesd that's a good idea. Let me try that – user14668919 Dec 31 '20 at 15:48
  • @cookesd what libraries can I use to represent the rules in a decision-tree-like format? – user14668919 Dec 31 '20 at 15:54
  • I'm not too familiar with other visualization packages, but networkx lets you define your own position dictionary, so you could maybe use a breadth-first search to define the levels if your graph is acyclic. You may want to check out graphviz and some of the packages from the answers here https://stackoverflow.com/q/7991138/13716967 – cookesd Jan 02 '21 at 12:42
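
The bipartite layout suggested in the comments could look roughly like this. It is only a sketch: the two rules and the `R0`, `R1` node names are hypothetical, and in practice the pairs would come from the `antecedents` and `consequents` columns of the rules DataFrame.

```python
import networkx as nx

# Hypothetical rules as (antecedent items, consequent items) pairs.
rules = [
    ({"Region:WEST"}, {"libtayo_flag:Yes"}),
    ({"GENDER_DESC:M", "PAYER_TYPE:Medicare"}, {"optivo_flag:No"}),
]

G = nx.DiGraph()
rule_nodes = []
for i, (antecedent, consequent) in enumerate(rules):
    rule = f"R{i}"
    rule_nodes.append(rule)
    G.add_node(rule)
    for item in antecedent:
        G.add_edge(item, rule)   # item -> rule: item is in the antecedent
    for item in consequent:
        G.add_edge(rule, item)   # rule -> item: item is in the consequent

# Rule nodes are placed in one column, item nodes in the other.
pos = nx.bipartite_layout(G, rule_nodes)
```

Passing `pos` to `nx.draw(G, pos, with_labels=True)` should then draw the rules on one side and the items on the other. Reordering nodes within each column to reduce edge crossings is left out here.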
