Group similar data and assign number to each group Python pandas

Question

I have a dataset with 25 columns and 1000+ rows. It contains dummy information about interns. We want to form squads of these interns, say 10 members each, grouping them by similarity and assigning a squad number to each intern. The factors are the columns in the dataset, such as Timezone, the language they speak, which team they want to work in, etc.

These are the columns:

["Name","Squad_Num","Prefered_Lang","Interested_Grp","Age","City","Country","Region","Timezone",
 "Occupation","Degree","Prev_Took_Courses","Intern_Experience","Product_Management","Digital_Marketing",
 "Market_Research","Digital_Illustration","Product_Design","Prodcut_Developement","Growth_Marketing",
 "Leading_Groups","Internship_News","Cohort_Product_Marketing","Cohort_Product_Design",
 "Cohort_Product_Development","Cohort_Product_Growth","Hours_Per_Week"]


The attached link is an image of the data table showing a few rows — rafay-mahmood, Sep 23 '22 at 09:38
you must provide a fully [reproducible example](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and the matching expected output — mozway, Sep 23 '22 at 09:41
Please [do not post images of code, data, error messages, etc.](https://stackoverflow.com/help/how-to-ask), add the information as text (within code fences etc.) instead. — Timus, Sep 23 '22 at 11:00
Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. — Community, Sep 23 '22 at 11:25
Suppose there are 100 rows in a table of intern data: Intern A | Speaks English | Region = UK | Interests = Product Development | etc.; Intern B | Speaks English | Region = USA | Interests = Product Development | etc.; Intern C | Speaks German | Region = USA | Interests = Product Marketing | etc.; and so on for Interns D, E, F, up to 1000+ rows. I want to make squads/groups out of this data. I want to identify which interns should be put in the same group, so that they can communicate comfortably and share the interests that suit them best — rafay-mahmood, Sep 23 '22 at 17:25

score 0 · Answer 1 · answered Sep 24 '22 at 23:44

Here are a bunch of clustering algos for you to play around with.

https://github.com/ASH-WICUS/Notebooks/blob/master/Clustering%20Algorithms%20Compared.ipynb

Since this is unsupervised learning, you have to experiment with different algorithms and see which one groups the data to your liking. There are no ground-truth labels, so supervised metrics such as accuracy, precision, or R^2 are not available to tell you how well the model is performing.
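
As a starting point, here is a minimal sketch of one way to turn cluster labels into fixed-size squads. It is not taken from the notebook above. It assumes scikit-learn is installed, uses made-up toy data, and borrows only three of the question's column names (`Prefered_Lang`, `Region`, `Interested_Grp`). The helper name `assign_squads` is hypothetical. KMeans does not produce equal-sized clusters, so the sketch sorts the rows by cluster label and cuts that ordering into chunks of 10; this is a rough heuristic, not a proper size-constrained clustering.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans


def assign_squads(df, feature_cols, squad_size=10, random_state=0):
    """Return a copy of df with a 1-based Squad_Num column (hypothetical helper)."""
    # One-hot encode the categorical columns so KMeans can compute distances.
    features = pd.get_dummies(df[feature_cols]).astype(float)

    # Aim for roughly one cluster per squad.
    n_clusters = max(1, len(df) // squad_size)
    labels = KMeans(
        n_clusters=n_clusters, n_init=10, random_state=random_state
    ).fit_predict(features)

    # Clusters are not equal-sized, so order rows by cluster label and
    # cut the ordering into consecutive chunks of squad_size.
    order = np.argsort(labels, kind="stable")
    squad_num = np.empty(len(df), dtype=int)
    squad_num[order] = np.arange(len(df)) // squad_size + 1

    out = df.copy()
    out["Squad_Num"] = squad_num
    return out


# Toy data (made up for illustration; not the asker's dataset).
rng = np.random.default_rng(0)
n_interns = 40
demo = pd.DataFrame(
    {
        "Name": [f"Intern {i}" for i in range(n_interns)],
        "Prefered_Lang": rng.choice(["English", "German", "Urdu"], size=n_interns),
        "Region": rng.choice(["UK", "USA", "Asia"], size=n_interns),
        "Interested_Grp": rng.choice(
            ["Product Development", "Product Marketing", "Product Design"],
            size=n_interns,
        ),
    }
)

squads = assign_squads(
    demo, ["Prefered_Lang", "Region", "Interested_Grp"], squad_size=10
)
print(squads.sort_values("Squad_Num").head(12))
```

Interns from the same cluster end up next to each other in the ordering, so most squads are drawn from one or two clusters, but a squad can straddle a cluster boundary. Numeric columns such as `Age` or `Hours_Per_Week` would likely need scaling before being mixed with one-hot columns, and whether the resulting squads make sense still has to be judged by inspection.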
