Main objective: Find cities that sell toys the most different from one another (top 10 differentials). For example Los Angeles sells the most Toys 3 and 4 and the city most opposite of that would be Salt Lake City, which sells Toy 9 and 15 the most and Toys 3 and 4 the least.
I have a CSV that I have put in a dataframe.
It has hundreds of rows currently and each row has 15 columns... Example:
City | Toy1 | Toy2 | Toy3 | ToyN |
---|---|---|---|---|
Los Angeles | 15 | 20 | 1 | 44 |
Miami | 33 | 2 | 545 | 15 |
Dallas | 111 | 222 | 545 | 448 |
City N | 15 | 555 | 44 | 987 |
So I need Los Angeles to compare Toy1 to all other cities, Toy2, through ToyN. And then so on for each city against the rest of the rows in the dataframe.
I am having trouble structuring this as I need a calculation difference on every column and doing a comparison between each city.
Expected Output: A new column with a difference score for City vs City. Example: |City|Toy1|Toy2|Toy3|ToyN|DiffMiami|Diff Dallas| |----|----|----|----|----|----|----| |Los Angeles|15|20|1|44|-17|15|
I have been trying to use DataFrame.diff() but not sure how to structure to use it in this scenario. Any suggestions would be gladly taken. Thanks.