I have a data science problem where I need to create a test set using the information provided in two CSV files.
Problem
data1.csv
cat,In1,In2
aaa, 0, 1
aaa, 2, 1
aaa, 2, 0
aab, 3, 2
aab, 1, 2
data2.csv
cat,index,attribute1,attribute2
aaa, 0, 150, 450
aaa, 1, 250, 670
aaa, 2, 30, 250
aab, 0, 60, 650
aab, 1, 50, 30
aab, 2, 20, 680
aab, 3, 380, 250
From these two files I need to produce an updated data1.csv file in which In1 and In2 are replaced by the attributes (attribute1 and attribute2) of the corresponding indices, looked up within the same category (cat).
Note: Each index within a category (cat) has its own attributes, so the same index can have different attributes in different categories.
The result should look like this:
updated_data1.csv
cat,In1a1,In1a2,In2a1,In2a2
aaa, 150, 450, 250, 670
aaa, 30, 250, 250, 670
aaa, 30, 250, 150, 450
aab, 380, 250, 20, 680
aab, 50, 30, 20, 680
I need an approach to tackle this problem using pandas in Python. So far I have loaded the CSV files into my Jupyter notebook, but I have no clue where to start.
Please note this is my first week using Python for data manipulation and I have very little knowledge of Python. Also, pardon the ugly formatting; I'm typing this question on a mobile phone.
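One possible starting point, as a minimal sketch rather than a definitive solution: the lookup described above can be treated as two left merges of data1 against data2 on the pair (cat, index), one for In1 and one for In2. The helper name attributes_for is hypothetical, the sample data is embedded as strings so the sketch runs on its own, and it assumes every (cat, index) pair in data1 also appears in data2.

```python
import io
import pandas as pd

# Inline copies of the two sample files so the sketch is self-contained.
# With real files, pass the path instead, e.g. pd.read_csv("data1.csv", ...).
DATA1 = """cat,In1,In2
aaa, 0, 1
aaa, 2, 1
aaa, 2, 0
aab, 3, 2
aab, 1, 2
"""
DATA2 = """cat,index,attribute1,attribute2
aaa, 0, 150, 450
aaa, 1, 250, 670
aaa, 2, 30, 250
aab, 0, 60, 650
aab, 1, 50, 30
aab, 2, 20, 680
aab, 3, 380, 250
"""

# skipinitialspace=True ignores the blank after each comma (a precaution).
data1 = pd.read_csv(io.StringIO(DATA1), skipinitialspace=True)
data2 = pd.read_csv(io.StringIO(DATA2), skipinitialspace=True)


def attributes_for(index_column):
    """Hypothetical helper: data2 with its columns renamed so it can be
    merged on one index column of data1 (for example "In1")."""
    return data2.rename(columns={
        "index": index_column,
        "attribute1": index_column + "a1",
        "attribute2": index_column + "a2",
    })


updated = (
    data1
    # Look up the attributes of In1 within the same category.
    .merge(attributes_for("In1"), on=["cat", "In1"], how="left")
    # Look up the attributes of In2 within the same category.
    .merge(attributes_for("In2"), on=["cat", "In2"], how="left")
    # The raw indices are no longer needed in the output.
    .drop(columns=["In1", "In2"])
)

print(updated)
# To save the result: updated.to_csv("updated_data1.csv", index=False)
```

Using how="left" keeps every row of data1 in its original order; if an index were missing from data2, the corresponding attribute columns would come out as NaN instead of the row being dropped.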