2

I have one dataframe in the following form:

df = pd.read_csv('data/original.csv', sep = ',',  names=["Date", "Gran", "Country", "Region", "Commodity", "Type", "Price"], header=0)

I'm trying to do a self join on the index Date, Gran, Country, Region producing rows in the form of Date, Gran, Country, Region, CommodityX, TypeX, Price X, Commodity Y, Type Y, Prixe Y, Commodity Z, Type Z, Price Z

Every row should have all the different commodities and prices of a specific region.

Is there a simple way of doing this?

Any help is much appreciated!


Note: I simplified the example by ignoring a few attributes

Input Example:

      Date         Country  Region            Commodity  Price
1     03/01/2014   India  Vishakhapatnam      Rice       25
2     03/01/2014   India  Vishakhapatnam      Tomato     30
3     03/01/2014   India  Vishakhapatnam      Oil        50
4     03/01/2014   India  Delhi               Wheat      10   
5     03/01/2014   India  Delhi               Jowar      60   
6     03/01/2014   India  Delhi               Bajra      10   

Output Example:

      Date         Country  Region            Commodit1 Price1 Commodity2 Price2 Commodity3 Price3
1     03/01/2014   India  Vishakhapatnam      Rice       25     Tomato    30     Oil        50
2     03/01/2014   India  Delhi               Wheat      10     Jowar     60     Bajra      10
user3551674
  • 61
  • 1
  • 5
  • 1
    Can you show an example of the data you have and what the expected output would be? And have you looked at the `merge` and `join` functions (http://pandas.pydata.org/pandas-docs/stable/merging.html#database-style-dataframe-joining-merging)? – joris Apr 19 '14 at 13:34
  • @joris I did have a look at pandas joins but that didn't help me much. I'm currently trying to rewrite my problem in sql, since I'm new to pandas this might help me get my head around the problem. I added an example so you can see what I'm trying to achive. Thanks for you help and time! – user3551674 Apr 19 '14 at 13:58

1 Answers1

1

What you want to do is called a reshape (specifically, from long to wide). See this answer for more information.

Unfortunately as far as I can tell pandas doesn't have a simple way to do that. I adapted the answer in the other thread to your problem:

df['idx'] = df.groupby(['Date','Country','Region']).cumcount()
df.pivot(index= ['Date','Country','Region'], columns='idx')[['Commodity','Price']]

Does that solve your problem?

Community
  • 1
  • 1
exp1orer
  • 11,481
  • 7
  • 38
  • 51
  • the post you sent me is exactly what I'm looking for. Thank you so much! However when trying to implement your suggested code I got the following error: ValueError: Length mismatch: Expected axis has 19918 elements, new values have 3 elements. Do you have any idea what the mistake might be? I know that the 19918 are the numbers of rows that my table has in the input form. – user3551674 Apr 21 '14 at 15:52