0

Given the following df with fruits and their prices by month, I'd like to have all of the fruits listed in a single Fruit_Month column and then have another column called Prices. The ultimate goal is to calculate the correlation between fruit prices.

Given:

Fruit     Jan    Feb
Apple    2.00   2.50
Banana   1.00   1.25

Desired output:

Fruit_Month     Price
Apple_Jan       2.00  
Apple_Feb       2.50
Banana_Jan      1.00
Banana_Feb      1.25

And then from here, I'd like to see how correlated each fruit is with one another. In this simple example, it'd just be Apple vs Banana, but it should apply if there were more fruits. If there's a better/easier way, please let me know.

Primer4662
  • 23
  • 4

1 Answers1

0

Here is an approach that first melts the table to make the Month row, then makes a new df using the melted columns. I bet there are more clever ways to do this, maybe with unstack. Maybe depending on what you need to do, it will be easier to keep Fruit and Month as separate columns.

df = df.melt(id_vars='Fruit',var_name='Month',value_name='Price')

df = pd.DataFrame({
    'Fruit_Month': df.Fruit+'_'+df.Month,
    'Price': df.Price
})
mitoRibo
  • 4,468
  • 1
  • 13
  • 22