0

I have a 1st CSV panda dataframe like this:

            Date_reported  Country   New_cases  Cumulative_cases  
     0       23-Feb-20    Singapore      10             10
     0       26-Mar-20    Singapore      100            110
     0       27-Apr-20    Singapore      200            310
     0       28-May-20    Singapore      400            710
     0       30-Jun-20    Singapore      1000           1710

And I managed to churn out a grouped data of a monthly case in this form:

    Date_reported
    February      10
    March        100
    April        200
    May          400
    June        1000
 

Using this code:

  df = pd.read_csv('Stackoverflow1.csv')
  df['Date_reported'] = pd.to_datetime(df['Date_reported'])
  df.groupby(df['Date_reported'].dt.strftime('%B'))['New_cases'].sum().sort_values()  
   

I also have a 2nd panda dataframe like this:

  Variables       2020 Feb  2020 Mar  2020 Apr  2020 May  2020 June   
 Hotel Revenue       $100      $90      $80       $70       $60
 Occupancy Rate       80%       70%     60%       50%        40%

My end result is to merge the data on the monthly reported cases from the first dataframe and variables from on the second dataframe (i.e. Monthly covid cases against Hotel Revenue) based on individual months and plot a simple graph.

i.e.

   Date_reported    Monthly_cases   Hotel Revenue Occupancy Rate
February                  10           $100           80%
March                     100          $90            70%
April                     200          $80            60%
May                       400          $70            50%
June                      1000         $60            40%

However, I am stuck at two problems:

  1. I cannot convert the grouped data on the monthly cases itself as a list to plot against a variable from the second data frame although I managed to churn out the monthly summary.

  2. Although I managed to transpose the 2nd dataframe using this code,

       hotel = pd.read_csv('Stackoverflow2.csv')
       hotel2 = hotel.T
       headers = hotel2.iloc[0]
       hotel2 = pd.DataFrame(hotel2.values[1:], columns=headers)
    

I do not have the dates as an index shown below:

Variables       Hotel Revenue  Occupancy Rate      
 2020 Feb           $100           80%
 2020 Mar           $90            70%
 2020 Apr           $80            60%
 2020 May           $70            50%
 2020 Jun           $60            40%

Your help would be greatly appreciated as I am a beginner struggling with pandas and python. Do let me know if you have a better way of presenting this. Thank you.

Leonard
  • 53
  • 5
  • 1
    Please provide the expected [MRE - Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example). Show where the intermediate results deviate from the ones you expect. We should be able to paste a single block of your code into file, run it, and reproduce your problem. This also lets us test any suggestions in your context. [Include your minimal data frame](https://stackoverflow.com/questions/52413246/how-to-provide-a-reproducible-copy-of-your-dataframe-with-to-clipboard) as part of the example. – Prune Feb 15 '21 at 16:52
  • 1
    Please go through the [intro tour](https://stackoverflow.com/tour), the [help center](https://stackoverflow.com/help) and [how to ask a good question](https://stackoverflow.com/help/how-to-ask) to see how this site works and to help you improve your current and future questions, which can help you get better answers. You're getting quite close. However, you get only *one* question per posting. Also, it seems that your root problem is simply needing more practice, or perhaps a local tutor of some sort. Stack Overflow is not intended to replace existing tutorials and documentation. – Prune Feb 15 '21 at 16:53

0 Answers0