
I need to make a bar plot of the following DataFrame:

import pandas as pd

df = pd.DataFrame(
    {
        "Months": [1, 2, 2, 4],
        "Value": ["1000", "2000", "1500", "3200"],
    }
)

The goal is a bar plot with Months on the x-axis and the combined values from the Value column on the y-axis. Rows that share the same month should be merged, so in this case only month 2 would be combined.

I also need to convert the numerical months to their text counterparts (so 1 becomes January etc.) and keep them in the correct order on the x-axis (so January would be first, February second and so on).

Thanks in advance.

Szymon Maszke
TRM

1 Answer


If I understand you correctly, you should do the following:

  1. Cast Value column to numerical type:

    df["Value"] = pd.to_numeric(df["Value"])
    
  2. Group by Months and sum the Value column for each month, then sort by the numerical month (relevant for unordered input such as "Months": [1, 8, 2, 2, 5, 5, 1]):

    df = df.groupby("Months", as_index=False).sum().sort_values("Months")
    
  3. Now that the months are sorted, you can convert them from integers to month names:

    df["Months"] = pd.to_datetime(df["Months"], format="%m").dt.month_name()
    

    This operation would yield the following pd.DataFrame:

         Months  Value
    0   January   1000
    1  February   3500
    2     April   3200
    
  4. Set the index to Months (so it becomes the x-axis of your plot) and use the plotting built into pandas:

    df.set_index("Months").plot.bar()
    

This operation yields a bar plot with one bar per month, in calendar order.
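The steps above can be put together as one runnable sketch. It uses the question's data with the rows shuffled, to show that the ordering does not depend on the input order. Two things here are my own assumptions rather than part of the steps above: the helper name `monthly_totals` is hypothetical, and the month names come from the standard library's `calendar.month_name` instead of `pd.to_datetime`, which gives English names under the default locale.

```python
import calendar

import pandas as pd


def monthly_totals(df: pd.DataFrame) -> pd.DataFrame:
    """Sum Value per month and label months by name (hypothetical helper)."""
    out = df.copy()
    # The values arrive as strings, so cast them before summing
    out["Value"] = pd.to_numeric(out["Value"])
    # Merge rows that share a month, then sort by the numerical month
    out = out.groupby("Months", as_index=False)["Value"].sum().sort_values("Months")
    # calendar.month_name[1] is "January" (index 0 is an empty string)
    out["Months"] = out["Months"].map(lambda m: calendar.month_name[int(m)])
    return out.set_index("Months")


# Same data as in the question, deliberately out of order
df = pd.DataFrame(
    {
        "Months": [4, 2, 1, 2],
        "Value": ["3200", "2000", "1000", "1500"],
    }
)

totals = monthly_totals(df)
print(totals)

# To draw the chart (requires matplotlib):
# totals.plot.bar()
```

The months are renamed only after sorting, because sorting the names themselves would order them alphabetically (April before January).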

Szymon Maszke
  • Thank you very much. Could you also help me with the following: I want `Value` to be an average, so in this example the `Value` of `February` would be divided by `2`. This needs to be shown in a bar plot as well. @Szymon Maszke – TRM Mar 09 '19 at 17:11
  • Can you explain why `as_index=False` is needed? And can it be done without? @Szymon Maszke – TRM Mar 10 '19 at 12:55
  • Otherwise `Months` column would become index of `DataFrame` and I wouldn't be able to modify it as column (cast numbers to months like `January` etc.). Step three would be harder to perform/less readable this way IMO and I'm not sure whether this approach could be used. [See here](https://stackoverflow.com/questions/40427943/how-do-i-change-a-single-index-value-in-pandas-dataframe) for modifying index, though this operation is more complicated and more pandas dependent. – Szymon Maszke Mar 10 '19 at 19:36
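For the two follow-up questions in the comments, here is a minimal sketch, assuming the same DataFrame as in the question. It swaps `sum()` for `mean()` to get the average per month, and it leaves out `as_index=False`, so `Months` becomes the index and is renamed through `Index.map`. Using `calendar.month_name` for the names is my own choice here, not something from the answer.

```python
import calendar

import pandas as pd

df = pd.DataFrame(
    {
        "Months": [1, 2, 2, 4],
        "Value": ["1000", "2000", "1500", "3200"],
    }
)
df["Value"] = pd.to_numeric(df["Value"])

# Without as_index=False the group keys (Months) become the index;
# mean() averages the rows of each month instead of summing them
averages = df.groupby("Months")["Value"].mean()

# Rename the integer index to month names by mapping over it
averages.index = averages.index.map(lambda m: calendar.month_name[int(m)])

print(averages)

# To draw the chart (requires matplotlib):
# averages.plot.bar()
```

Since the result is already indexed by month, no `set_index` call is needed before plotting.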