I am using pandas and matplotlib to generate some charts.

My DataFrame:

                                             Journal    Papers per year in journal
0                Information and Software Technology    4
1  2012 International Conference on Cyber Securit...    4
2       Journal of Network and Computer Applications    4 
3                            IEEE Security & Privacy    5
4                               Computers & Security    11

My DataFrame is the result of a groupby on a larger DataFrame. What I want now is a simple bar chart, which in theory works fine with df_groupby_time.plot(kind='bar'). However, the resulting chart is not what I am after.

What I want are differently colored bars and a legend that states which color corresponds to which journal.

Playing around with relabeling hasn't gotten me anywhere so far, and I have run out of ideas on how to achieve what I want.


EDIT:

Setting the Journal column as the index and plotting isn't what I want either: df_groupby_time.set_index("Journal").plot(kind='bar')

Martin Müsli

1 Answer

I found a solution, based on this question here. The DataFrame needs to be transformed into a matrix where the values exist only on the main diagonal. First, I save the Journal column in a variable for later:

new_cols = df["Journal"].values

Secondly, I wrote a function that takes a series (the column Papers per year in journal) and the previously saved new column labels as input parameters, and returns a DataFrame where the values are only on the main diagonal:

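import pandas as pd


def values_into_main_diagonal(some_series, new_cols):
    """Puts the values of a series onto the main diagonal of a new df.
       some_series - any series given
       new_cols - the new column labels as list or numpy.ndarray"""
    # One single-key dict per row: row i only has a value in column i,
    # so every off-diagonal cell becomes NaN. enumerate() iterates by
    # position, so this also works when the series index is not 0..n-1.
    x = [{i: value} for i, value in enumerate(some_series)]
    main_diag_df = pd.DataFrame(x)
    main_diag_df.columns = new_cols
    return main_diag_df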

Thirdly, feeding the function the Papers per year in journal column and the saved new column names returns the following DataFrame:

new_df:

   1_journal  2_journal  3_journal  4_journal  5_journal
0  4          NaN        NaN        NaN        NaN
1  NaN        4          NaN        NaN        NaN
2  NaN        NaN        4          NaN        NaN
3  NaN        NaN        NaN        5          NaN
4  NaN        NaN        NaN        NaN        11
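
For reference, the call that produces a frame like the one above can be sketched as follows. This is only a minimal example: the input DataFrame is rebuilt by hand from the table in the question (the truncated conference name is kept exactly as shown there), and the function is a copy of the one above, written with enumerate so the sketch runs on its own.

```python
import pandas as pd


def values_into_main_diagonal(some_series, new_cols):
    """Copy of the helper above so this sketch is self-contained."""
    x = [{i: value} for i, value in enumerate(some_series)]
    main_diag_df = pd.DataFrame(x)
    main_diag_df.columns = new_cols
    return main_diag_df


# Assumption: data typed in by hand from the table in the question.
df = pd.DataFrame({
    "Journal": [
        "Information and Software Technology",
        "2012 International Conference on Cyber Securit...",
        "Journal of Network and Computer Applications",
        "IEEE Security & Privacy",
        "Computers & Security",
    ],
    "Papers per year in journal": [4, 4, 4, 5, 11],
})

new_cols = df["Journal"].values
new_df = values_into_main_diagonal(df["Papers per year in journal"], new_cols)

print(new_df.shape)               # (5, 5)
print(new_df.notna().sum().sum()) # 5 -> one value per row, on the diagonal
```

Because the off-diagonal cells are NaN, the columns come out as floats (4.0 rather than 4); that does not matter for plotting.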

Finally, plotting new_df via new_df.plot(kind='bar', stacked=True) gives me what I want: the journals appear in different colors in the legend and NOT on the axis.
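
An end-to-end sketch of the plotting step, in case it helps. A few things here are my assumptions and not part of the original solution: the diagonal frame is built with numpy.fill_diagonal instead of the helper function (same result, just shorter), the Agg backend is selected so the script runs without a display, and the output file name journal_bars.png is made up.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless rendering, no GUI window

import numpy as np
import pandas as pd

# Assumption: data typed in by hand from the table in the question.
journals = [
    "Information and Software Technology",
    "2012 International Conference on Cyber Securit...",
    "Journal of Network and Computer Applications",
    "IEEE Security & Privacy",
    "Computers & Security",
]
papers = [4, 4, 4, 5, 11]

# Square frame that is NaN everywhere except on the main diagonal.
values = np.full((len(papers), len(papers)), np.nan)
np.fill_diagonal(values, papers)
new_df = pd.DataFrame(values, columns=journals)

# Each column is its own series, so each bar gets its own color and
# its own legend entry; stacked=True keeps the bars at full width.
ax = new_df.plot(kind="bar", stacked=True)
ax.set_ylabel("Papers per year in journal")
ax.legend(title="Journal", fontsize="small")

ax.figure.tight_layout()
ax.figure.savefig("journal_bars.png")  # hypothetical file name
```

The x-axis still shows the row positions 0 to 4; the journal names only appear in the legend, which is the behavior asked for in the question.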

Martin Müsli