I have a file in which the values of one column (location) repeat, and each row has its own amount (new_cases).

How can I draw a line graph of the values for each location?

I tried the following, but it raised an error.

import matplotlib.pyplot as plt
import pandas as pd

data = {'location': ['Afghanistan'] * 5 + ['Africa'] * 4, 'new_cases': [3, 0, 0, 3, 6, 0, 1, 0, 0]}
newData = pd.DataFrame(data)

fig, ax = plt.subplots(figsize=(15,7))
byLoc = newData.groupby('location').count()['new_cases'].unstack().plot(ax=ax)

Traceback

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In [141], line 2
      1 fig, ax = plt.subplots(figsize=(15,7))
----> 2 byLoc = newData.groupby('location').count()['new_cases'].unstack().plot(ax=ax)

File ~\anaconda3\envs\py11\Lib\site-packages\pandas\core\series.py:4455, in Series.unstack(self, level, fill_value)
   4412 """
   4413 Unstack, also known as pivot, Series with MultiIndex to produce DataFrame.
   4414 
   (...)
   4451 b    2    4
   4452 """
   4453 from pandas.core.reshape.reshape import unstack
-> 4455 return unstack(self, level, fill_value)

File ~\anaconda3\envs\py11\Lib\site-packages\pandas\core\reshape\reshape.py:483, in unstack(obj, level, fill_value)
    478         return obj.T.stack(dropna=False)
    479 elif not isinstance(obj.index, MultiIndex):
    480     # GH 36113
    481     # Give nicer error messages when unstack a Series whose
    482     # Index is not a MultiIndex.
--> 483     raise ValueError(
    484         f"index must be a MultiIndex to unstack, {type(obj.index)} was passed"
    485     )
    486 else:
    487     if is_1d_only_ea_dtype(obj.dtype):

ValueError: index must be a MultiIndex to unstack, <class 'pandas.core.indexes.base.Index'> was passed
Trenton McKinney

2 Answers

  • Pivot the DataFrame, then align the indices by dropping the NaN values, which "compresses" each pivoted column so that it starts at index 0, as shown in this answer.
  • Tested in Python 3.11, pandas 1.5.3, matplotlib 3.7.0
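
For context on the traceback in the question, here is a minimal sketch that reuses the question's sample data. It suggests why unstack() fails there: grouping by a single column and calling count() gives a Series with a single-level index, and it holds the number of rows per location rather than the new_cases values. The variable name counts is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'location': ['Afghanistan'] * 5 + ['Africa'] * 4,
                   'new_cases': [3, 0, 0, 3, 6, 0, 1, 0, 0]})

# count() returns the number of rows per location, not the new_cases values
counts = df.groupby('location').count()['new_cases']
print(counts)

# the index has one level ('location'), so unstack() has no second
# level to move into the columns and raises the ValueError
print(counts.index.nlevels)
```
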

Imports and DataFrame

import pandas as pd

df = pd.DataFrame({'location': ['Afghanistan'] * 5 + ['Africa'] * 4, 'new_cases': [3, 0, 0, 3, 6, 0, 1, 0, 0]})

Plot the New Cases

# pivot and drop nan
dfp = df.pivot(columns='location', values='new_cases').apply(lambda x: pd.Series(x.dropna().values))

# plot
ax = dfp.plot(figsize=(8, 6), title='New Cases', xticks=dfp.index)

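
The same aligned layout can probably also be reached without the dropna step. This is a sketch of a different technique, not the method used above: number the rows within each location with groupby().cumcount() and pivot on that counter. The column name step and the variable name wide are assumptions made for this example.

```python
import pandas as pd

df = pd.DataFrame({'location': ['Afghanistan'] * 5 + ['Africa'] * 4,
                   'new_cases': [3, 0, 0, 3, 6, 0, 1, 0, 0]})

# 'step' (hypothetical name) is the position of each row within its location
wide = (df.assign(step=df.groupby('location').cumcount())
          .pivot(index='step', columns='location', values='new_cases'))

# one line per location; the shorter group is padded with NaN at the end
ax = wide.plot(figsize=(8, 6), title='New Cases', xticks=wide.index)
```

Because the counter restarts at 0 for every location, the columns are already aligned when they are pivoted.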

Plot the Cumulative New Cases

# add a cumulative column
df['cumulative'] = df.groupby('location').new_cases.transform('cumsum')

# pivot and drop nan
dfp = df.pivot(columns='location', values='cumulative').apply(lambda x: pd.Series(x.dropna().values))

# plot
ax = dfp.plot(figsize=(8, 6), title='Cumulative New Cases', xticks=dfp.index)


Trenton McKinney

To draw a line graph you need a time attribute for the x-axis. I assume your data has a time column, so that the line shows new cases over time.

import matplotlib.pyplot as plt

# assumes df has a 'time' column; the question's sample data does not include one
plt.plot(df['time'], df['new_cases'])
plt.title('New Cases over Time')
plt.xlabel('Time')
plt.ylabel('New Cases')
plt.show()
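
With more than one location in the same DataFrame, a single plt.plot call joins all rows into one line. A minimal sketch of one line per location follows, assuming a time column exists. The time values below are invented for illustration (a day counter that restarts for each location) and are not part of the question's data.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'location': ['Afghanistan'] * 5 + ['Africa'] * 4,
                   'new_cases': [3, 0, 0, 3, 6, 0, 1, 0, 0],
                   # hypothetical time values, made up for this example
                   'time': [0, 1, 2, 3, 4, 0, 1, 2, 3]})

fig, ax = plt.subplots(figsize=(8, 6))

# draw a separate line for each location
for name, group in df.groupby('location'):
    ax.plot(group['time'], group['new_cases'], label=name)

ax.set_title('New Cases over Time')
ax.set_xlabel('Time')
ax.set_ylabel('New Cases')
ax.legend()
```

Outside a notebook, a call to plt.show() at the end displays the figure.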

To show how new cases relate to locations, a bar chart is more appropriate.
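
As a rough sketch of that bar chart, assuming the total of new_cases per location is the quantity of interest (the question does not say which aggregate is wanted), using the question's sample data:

```python
import pandas as pd

df = pd.DataFrame({'location': ['Afghanistan'] * 5 + ['Africa'] * 4,
                   'new_cases': [3, 0, 0, 3, 6, 0, 1, 0, 0]})

# total new cases per location (an assumed choice of aggregate)
totals = df.groupby('location')['new_cases'].sum()

# one bar per location
ax = totals.plot.bar(figsize=(8, 6), title='Total New Cases by Location', rot=0)
ax.set_xlabel('Location')
ax.set_ylabel('Total New Cases')
```
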

Elkhan