0

I have the following problem:

I want to total up all the case numbers of each state in the following file and display them in a new table.

for example, in the file from each day in 2020/2021 there are the new positive corona cases from the state of alabama.

thanks to a user in stackflow, i have the number of all new cases in 2020 using the code :

total_sum2 = 0

for overall_outcome, dt, new_results_reported in zip(America2['overall_outcome'].values.tolist(),America2['date'].values.tolist(), America2['new_results_reported'].values.tolist()):
  ndt = int(str(dt)[:4])
  if (overall_outcome == 'Positive') and (ndt == 2020):
    total_sum2 += int(new_results_reported)

print(total_sum2)

state_name object overall_outcome object date object new_results_reported int64 total_results_reported int64 dtype: object

can be determined.

How can I now automatically determine the numbers for each state and then compare them graphically and display them in a separate table?

I would be very grateful for your help

enter image description here

RJ Adriaansen
  • 9,131
  • 2
  • 12
  • 26
Work_fl0w
  • 21
  • 4

1 Answers1

0

You can create a pivot table withstate_name as index and overall_outcome as columns, while taking the sum of new_results_reported:

import pandas as pd

America2 = pd.read_csv('https://app.mipasa.com/download/759fc431-8bed-4dc1-aae0-b6d58484d4ec')
pivot_df = America2.pivot_table(index='state_name', columns='overall_outcome', values='new_results_reported', aggfunc='sum')

This will result in:

state_name Inconclusive Negative Positive
Alabama 8575 3.15222e+06 454148
Alaska 1636 1.52321e+06 91735
Arizona 9996 4.73217e+06 737905
Arkansas 4471 2.05751e+06 213053
California 190205 3.28396e+07 3.21527e+06

You can now visualize the data in many ways. For example a simple bar chart of the positive cases:

pivot_df['Positive'].plot(kind='bar')

PS. you can also get the total number of new cases directly with pandas (rather than using a for loop), for example by filtering the dataframe and getting the sum of new_results_reported:

America2[(America2['overall_outcome']=='Positive') & (America2['date'].str.contains('2020'))]['new_results_reported'].sum()
RJ Adriaansen
  • 9,131
  • 2
  • 12
  • 26
  • Hey thanks for your answer. Does it work Even if the data from every state is on a daily base ? And how can i know Compare them in a graph. Thank you very much – Work_fl0w Jun 02 '21 at 20:47
  • how can i compare each state now with the new_results_reported ? and how can i compare positive negative and inconc – Work_fl0w Jun 02 '21 at 20:58
  • or another question my data is based on daily base per state how can i show for example all new_results_reported in alabama and all other states on a monthly based ? sorry for all the questions – Work_fl0w Jun 02 '21 at 21:01
  • The date column is not taken into account, it simply takes the sum all rows in the dataframe. And you can create all kinds of plots, `pivot_df.plot(kind='bar', stacked=True, figsize=(15, 5))` will give a stacked bar chart of everything, for example. Check [this](https://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html) page and apply the examples to your own dataframe. Month-based is a different issue, which requires parsing the date to a datetime object, then [grouping](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Grouper.html) by month – RJ Adriaansen Jun 02 '21 at 21:08
  • thank you for your help, do you have a link to do parsing the date to a datetime object, then grouping by month aswell? or an idea how i can implement this to my data sheet? – Work_fl0w Jun 02 '21 at 21:12
  • and again thank you so much for the pivot trick it makes my life much easier :D – Work_fl0w Jun 02 '21 at 21:13
  • Sure, you can take a look at [this](https://stackoverflow.com/a/44908600/11380795) example. If my answer solved your question, please [accept](https://stackoverflow.com/help/someone-answers) it to close this issue. – RJ Adriaansen Jun 02 '21 at 21:28
  • @Work_fl0w: as we talked previously, the last line of RJ Adriaansen answer, is the way you should filter data within dataframes. Best and faster way to go. Once filtered you just have to display it. – Malo Jun 02 '21 at 21:41