0

So I'm working on a project that uses the live COVID-19 data from JHU CSSE. I currently have three data frames, with each data frame representing the time series of confirmed cases, recovered cases, and confirmed deaths. Each data frame has rows that contain the following information: Province/State, Country/Region, and a column for each day since the beginning of the data recording, where the value is the running total for that dataset, be it confirmed, recovered, or deaths. Here is a sample of the dataset:

            Province/State       Country/Region       Lat        Long   1/22/20   ...   10/28/20 
                       NaN          Afghanistan  33.93911   67.709953         0            41145
                       NaN              Albania  41.15330   20.168300         0            20040    
                       NaN              Algeria  28.03390    1.659600         0            57026    
                       NaN              Andorra  42.50630    1.521800         0             4517    
                       NaN               Angola -11.20270   17.873900         0            10074    
                       NaN  Antigua and Barbuda  17.06080  -61.796400         0              124   
                       NaN            Argentina -38.41610  -63.616700         0          1130533    
                       NaN              Armenia  40.06910   45.038200         0            82651    
        Australian Capital            Australia -35.47350  149.012400         0              114              
           New South Wales            Australia -33.86880  151.209300         0             4411    

Each of the three data frames have the same number of rows and columns, in addition to the first two columns being identical.

Currently, if I wanted to get the number of confirmed, recovered, and deaths for a specific country on a specific date, I would need to go to each data frame and lookup that country and day.

In order to consolidate the tables and save on memory and performance, I'd like to merge the tables in such a way that the original values are kept. Ideally, rather than looking up the same cell three different times in three different data frames, I'd have one cell in one data frame that would give me all three values. So, for example, if I look up the USA on X day, I would get a list of integers such as [cases, deaths, recovered]. Is there any way to combine the tables in Pandas?

One way this could look is pulling a country is the following:

State,Country,1/22/20,1/23/20m,etc.

New York, USA, [confirmed, deaths, recovered], [confirmed, deaths, recovered], etc...
Mehdi Golzadeh
  • 2,594
  • 1
  • 16
  • 28
bitwise
  • 3
  • 1
  • Please share your dataframe as text. image is not usable – Subbu VidyaSekar Oct 30 '20 at 00:27
  • `pd.concat(mylist_df)`? – ansev Oct 30 '20 at 00:32
  • These are the links to the live .csv files I am using for [confirmed cases](https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv), [confirmed deaths](https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv), and [recovered](https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv). I removed the coordinates, they're otherwise identical. – bitwise Oct 30 '20 at 00:35
  • If this question does not solve your doubts, you could try to copy and paste a reproducible example https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples . I recommend you read https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html – ansev Oct 30 '20 at 00:41

1 Answers1

0

If you want to merge two dataframe using a column in common you can use the merge function:

df = df1.merge(df2, on='column in common',how='left or right')

if you want to only concat two df use concat:

df = pd.concat([df1,df2], axis=1) #axis=1 means merge their columns to merge. with rows use axis=0
Mehdi Golzadeh
  • 2,594
  • 1
  • 16
  • 28