So I'm working on a project that uses the live COVID-19 data from JHU CSSE. I currently have three data frames, with each data frame representing the time series of confirmed cases, recovered cases, and confirmed deaths. Each data frame has rows that contain the following information: Province/State, Country/Region, and a column for each day since the beginning of the data recording, where the value is the running total for that dataset, be it confirmed, recovered, or deaths. Here is a sample of the dataset:
Province/State Country/Region Lat Long 1/22/20 ... 10/28/20
NaN Afghanistan 33.93911 67.709953 0 41145
NaN Albania 41.15330 20.168300 0 20040
NaN Algeria 28.03390 1.659600 0 57026
NaN Andorra 42.50630 1.521800 0 4517
NaN Angola -11.20270 17.873900 0 10074
NaN Antigua and Barbuda 17.06080 -61.796400 0 124
NaN Argentina -38.41610 -63.616700 0 1130533
NaN Armenia 40.06910 45.038200 0 82651
Australian Capital Australia -35.47350 149.012400 0 114
New South Wales Australia -33.86880 151.209300 0 4411
Each of the three data frames have the same number of rows and columns, in addition to the first two columns being identical.
Currently, if I wanted to get the number of confirmed, recovered, and deaths for a specific country on a specific date, I would need to go to each data frame and lookup that country and day.
In order to consolidate the tables and save on memory and performance, I'd like to merge the tables in such a way that the original values are kept. Ideally, rather than looking up the same cell three different times in three different data frames, I'd have one cell in one data frame that would give me all three values. So, for example, if I look up the USA on X day, I would get a list of integers such as [cases, deaths, recovered]. Is there any way to combine the tables in Pandas?
One way this could look is pulling a country is the following:
State,Country,1/22/20,1/23/20m,etc.
New York, USA, [confirmed, deaths, recovered], [confirmed, deaths, recovered], etc...