I am trying to build a dataframe where the data is grabbed from multiple files. I have created an empty dataframe with the desired shape, but I am having trouble grabbing the data. I found this but when I concat, I am still getting NaN values.
Edit2: I changed the order of df
creation and put concat inside the for loop and same result. (for obvious reasons)
import pandas as pd
import os
import glob
def daily_country_framer():
# create assignments
country_source = r"C:\Users\USER\PycharmProjects\Corona Stats\Country Series"
list_of_files = glob.glob(country_source + r"\*.csv")
latest_file = max(list_of_files, key=os.path.getctime)
last_frame = pd.read_csv(latest_file)
date_list = []
label_list = []
# build date_list values
for file in os.listdir(country_source):
file = file.replace('.csv', '')
date_list.append(file)
# build country_list values
for country in last_frame['Country']:
label_list.append(country)
# create dataframe for each file in folder
for filename in os.listdir(country_source):
filepath = os.path.join(country_source, filename)
if not os.path.isfile(filepath):
continue
df1 = pd.read_csv(filepath)
df = pd.DataFrame(index=label_list, columns=date_list)
df1 = pd.concat([df])
print(df1)
daily_country_framer()
Two sample dataframes: (notice the different shapes)
Country Confirmed Deaths Recovered
0 World 1595350 95455 353975
1 Afghanistan 484 15 32
2 Albania 409 23 165
3 Algeria 1666 235 347
4 Andorra 583 25 58
.. ... ... ... ...
180 Vietnam 255 0 128
181 West Bank and Gaza 263 1 44
182 Western Sahara 4 0 0
183 Zambia 39 1 24
184 Zimbabwe 11 3 0
[185 rows x 4 columns]
Country Confirmed Deaths Recovered
0 World 1691719 102525 376096
1 Afghanistan 521 15 32
2 Albania 416 23 182
3 Algeria 1761 256 405
4 Andorra 601 26 71
.. ... ... ... ...
181 West Bank and Gaza 267 2 45
182 Western Sahara 4 0 0
183 Yemen 1 0 0
184 Zambia 40 2 25
185 Zimbabwe 13 3 0
[186 rows x 4 columns]
Current output:
01-22-2020 01-23-2020 ... 04-09-2020 04-10-2020
World NaN NaN ... NaN NaN
Afghanistan NaN NaN ... NaN NaN
Albania NaN NaN ... NaN NaN
Algeria NaN NaN ... NaN NaN
Andorra NaN NaN ... NaN NaN
... ... ... ... ... ...
West Bank and Gaza NaN NaN ... NaN NaN
Western Sahara NaN NaN ... NaN NaN
Yemen NaN NaN ... NaN NaN
Zambia NaN NaN ... NaN NaN
Zimbabwe NaN NaN ... NaN NaN
[186 rows x 80 columns]
Desired output: (where NaN equals corresponding values from target column or a list of all columns ie: if ['Confirmed'] then 0,1,2,3,4, if all then [0,0,0],[1,0,0],[2,0,0])