this is my first question, so bear with me if i am doing something wrong :) I am working with a data set for 2018, 2019, 2020 and 2021. (The data set is complete with a file for each day, i have checked the files manually) I am trying to sum some values with pandas by Groupby and datetime. The end goal is to have a sum value for each day for the whole data set. For some reason, the year 2020 is skipping the 4th in every month, and i cant figure out why. i have seartch the stackoverflow, datacamp and others for many hours now without a solution. Now i have to try ask you guys.
My code starts by making a list with all the filenames i my folder. After that i replace "." with ":" so i can convert the dates to datetime64[ns] (there are different date formats in the files) the code append the dates to a list and set them in the column. returns the files by concat in a single dataframe. after this i am making a new column where i look for some words in another column. this new column is used for sorting the dataframe. i use .loc for this. then i groupby.dt.date and .sum then if a date has a value of zero i need it to show so i use .asfreq("D") and .fillna(0)
its all good, but every 4th (4/1/2020, 4/2/2020, 4/3/2020, 4/4/2020, 4/5/2020 and so on) is missing. But WHYYY! :) (The file are there. if i take all the files for them self, its all good. The values are there.)
def FileList(): #Creates a list with all the files
CSVFileList = []
FilePath = 'Data/'
FileDirectoryList = os.listdir(FilePath)
for i in FileDirectoryList:
if(i[-3:] == "CSV"):
CSVFileList.append(i)
return CSVFileList
def Concat(filelist): #Concat all the files in the list from FileList
columns_names = ["Order_ID", "Dato", "Varenummer", "Varenavn", "Quantity", "Pris", "Pris_Sum", "Tilbehør", "col1", "col2"]
dflist = []
for i in filelist:
df = pd.read_csv("Data/"+i, delimiter = ";", names = columns_names)
DatoList = []
for i in df["Dato"]:
u = i.replace(".",":")
DatoList.append(u)
df["Dato"] = DatoList
df['Dato'] = pd.to_datetime(df['Dato']) #Convert date's to Datetime64[ns]
dflist.append(df)
dataframe = pd.concat(dflist)
return dataframe
def BestillingDF(filename):
df = filename
#New Vare_Beskrivelse
Vare_navn = []
Varenavn = df["Varenavn"]
for i in Varenavn:
i = str(i)
i = i.lower()
if "spareribs" in i:
Vare_navn.append("Spareribs")
elif "salat" in i:
Vare_navn.append("Salat")
elif "pommes" in i:
Vare_navn.append("Pommes Frites")
else:
Vare_navn.append("")
df["Vare_Navn"] = Vare_navn
Spareribs_Vægt_List = []
spareribsvægt = bi.Spareribs_Vægt() #Gets a value from a dict in another file
for i in df["Varenavn"]:
if i not in spareribsvægt:
Spareribs_Vægt_List.append(0)
else:
Spareribs_Vægt_List.append(spareribsvægt[i])
df["Spareribs_Vægt_Kuvert"] = Spareribs_Vægt_List
df["Spareribs_Vægt_[Q*K]"] = df["Quantity"] * df["Spareribs_Vægt_Kuvert"]
return df
def Spareribs_A_Day(filename):
data = BestillingDF(filename)
spare = data.loc[data["Vare_Navn"] == "Spareribs"] #Sorts the dataframe
s = spare.groupby([spare['Dato'].dt.date])['Spareribs_Vægt_Kuvert'].sum() #Groups the series by date/day and sums the value
t = s.asfreq('D') #Shows the zero values (NaN)
df = pd.DataFrame(t)
df['Spareribs_Vægt_Kuvert'] = df['Spareribs_Vægt_Kuvert'].fillna(0) #Fills the NAN by zero
return df
def main():
filelist = FileList()
concat_DF = Concat(filelist)
ribs = Spareribs_A_Day(concat_DF)
print(ribs.head())
main()
I am no champion in python or coding. but i hope that you guys maybe can help me anyhow :) the code result