basically I've got dataframe with loads of data about real estate. Every day, new data are added for each real estate, most importantly its price, region where the real estate is and the date, that this real estate was added to the dataframe. For every region I want to calculate development of the price every day. I get my dataframe from database like this:
data1 = pd.read_sql_query(
"SELECT REAL_ESTATE.UNIQUE_RE_NUMBER, REAL_ESTATE.TYP_ID, ADDRESS.ADDRSS, ADDRESS.LOCATION, PRICE.RE_PRICE, MAX(PRICE.UPDATE_DATE) AS UPDATE_DATE, HOUSEINFO.RE_POLOHA, HOUSEINFO.RE_DRUH, HOUSEINFO.RE_TYP, HOUSEINFO.RE_UPLOCHA "
"FROM REAL_ESTATE INNER JOIN ADDRESS, PRICE, HOUSEINFO ON REAL_ESTATE.ID=ADDRESS.RE_ID AND REAL_ESTATE.ID=PRICE.RE_ID AND REAL_ESTATE.ID=HOUSEINFO.INF_ID GROUP BY REAL_ESTATE.ID ",
conn)
data2 = pd.read_sql_query(
"SELECT REAL_ESTATE.UNIQUE_RE_NUMBER, REAL_ESTATE.TYP_ID, ADDRESS.ADDRSS, ADDRESS.LOCATION, PRICE.RE_PRICE, MAX(PRICE.UPDATE_DATE) AS UPDATE_DATE, FLATINFO.RE_DISPOZICE, FLATINFO.RE_DRUH, FLATINFO.RE_PPLOCHA "
"FROM REAL_ESTATE INNER JOIN ADDRESS, PRICE, FLATINFO ON REAL_ESTATE.ID=ADDRESS.RE_ID AND REAL_ESTATE.ID=PRICE.RE_ID AND REAL_ESTATE.ID=FLATINFO.INF_ID GROUP BY REAL_ESTATE.ID ",
conn)
data3 = pd.read_sql_query(
"SELECT REAL_ESTATE.UNIQUE_RE_NUMBER, REAL_ESTATE.TYP_ID, ADDRESS.ADDRSS, ADDRESS.LOCATION, PRICE.RE_PRICE, MAX(PRICE.UPDATE_DATE) AS UPDATE_DATE, LANDINFO.RE_PLOCHA, LANDINFO.RE_DRUH, LANDINFO.RE_SITE, LANDINFO.RE_KOMUNIKACE "
"FROM REAL_ESTATE INNER JOIN ADDRESS, PRICE, LANDINFO ON REAL_ESTATE.ID=ADDRESS.RE_ID AND REAL_ESTATE.ID=PRICE.RE_ID AND REAL_ESTATE.ID=LANDINFO.INF_ID GROUP BY REAL_ESTATE.ID ",
conn)
df = [data1, data2, data3]
dff = pd.concat(df)
dff = dff.reset_index(drop=True)
For calculating average I have this command:
dff['LOC_DATE_AVG'] = dff.groupby(['LOCATION', 'UPDATE_DATE'])['RE_PRICE'].transform('mean')
This only shows, what average price was added each day, but I would like to calculate the whole average of every real estate added to a particular date. So when I have data from 1.1.2021, 2.1.2021, 3.1.2021 and when I want to know average to the date 2.1.2021, it would calculate average from 1.1.2021 and also from 2.1.2021. Is it possible?