Fill Panda DF with new data

Question

This is my first post on StackOverflow. My apologies if I do something wrong or against the web's rules. My problem: I have a csv file I read using pandas in python. The dataframe has five columns, named [yday, wday, time, stop, N]:

yday is the year's day, from 1 to 365;
wday is the week's day, from 1 to 7;
time is a number from 1-144 (I divide the day in gaps of 10 minutes each, 1440 minutes per day/10 minutes = 144);
stop is a the number of the bus stop (1-4);
N is the number of ppl who enter the bus

Well, I would like to have an entry for each gap, giving 144 rows per day, but I have some missing gaps as you can see: CSV example

My goal is to add new rows to fill all the time gaps, like adding (based on the image given):

320,6,81,1,1 <-- Exists

320,6,82,1,na <-- New 

320,6,83,1,na <-- New

320,6,84,1,na <-- New

320,6,85,1,1 <-- Exists

I tried to index my DataFrame with df.set_index['stop','yday','time'] so I could reindex it with 'time' values from 1 to 144, but it doesn't work. I'm new to Python and i'm getting mad trying to solve this.

Thanks in advance and sorry for my english.

What issue are you having with your reindex? You just need to expand your index with the missing `time` values, afterwards using `df.reindex` should work: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.reindex.html — r.ook, May 01 '20 at 15:40
Does this answer your question? [Reindexing a specific level of a MultiIndex dataframe](https://stackoverflow.com/questions/48179297/reindexing-a-specific-level-of-a-multiindex-dataframe) — r.ook, May 01 '20 at 16:29

score 0 · Accepted Answer · answered May 01 '20 at 16:39

Apologies for the long code. But it solves your purpose:

#Specify the input csv file path here. I am assuming your csv has only 
#the columns and column names you mentioned in your question. If not you have to 
#modify the below code to reflect your changed columns
df = pd.read_csv("Path_to_your_input_csv_file_here") 

df = df.sort_values(['yday', 'wday', 'time'], ascending=True) #Sort df values based upon yday, wday and time first
df = df.reset_index(drop=True) #Reset the indices after sorting

df2 = df.copy(deep=True) #Make a deep copy of this sorted dataframe

#The below for loop iterates through rows of 'df', finds differences between time values and adds up missing rows to 'df2'
for index, row in df.iterrows(): #Iterate through the rows of df
    if index == len(df)-1:
        break
    else:
        if row["yday"] == df.loc[index+1,"yday"] and row["wday"] == df.loc[index+1,"wday"] and row["time"] < df.loc[index+1,"time"]:
            differences = list(range(row["time"]+1,df.loc[index+1,"time"]))
            for item in differences:
                tempdf = pd.DataFrame([[row["yday"], row["wday"],item, row['stop'], 'NA' ]],columns = df2.columns)
                df2 = df2.append(tempdf)

#Now sort 'df2' based upon yday,wday and time
df2 = df2.sort_values(['yday', 'wday', 'time'], ascending=True)
df2 = df2.reset_index(drop=True) #Reset indices

print(df2)

Output:

    yday  wday  time  stop   N
0    320     6    81     1   1
1    320     6    82     1  NA
2    320     6    83     1  NA
3    320     6    84     1  NA
4    320     6    85     1   1
5    320     6    86     1  NA
6    320     6    87     1  NA
7    320     6    88     1  NA
8    320     6    89     1   1
9    320     6    90     1  NA
10   320     6    91     1  NA
11   320     6    92     1  NA
12   320     6    93     1   1
13   320     6    94     1  NA
14   320     6    95     1  NA
15   320     6    96     1  NA
16   320     6    97     1   1

Cheers!

Dude you can't imagine how much i love your right now, it works!! tytytytyty <3 — Asier, May 01 '20 at 17:13

Fill Panda DF with new data

1 Answers1