df['date'] = "wooo"
sets the entire column to "wooo"
, which I'm sure is not your intention. You should just set that row's date
column
for index in df.index:
if df.loc[index, 'time'] == '00:00':
df.loc[index, 'date'] = "wooo"
else:
df.loc[index, 'date'] = "blah"
Better yet, use logical indexing and set multiple rows at once instead of iterating over rows and setting each value individually:
df['date'] = 'blah' # Set everything to blah
zero_rows = df['time'] == '00:00' # Find the rows where time is 00:00
df.loc[zero_rows, 'date'] = 'wooo' # Set these rows to wooo
For example, with
df = pd.DataFrame([['00:00', 2, 3], ['00:01', 5, 6], ['00:02', 8, 9], ['00:00', 10, 11]], columns=['time', 'b', 'c'])
we get:
time b c date
0 00:00 2 3 wooo
1 00:01 5 6 blah
2 00:02 8 9 blah
3 00:00 10 11 wooo
Re. your comment:
The issue with the logical indexing is that eventually I’m trying to assign a date which counts forwards at the start of each new day (time = 00:00), so until a 00:00 is found insert todays date in each row. Once found, insert today+1 and continue until found again and then insert today+2 etc. The number of rows until the first 00:00 is found varies depending on when the dataframe is created.
This is easily implemented. Let's start with a longer dataframe:
df = pd.DataFrame([['00:00', 2, 3], ['12:00', 5, 6], ['00:00', 8, 9], ['12:00', 10, 11], ['00:00', 18, 19], ['12:00', 110, 111], ['00:00', 28, 29], ['12:00', 210, 211]], columns=['time', 'b', 'c'])
Let's pull out all the columns with time == '00:00'
.
zero_rows = df['time'] == '00:00'
midnights = df.loc[zero_rows, :]
time b c
0 00:00 2 3
2 00:00 8 9
4 00:00 18 19
6 00:00 28 29
Now, create a new column 'increment'
and fill it with nan
. Then, fill only the midnight values with numbers:
if df.iloc[0]['time'] == '00:00':
start_increment = 0 # First row is midnight, so we start with an increment of zero
else:
start_increment = 1 # First row is not midnight, so the first midnight row has an increment of 1
df['increment'] = pd.NA
df.loc[midnights.index, 'increment'] = range(start_increment, len(midnights)+start_increment)
time b c increment
0 00:00 2 3 0
1 12:00 5 6 <NA>
2 00:00 8 9 1
3 12:00 10 11 <NA>
4 00:00 18 19 2
5 12:00 110 111 <NA>
6 00:00 28 29 3
7 12:00 210 211 <NA>
Next, we can use ffill()
to fill values in the increment column that are NA
.
df['increment'].ffill(inplace=True)
time b c increment
0 00:00 2 3 0
1 12:00 5 6 0
2 00:00 8 9 1
3 12:00 10 11 1
4 00:00 18 19 2
5 12:00 110 111 2
6 00:00 28 29 3
7 12:00 210 211 3
Any NA
s that still remain were before the first midnight, so they should be zeros:
df['increment'].fillna(0, inplace=True)
Now, we want to add a column 'date'
which has a value of today
+ increment
days. To do this, we first convert the increment
column to pd.TimeDelta
:
df['increment'] = pd.to_timedelta(df['increment'], unit='D')
time b c increment
0 00:00 2 3 0 days
1 12:00 5 6 0 days
2 00:00 8 9 1 days
3 12:00 10 11 1 days
4 00:00 18 19 2 days
5 12:00 110 111 2 days
6 00:00 28 29 3 days
7 12:00 210 211 3 days
Finally, add today's date to the increment column:
import datetime
df['date'] = datetime.datetime.today() + df['increment']
time b c increment date
0 00:00 2 3 0 days 2022-02-15
1 12:00 5 6 0 days 2022-02-15
2 00:00 8 9 1 days 2022-02-16
3 12:00 10 11 1 days 2022-02-16
4 00:00 18 19 2 days 2022-02-17
5 12:00 110 111 2 days 2022-02-17
6 00:00 28 29 3 days 2022-02-18
7 12:00 210 211 3 days 2022-02-18