Creating a loop program to apply values to pandas df

Question

This question may be super basic, and I apologize for that.

But I am trying to create a for loop that would enter a value of 1 or 0 into a pandas dataframe based on a condition.

import pandas as pd

def checkHour6(time):
    val = 0
    if  time == 6:
        val = 1 
    return val

def checkHour7(time):
    val = 0
    if  time == 7:
        val = 1 
    return val

def checkHour8(time):
    val = 0
    if  time == 8:
        val = 1 
    return val

def checkHour9(time):
    val = 0
    if  time == 9:
        val = 1 
    return val

def checkHour10(time):
    val = 0
    if  time == 10:
        val = 1 
    return val

The for loop I am attempting counts from 0 to 23, and inside it I am trying to build a pandas dataframe that gets a 1 or 0 entered as appropriate. But I am missing something basic, as the final df is an empty dataframe.

Create empty df:

df = pd.DataFrame({'hour_6':[], 'hour_7':[], 'hour_8':[], 'hour_9':[], 'hour_10':[]})

For Loop:

hour = -1

for i in range(24):
    stuff = []
    hour = hour + 1
    stuff.append(checkHour6(hour))
    stuff.append(checkHour7(hour))
    stuff.append(checkHour8(hour))
    stuff.append(checkHour9(hour))
    stuff.append(checkHour10(hour))
    df.append(stuff)

try don't use loops with pandas, pandas has methods to do it — ansev, Mar 13 '20 at 20:37
I am attempting to create a dataframe to be used with a machine learning process. But maybe boolean values would work as well?? — bbartling, Mar 16 '20 at 15:13

Jaroslav Bezděk · Accepted Answer · 2020-03-13T19:50:04.647

I would suggest the following:

Use only one checkHour() function, with a parameter for the hour.
According to the pandas.DataFrame.append() documentation, the other parameter has to be a DataFrame or Series/dict-like object, or a list of these, so a plain list of values cannot be used.
If you want to build a data frame by appending new rows to an existing one, you have to assign the result back, because append() returns a new object instead of modifying the original.

The code can look like this:

def checkHour(time, hour):
    val = 0
    if time == hour:
        val = 1 
    return val

df = pd.DataFrame({'hour_6':[], 'hour_7':[], 'hour_8':[], 'hour_9':[], 'hour_10':[]})

hour = -1

for i in range(24):
    stuff = {}
    hour = hour + 1
    stuff['hour_6'] = checkHour(hour, 6)
    stuff['hour_7'] = checkHour(hour, 7)
    stuff['hour_8'] = checkHour(hour, 8)
    stuff['hour_9'] = checkHour(hour, 9)
    stuff['hour_10'] = checkHour(hour, 10)
    df = df.append(stuff, ignore_index=True)

The result is the following:

>>> print(df)
    hour_6  hour_7  hour_8  hour_9  hour_10
0      0.0     0.0     0.0     0.0      0.0
1      0.0     0.0     0.0     0.0      0.0
2      0.0     0.0     0.0     0.0      0.0
3      0.0     0.0     0.0     0.0      0.0
4      0.0     0.0     0.0     0.0      0.0
5      0.0     0.0     0.0     0.0      0.0
6      1.0     0.0     0.0     0.0      0.0
7      0.0     1.0     0.0     0.0      0.0
8      0.0     0.0     1.0     0.0      0.0
9      0.0     0.0     0.0     1.0      0.0
10     0.0     0.0     0.0     0.0      1.0
11     0.0     0.0     0.0     0.0      0.0
12     0.0     0.0     0.0     0.0      0.0
13     0.0     0.0     0.0     0.0      0.0
14     0.0     0.0     0.0     0.0      0.0
15     0.0     0.0     0.0     0.0      0.0
16     0.0     0.0     0.0     0.0      0.0
17     0.0     0.0     0.0     0.0      0.0
18     0.0     0.0     0.0     0.0      0.0
19     0.0     0.0     0.0     0.0      0.0
20     0.0     0.0     0.0     0.0      0.0
21     0.0     0.0     0.0     0.0      0.0
22     0.0     0.0     0.0     0.0      0.0
23     0.0     0.0     0.0     0.0      0.0
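
Note that DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, so the loop above only runs on older versions. The point about assigning the result still applies to its replacement, pd.concat(). Below is a minimal sketch of that behaviour, assuming a recent pandas; the variable names are illustrative and not from the original code:

```python
import pandas as pd

# Start from a one-row frame so the effect of each call is easy to see.
df = pd.DataFrame([{'hour_6': 0, 'hour_7': 0}])
new_row = pd.DataFrame([{'hour_6': 1, 'hour_7': 0}])

# The result is discarded here, so df itself is unchanged.
pd.concat([df, new_row], ignore_index=True)
rows_when_discarded = len(df)

# Assigning the returned frame is what actually grows df.
df = pd.concat([df, new_row], ignore_index=True)
rows_when_assigned = len(df)

print(rows_when_discarded, rows_when_assigned)
```

This is only meant to show the "returns a new object" semantics; it is not a recommendation to concatenate inside a loop.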

EDIT:

As @Parfait mentioned, it is not good to call pandas.DataFrame.append() in a for loop, because it leads to quadratic copying. To avoid that, you can build a list of dictionaries (the future data frame rows) and then call pd.DataFrame() once to make a data frame out of it. The code looks like this:

def checkHour(time, hour):
    val = 0
    if time == hour:
        val = 1 
    return val

data = []
hour = -1

for i in range(24):
    stuff = {}
    hour = hour + 1
    stuff['hour_6'] = checkHour(hour, 6)
    stuff['hour_7'] = checkHour(hour, 7)
    stuff['hour_8'] = checkHour(hour, 8)
    stuff['hour_9'] = checkHour(hour, 9)
    stuff['hour_10'] = checkHour(hour, 10)
    data.append(stuff)

df = pd.DataFrame(data)

And the result is the following:

>>> print(df)
    hour_6  hour_7  hour_8  hour_9  hour_10
0        0       0       0       0        0
1        0       0       0       0        0
2        0       0       0       0        0
3        0       0       0       0        0
4        0       0       0       0        0
5        0       0       0       0        0
6        1       0       0       0        0
7        0       1       0       0        0
8        0       0       1       0        0
9        0       0       0       1        0
10       0       0       0       0        1
11       0       0       0       0        0
12       0       0       0       0        0
13       0       0       0       0        0
14       0       0       0       0        0
15       0       0       0       0        0
16       0       0       0       0        0
17       0       0       0       0        0
18       0       0       0       0        0
19       0       0       0       0        0
20       0       0       0       0        0
21       0       0       0       0        0
22       0       0       0       0        0
23       0       0       0       0        0
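
The same list-of-dictionaries idea can probably be written more compactly. This is a sketch under the assumption that only hours 6 to 10 are needed; `check_hour` and `wanted_hours` are illustrative names, not part of the code above:

```python
import pandas as pd

def check_hour(time, hour):
    # 1 when the two hours match, otherwise 0
    return int(time == hour)

wanted_hours = range(6, 11)

# One dictionary per hour of the day; each becomes one row.
rows = [
    {f'hour_{h}': check_hour(time, h) for h in wanted_hours}
    for time in range(24)
]

# Build the data frame once, after all rows are collected.
df = pd.DataFrame(rows)
print(df.shape)
```

Because the data frame is built in a single call, this also works on pandas 2.0 and later, where DataFrame.append() no longer exists.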

[Never call `DataFrame.append` or `pd.concat` inside a for-loop. It leads to quadratic copying.](https://stackoverflow.com/a/36489724/1422451) — Parfait, Mar 13 '20 at 19:39

score 1 · Answer 2 · answered Mar 13 '20 at 20:11

Another really simple way to create your data frame is to use the pandas.get_dummies() function, like this:

df = pd.DataFrame({'hour': range(24)})
# dtype=int keeps the dummies as 1/0; pandas 2.0+ returns booleans by default
df = pd.get_dummies(df.hour, prefix='hour', dtype=int)
df = df[['hour_6', 'hour_7', 'hour_8', 'hour_9', 'hour_10']]

Jaroslav Bezděk

Would that start at hour 0? I think I need hour to be 0 thru 23 – bbartling Mar 13 '20 at 20:20
@HenryHub, yes, it would. `range(24)` starts at 0 and ends at 23. – Jaroslav Bezděk Mar 13 '20 at 20:52

score 0 · Answer 3 · answered Mar 13 '20 at 19:23

Quick glance for the blankness issue I'd say:

hour = -1
stuff = []

for i in range(24):    
    hour = hour + 1
    stuff.append(checkHour6(hour))
    stuff.append(checkHour7(hour))
    stuff.append(checkHour8(hour))
    stuff.append(checkHour9(hour))
    stuff.append(checkHour10(hour))

df.append(stuff)

There may be a better solution to the whole process, though.

MDR

Thanks for the help but that appears to create dataframe with 120 rows, I was hoping for df with 24 rows (to represent 24 hours in a day) where the columns would either be 1 or 0 depending on value of `hour` – bbartling Mar 13 '20 at 19:29
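
The 120 in the comment above presumably comes from appending five values to one flat list on each of the 24 iterations (24 x 5 = 120). A minimal sketch of one way to keep the values grouped per hour follows; it collects one inner list per iteration and names the columns when the frame is built. The names `flat`, `rows` and `wanted_hours` are illustrative, not from the original posts:

```python
import pandas as pd

wanted_hours = [6, 7, 8, 9, 10]

flat = []  # one long list, as in the loop above
rows = []  # one inner list per hour of the day

for hour in range(24):
    flags = [int(hour == h) for h in wanted_hours]
    flat.extend(flags)   # grows by 5 values each iteration
    rows.append(flags)   # grows by 1 row each iteration

df = pd.DataFrame(rows, columns=[f'hour_{h}' for h in wanted_hours])
print(len(flat), df.shape)
```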

score 0 · Answer 4 · answered Mar 13 '20 at 19:36

Start off with a data column (what hour it is); then all the other comparisons can be queried from that.

import pandas as pd
df = pd.DataFrame(range(24), columns= ['data'])
for time in range(6,11):
   df[f'hour_{time}'] = df['data']%24==time

df = df.astype(int)

If you want, you can remove the data column later.

    data  hour_6  hour_7  hour_8  hour_9  hour_10
0      0       0       0       0       0        0
1      1       0       0       0       0        0
2      2       0       0       0       0        0
3      3       0       0       0       0        0
4      4       0       0       0       0        0
5      5       0       0       0       0        0
6      6       1       0       0       0        0
7      7       0       1       0       0        0
8      8       0       0       1       0        0
9      9       0       0       0       1        0
10    10       0       0       0       0        1
11    11       0       0       0       0        0
12    12       0       0       0       0        0
13    13       0       0       0       0        0
14    14       0       0       0       0        0
15    15       0       0       0       0        0
16    16       0       0       0       0        0
17    17       0       0       0       0        0
18    18       0       0       0       0        0
19    19       0       0       0       0        0
20    20       0       0       0       0        0
21    21       0       0       0       0        0
22    22       0       0       0       0        0
23    23       0       0       0       0        0
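
The comparison can also be done for all five columns at once. This sketch swaps the explicit loop for NumPy broadcasting, which is a different technique from the code above; `hours`, `wanted` and `flags` are illustrative names:

```python
import numpy as np
import pandas as pd

hours = np.arange(24)      # 0..23, one entry per row
wanted = np.arange(6, 11)  # 6..10, one entry per column

# Comparing a (24, 1) column against a (5,) row broadcasts to a (24, 5) grid.
flags = (hours[:, None] == wanted).astype(int)

df = pd.DataFrame(flags, columns=[f'hour_{h}' for h in wanted])
print(df.shape)
```

This skips the helper data column entirely, at the cost of being a little less readable than the loop.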

score 0 · Answer 5 · answered Mar 13 '20 at 20:23

Because the object model in numpy and pandas differs from general Python, consider avoiding building objects in a loop like you would with simpler iterables like list or dict.

In fact, your setup can be handled simply with DataFrame.pivot on a column of 24 sequential integers, without any function or loop! You can also easily return more hour columns (i.e., hour_0 to hour_23) or reindex for your five needed columns:

Data

df = (pd.DataFrame({'hour': ['hour' for _ in range(24)]})
        .assign(hour = lambda x: x['hour'] + '_' + pd.Series(range(24)).astype('str'),
                num = 1)
     )

df.head(5)
#      hour  num
# 0  hour_0    1
# 1  hour_1    1
# 2  hour_2    1
# 3  hour_3    1
# 4  hour_4    1

Pivot

pvt_df = (df.pivot(columns='hour', values='num')
            .fillna(0)
            .reindex(['hour_6', 'hour_7', 'hour_8', 'hour_9', 'hour_10'], axis='columns')
         )

pvt_df
# hour  hour_6  hour_7  hour_8  hour_9  hour_10
# 0        0.0     0.0     0.0     0.0      0.0
# 1        0.0     0.0     0.0     0.0      0.0
# 2        0.0     0.0     0.0     0.0      0.0
# 3        0.0     0.0     0.0     0.0      0.0
# 4        0.0     0.0     0.0     0.0      0.0
# 5        0.0     0.0     0.0     0.0      0.0
# 6        1.0     0.0     0.0     0.0      0.0
# 7        0.0     1.0     0.0     0.0      0.0
# 8        0.0     0.0     1.0     0.0      0.0
# 9        0.0     0.0     0.0     1.0      0.0
# 10       0.0     0.0     0.0     0.0      1.0
# 11       0.0     0.0     0.0     0.0      0.0
# 12       0.0     0.0     0.0     0.0      0.0
# 13       0.0     0.0     0.0     0.0      0.0
# 14       0.0     0.0     0.0     0.0      0.0
# 15       0.0     0.0     0.0     0.0      0.0
# 16       0.0     0.0     0.0     0.0      0.0
# 17       0.0     0.0     0.0     0.0      0.0
# 18       0.0     0.0     0.0     0.0      0.0
# 19       0.0     0.0     0.0     0.0      0.0
# 20       0.0     0.0     0.0     0.0      0.0
# 21       0.0     0.0     0.0     0.0      0.0
# 22       0.0     0.0     0.0     0.0      0.0
# 23       0.0     0.0     0.0     0.0      0.0
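
For the "more hour columns" case mentioned above, here is a sketch of how all 24 columns might be returned as integers. It assumes the same long-format layout as the Data step; `labels`, `long_df` and `all_hours` are illustrative names. The reindex is there because pivot sorts the column labels as strings (hour_10 would come before hour_2):

```python
import pandas as pd

labels = [f'hour_{h}' for h in range(24)]

# Long format: one row per hour label, with a constant marker value.
long_df = pd.DataFrame({'hour': labels, 'num': 1})

all_hours = (long_df.pivot(columns='hour', values='num')
                    .fillna(0)
                    .astype(int)
                    .reindex(labels, axis='columns'))  # numeric order

print(all_hours.shape)
```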

@Parfait_ Would you be able to help me with this SO question? https://stackoverflow.com/questions/60759277/create-pandas-get-dummies-df — bbartling, Mar 19 '20 at 14:48
Interesting you remark on pivot table solution but do not acknowledge this solution works for you! — Parfait, Mar 19 '20 at 15:02
Sorry Im still learning! Still investigating what/how pivot table works. I think I can relate when using Microsoft Excel — bbartling, Mar 19 '20 at 15:04
