Looping over a pandas DataFrame

Question

I'm trying to iterate over a DataFrame I have, but I can't figure out how.

The script reads an Excel file that contains the start dates of new employees.

import pandas as pd
import datetime as dt

xls = pd.ExcelFile(r'test.xlsx')
df = pd.read_excel(xls, 'New Employment')
df['Start Date'] = pd.to_datetime(df['Start Date'])
today = pd.Timestamp.today()

# Calculate how many days are left until each employee starts working
df['Starts In'] = (df['Start Date'] - today).dt.days
delta_df = df[['Name', 'Starts In']]

At this point, delta_df holds the full list of new employees: each name and the number of days until that employee starts working.

I would like to go over this DataFrame and check whether any employee will start working in less than 5 days. If so, I want to add that employee to a list or a separate DataFrame.

That list/DF will later be attached to an email I'll send.

I'm not sure how to perform this check.

If I understand you correctly you need `df.loc[df['Starts In'].le(4), 'Name']` — Erfan, Jun 17 '19 at 15:24
Find more info [here](https://stackoverflow.com/questions/17071871/select-rows-from-a-dataframe-based-on-values-in-a-column-in-pandas), which this question is also a duplicate of. — Erfan, Jun 17 '19 at 15:25

score 0 · Accepted Answer · answered Jun 17 '19 at 15:25

delta_df['starts_soon'] = delta_df['Starts In'] < 5

You don't need to loop through the DataFrame; vectorization is what makes pandas so powerful.
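To make the comparison concrete, here is a minimal sketch using made-up data (the names and day counts are hypothetical, not taken from the question's Excel file). It builds the same list once with an explicit loop and once with a boolean mask. Both give the same result, but the mask version is shorter and avoids row-by-row Python overhead.

```python
import pandas as pd

# Hypothetical sample data standing in for delta_df
delta_df = pd.DataFrame({
    'Name': ['Ann', 'Bob', 'Cid'],
    'Starts In': [2, 10, 4],
})

# Loop version: works, but gets slow on large frames
looped = []
for _, row in delta_df.iterrows():
    if row['Starts In'] < 5:
        looped.append(row['Name'])

# Vectorized version: one comparison over the whole column
mask = delta_df['Starts In'] < 5
vectorized = delta_df.loc[mask, 'Name'].tolist()

print(looped)      # ['Ann', 'Cid']
print(vectorized)  # ['Ann', 'Cid']
```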

If you just want the names of the people who start in less than 5 days, do something like:

delta_df.loc[delta_df['Starts In'] < 5, 'Name']

For your email, you can even write that selection straight to a CSV file:

delta_df.loc[delta_df['Starts In'] < 5, 'Name'].to_csv('name_list.csv')
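As a rough sketch of the email step the question mentions, the filtered rows could be turned into CSV text and attached to a message built with the standard library's `email.message.EmailMessage`. The sample data, subject line, and addresses below are placeholders, and actually sending the message (for instance through `smtplib`) is left out because it depends on your mail server.

```python
import pandas as pd
from email.message import EmailMessage

# Hypothetical sample data standing in for delta_df
delta_df = pd.DataFrame({
    'Name': ['Ann', 'Bob', 'Cid'],
    'Starts In': [2, 10, 4],
})

starting_soon = delta_df.loc[delta_df['Starts In'] < 5, ['Name', 'Starts In']]

# Without a path argument, to_csv returns the CSV as a string;
# index=False leaves the row labels out of the file
csv_text = starting_soon.to_csv(index=False)

msg = EmailMessage()
msg['Subject'] = 'Employees starting soon'   # placeholder subject
msg['From'] = 'hr@example.com'               # placeholder address
msg['To'] = 'manager@example.com'            # placeholder address
msg.set_content('Attached: employees starting in less than 5 days.')

# Attach the CSV text under the file name used in the answer above
msg.add_attachment(csv_text, subtype='csv', filename='name_list.csv')
```

Passing `index=False` is a design choice here: the default writes the DataFrame's row labels as an extra first column, which is rarely useful in a file meant for a human reader.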

score 0 · Answer 2 · answered Jun 17 '19 at 15:26

Just filter your delta_df to get the rows where 'Starts In' is less than 5:

lessthan5 = delta_df[delta_df['Starts In'] < 5]

Then you can check whether this DataFrame is non-empty:

if not lessthan5.empty:
    # Do what you want with those employees, for example print them
    print(lessthan5)

You can get the employees as a list with:

lessthan5.Name.tolist()
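Put together, the filter, the emptiness check, and the list conversion might look like the sketch below. The helper name `report_starting_soon`, its `days` parameter, and the sample data are hypothetical. It uses a strict `<` to match the question's "less than 5 days".

```python
import pandas as pd

# Hypothetical sample data standing in for delta_df
delta_df = pd.DataFrame({
    'Name': ['Ann', 'Bob', 'Cid'],
    'Starts In': [2, 10, 4],
})

def report_starting_soon(frame, days=5):
    """Return a one-line summary, or None when nobody starts within `days` days."""
    soon = frame[frame['Starts In'] < days]
    if soon.empty:  # same check as len(soon) == 0
        return None
    return 'Starting soon: ' + ', '.join(soon['Name'].tolist())

print(report_starting_soon(delta_df))          # Starting soon: Ann, Cid
print(report_starting_soon(delta_df, days=1))  # None
```

Returning `None` for the empty case gives the caller a simple way to skip sending an email when there is nothing to report.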

score 0 · Answer 3 · answered Jun 17 '19 at 15:29

Create a boolean filter:

filter_starts_soon = delta_df['Starts In'] < 5

Use the filter to get the names:

result = delta_df.loc[filter_starts_soon, 'Name'].to_list()
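One possible benefit of naming the filter is that several conditions can be combined with `&`. The sketch below assumes, beyond what the question states, that employees whose start date has already passed (a negative 'Starts In') should be left out. The second filter and the sample data are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; -3 means the employee started 3 days ago
delta_df = pd.DataFrame({
    'Name': ['Ann', 'Bob', 'Cid', 'Dee'],
    'Starts In': [2, 10, 4, -3],
})

filter_starts_soon = delta_df['Starts In'] < 5
filter_not_started = delta_df['Starts In'] >= 0  # assumption: skip past start dates

# & combines the two boolean Series element by element
result = delta_df.loc[filter_starts_soon & filter_not_started, 'Name'].to_list()
print(result)  # ['Ann', 'Cid']
```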

answered by sebvargo
