How to loop over a dataframe, add new fields to a series, then append that series to a csv?

Question

Given a Pandas dataframe such as:

Name   Age
John   20
Mary   65
Bob    55

I wish to iterate over the rows, decide whether each person is a senior (age>=60) or not, create a new entry with an extra column, then append that to a csv file such that it (the csv file) reads as follows:

Name   Age  Senior
John   20   False
Mary   65   True
Bob    55   False

Other than saving the data to a csv, I am able to do the rest by turning the series the loop is currently iterating over to a dictionary then adding a new key.

for idx, e in records.iterrows():
    entry = e.to_dict()
    # The column is "Age" (capitalised); a senior is aged 60 or over
    entry["senior"] = entry["Age"] >= 60

Simply converting the dict to a series and then to a dataframe isn't writing it to the csv file properly. Is there a pandas or non-pandas way of making this work?

IMPORTANT EDIT: The above is a simplified example. I am dealing with hundreds of rows, and the data I want to add is a long string that is created at run time, so looping is mandatory. Adding it to the original dataframe isn't an option either, as I am fairly sure I'll run out of memory at some point (so I can't add the data to the original dataframe, nor create a new dataframe holding all the information). I only want to add the data to a copy of a "row", which is then appended to a csv.

The example is given to provide some context for my question, but the main focus should be on the question, not the example.

Looks easy. In each iteration, build a string with the line you want to write to the file, then write that string to the file. — Stop harming Monica, Jun 24 '19 at 11:26
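One way to read the suggestion above, sketched without pandas: stream the input csv row by row with the standard library's csv module and write each extended row straight to the output, so only one row is held in memory at a time. The function name `add_senior_column` and the in-memory `io.StringIO` objects are stand-ins for illustration (real code would pass file objects opened with `newline=""`), and the age check stands in for the string generated at run time.

```python
import csv
import io


def add_senior_column(src, dst):
    """Stream rows from src to dst, adding a computed 'Senior' field to each row."""
    reader = csv.DictReader(src)
    # Output columns are the input columns plus the new one
    writer = csv.DictWriter(
        dst, fieldnames=reader.fieldnames + ["Senior"], lineterminator="\n"
    )
    writer.writeheader()
    for entry in reader:
        # csv yields strings, so convert Age before comparing;
        # this stands in for the value generated at run time
        entry["Senior"] = int(entry["Age"]) >= 60
        writer.writerow(entry)


# In-memory stand-ins for the input and output files (illustration only)
src = io.StringIO("Name,Age\nJohn,20\nMary,65\nBob,55\n")
dst = io.StringIO()
add_senior_column(src, dst)
output = dst.getvalue()
```

Because each row is written as soon as it is built, memory use does not grow with the number of rows or the length of the generated strings.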

jezrael · Answer 1 · 2019-06-24T10:39:24.483

2

A loop is not necessary here: create the new column by comparing the Age column with a scalar. To avoid adding the column to the original DataFrame, use DataFrame.assign, which returns a new DataFrame containing the new column and leaves the original unchanged:

df1 = df.assign(senior = df["Age"]>=60)
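A minimal sketch of how the assign approach could be carried through to a csv, assuming the sample data from the question. The `io.StringIO` buffer is only a stand-in for a real output path.

```python
import io

import pandas as pd

# Sample data from the question
df = pd.DataFrame({"Name": ["John", "Mary", "Bob"], "Age": [20, 65, 55]})

# assign returns a new DataFrame; df itself keeps only Name and Age
df1 = df.assign(senior=df["Age"] >= 60)

# Stand-in for a real path, e.g. df1.to_csv("out.csv", index=False)
buffer = io.StringIO()
df1.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
```

Note that this still builds the whole result in memory, so it only helps when the extra column is small enough to fit.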

EDIT:

If you really need a loop (not recommended; note that this writes into df itself):

for idx, e in df.iterrows():
    df.loc[idx, "senior"] = e["Age"]>=60

print (df)
   Name  Age  senior
0  John   20   False
1  Mary   65    True
2   Bob   55   False

edited Jun 24 '19 at 10:39

answered Jun 24 '19 at 10:32

jezrael

This will change the original dataframe; I don't want that. – Mohamad Moustafa Jun 24 '19 at 10:33
My edit also mentioned that the actual data I'll be adding is a (large) bunch of strings, which is why I don't wish to create an entire dataframe all at once but deal with it row by row. Also, the strings I store are created at run time, so I need to loop over the rows. All I want to know is how to store the dictionary I already have in my code (entry) in a csv so that it looks like what I have in my code. – Mohamad Moustafa Jun 24 '19 at 10:50
@MohamadMoustafa - I don't understand. Do you need to add a new column or columns to an existing `csv`? – jezrael Jun 24 '19 at 10:55
From a csv I get a dataframe. I loop over that dataframe and, according to the info in each row, a string is generated that I want to add under a new column. Since the string might be large, I don't want to add it to the original dataframe. I turn the series (the row being looped over) into a dictionary so I can easily add the new data (by adding a new key). What I want now is to take that dict and somehow append it to another csv file so that it looks the same as the first csv but with an extra column (or to know if there is a better way to do this without using dictionaries). – Mohamad Moustafa Jun 24 '19 at 10:59

tawab_shakeel · Answer 2 · 2019-06-24T10:37:14.507

1

Use np.where:

import numpy as np
df1 = df.copy()
# >= 60 so that someone aged exactly 60 counts as a senior, as in the question
df1['Senior'] = np.where(df1['Age']>=60,True,False)

edited Jun 24 '19 at 10:37

answered Jun 24 '19 at 10:31

tawab_shakeel

@MohamadMoustafa you can simply supply any strings in place of True and False – tawab_shakeel Jun 24 '19 at 10:42
@MohamadMoustafa as we are copying the data to another dataframe, the original dataframe would not be affected – tawab_shakeel Jun 24 '19 at 10:43
df1.to_csv("file_name.csv") then del df1 – tawab_shakeel Jun 24 '19 at 10:43

score 1 · Answer 3 · answered Jun 24 '19 at 10:33

You can also use Series.ge (greater than or equal):

df2 = df.copy()
df2['senior'] = df2['Age'].ge(60)

And now:

print(df2)

Output:

   Name  Age  senior
0  John   20   False
1  Mary   65    True
2   Bob   55   False

U13-Forward

score 0 · Accepted Answer · answered Jun 24 '19 at 11:26

Found the answer I needed here: Convert a dictionary to a pandas dataframe

Code:

first_entry = True
for idx, e in records.iterrows():
    entry = e.to_dict()
    # The column is "Age" (capitalised); a senior is aged 60 or over
    entry["senior"] = entry["Age"] >= 60
    df_entry = pd.DataFrame([entry], columns=entry.keys())

    # output_path is a variable with the path to the csv;
    # header is a variable with the list of new column names.
    # The header row is written only for the first entry.
    df_entry.to_csv(output_path, sep=',', index=False, columns=header, header=first_entry, mode='a')
    first_entry = False

Was hoping for a better way to do it, but this one works fine.
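A possible alternative, offered only as a sketch: if memory is the concern, pandas can read the input in chunks via the chunksize parameter of read_csv, so each chunk is extended and appended before the next one is loaded. The names `append_in_chunks` and `is_senior` are hypothetical, `is_senior` stands in for whatever builds the string at run time, and the `io.StringIO` objects stand in for real files. With a real output path you would pass mode='a' to to_csv, as in the code above.

```python
import io

import pandas as pd


def is_senior(age):
    """Hypothetical per-row computation; stands in for the run-time string."""
    return age >= 60


def append_in_chunks(src, dst, chunksize=2):
    """Read src a few rows at a time and append each extended chunk to dst."""
    first_chunk = True
    for chunk in pd.read_csv(src, chunksize=chunksize):
        # assign returns a new frame, so only one small chunk is held at a time
        out = chunk.assign(senior=chunk["Age"].map(is_senior))
        # Write the header row only once, before the first chunk
        out.to_csv(dst, index=False, header=first_chunk)
        first_chunk = False


# In-memory stand-ins for the input and output files (illustration only)
src = io.StringIO("Name,Age\nJohn,20\nMary,65\nBob,55\n")
dst = io.StringIO()
append_in_chunks(src, dst)
lines = dst.getvalue().splitlines()
```

The chunk size trades memory for speed: larger chunks mean fewer writes, while a chunk size of 1 behaves much like the row-by-row loop.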
