
Given a Pandas dataframe such as:

Name   Age
John   20
Mary   65
Bob    55

I want to iterate over the rows, decide whether each person is a senior (Age >= 60) or not, build a new entry with an extra column, and then append that entry to a csv file so that the file reads as follows:

Name   Age  Senior
John   20   False
Mary   65   True
Bob    55   False

Apart from saving the data to a csv, I can do the rest by converting the Series that the loop is currently iterating over into a dictionary and then adding a new key:

for _, e in records.iterrows():
    entry = e.to_dict()
    # a senior is anyone aged 60 or over
    entry["Senior"] = entry["Age"] >= 60

Simply converting the dict to a Series and then to a DataFrame doesn't write it to the csv file properly. Is there a pandas or non-pandas way to make this work?

IMPORTANT EDIT: The above is a simplified example. I am dealing with hundreds of rows, and the data I want to add is a long string that is created at run time, so looping is mandatory. Also, adding that data to the original dataframe isn't an option, as I'm fairly sure I'll run out of memory at some point (so I can't add the data to the original dataframe, nor create a new dataframe with all the information). I don't want to add the data to the original dataframe, only to a copy of a "row" that is then appended to a csv.

The example is given to provide some context for my question, but the main focus should be on the question, not the example.
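For reference, one non-pandas way is to stream the file with Python's built-in csv module, so that only one row is in memory at a time. This is a sketch under assumptions, not taken from any answer here: the helper add_senior_column and the file names people.csv and people_senior.csv are hypothetical, and the Senior flag stands in for the long run-time string mentioned in the edit.

```python
import csv
import os
import tempfile


def add_senior_column(input_path, output_path):
    """Hypothetical helper: copy input_path to output_path row by row, adding 'Senior'."""
    with open(input_path, newline="") as src, open(output_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        # Output columns are the input header plus the new column
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames + ["Senior"])
        writer.writeheader()
        for row in reader:
            # csv yields strings, so Age must be converted before comparing;
            # this is where a run-time generated value would be computed instead
            row["Senior"] = int(row["Age"]) >= 60
            writer.writerow(row)


# Usage with a throwaway input file built from the question's sample data
tmpdir = tempfile.mkdtemp()
in_path = os.path.join(tmpdir, "people.csv")
out_path = os.path.join(tmpdir, "people_senior.csv")
with open(in_path, "w", newline="") as f:
    f.write("Name,Age\nJohn,20\nMary,65\nBob,55\n")

add_senior_column(in_path, out_path)
with open(out_path, newline="") as f:
    print(f.read())
```

To append to an existing file instead, the output could be opened with mode "a" and writeheader() skipped.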

Mohamad Moustafa

4 Answers


Loops aren't necessary here; just assign the new column by comparing with a scalar. To avoid creating the column in the original DataFrame, use DataFrame.assign, which returns a new DataFrame with the new column and leaves the original unchanged:

df1 = df.assign(senior=df["Age"] >= 60)
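A minimal check of that claim, rebuilding the question's sample frame. Note that assign still builds a full-length second DataFrame, so on its own it does not address the memory concern raised in the question's edit.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["John", "Mary", "Bob"], "Age": [20, 65, 55]})

# assign returns a new DataFrame; df itself is left untouched
df1 = df.assign(Senior=df["Age"] >= 60)

print(df1)
print(list(df.columns))  # ['Name', 'Age']
```

Writing the result out is then a single df1.to_csv(...) call.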

EDIT:

If you really need a loop (not recommended):

for idx, e in df.iterrows():
    df.loc[idx, "senior"] = e["Age"]>=60

print (df)
   Name  Age  senior
0  John   20   False
1  Mary   65    True
2   Bob   55   False
jezrael
  • This will change the original dataframe; I don't want that. – Mohamad Moustafa Jun 24 '19 at 10:33
  • My edit also mentioned that the actual data I'll be adding is a (large) bunch of strings, which is why I don't want to create an entire dataframe all at once but handle it row by row. Also, the strings I store are created at run time, so I need to loop over the rows. All I want to know is how to store the dictionary I already have in my code (entry) in a csv so that it looks like what I have in my code. – Mohamad Moustafa Jun 24 '19 at 10:50
  • @MohamadMoustafa - I don't understand. Do you need to add a new column (or columns) to an existing `csv`? – jezrael Jun 24 '19 at 10:55
  • From a csv I get a dataframe, and I loop over that dataframe; according to the info in each row, a string is generated that I want to add under a new column. But since the string might be large, I don't want to add it to the original dataframe. I turn the Series (the row being looped over) into a dictionary to easily add the new data (by adding a new key). What I want now is to take that dict and somehow append it to another csv file so that it looks the same as the first csv but with an extra column (or to know if there is a better way to do what I am trying to do without using dictionaries). – Mohamad Moustafa Jun 24 '19 at 10:59
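If the goal is to keep memory bounded while still producing a second csv with an extra column, one option not proposed in this thread is to read the input in chunks with pd.read_csv(..., chunksize=...) and append each processed chunk with to_csv(mode="a"). This is a sketch under assumptions: the helper add_senior_in_chunks, the file names, and the chunk sizes are hypothetical, and the Senior comparison is a placeholder for whatever per-row string is generated at run time.

```python
import os
import tempfile

import pandas as pd


def add_senior_in_chunks(input_path, output_path, chunksize=10_000):
    """Hypothetical helper: copy input_path to output_path chunk by chunk, adding 'Senior'."""
    first = True
    # With chunksize, read_csv returns an iterator of DataFrames instead of one big frame
    for chunk in pd.read_csv(input_path, chunksize=chunksize):
        # Placeholder for the real per-row, run-time generated value
        chunk = chunk.assign(Senior=chunk["Age"] >= 60)
        # Create the file and write the header for the first chunk; append the rest
        chunk.to_csv(output_path, mode="w" if first else "a", header=first, index=False)
        first = False


tmpdir = tempfile.mkdtemp()
in_path = os.path.join(tmpdir, "people.csv")
out_path = os.path.join(tmpdir, "people_senior.csv")
pd.DataFrame({"Name": ["John", "Mary", "Bob"], "Age": [20, 65, 55]}).to_csv(in_path, index=False)

# A tiny chunk size, just so the three sample rows span more than one chunk
add_senior_in_chunks(in_path, out_path, chunksize=2)
print(pd.read_csv(out_path))
```

Only one chunk (plus the new column) is held in memory at a time, so the chunk size trades memory for the number of writes.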

Use np.where:

import numpy as np
df1 = df.copy()
df1['Senior'] = np.where(df1['Age'] >= 60, True, False)
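A quick check of the np.where version against the question's "Age >= 60" rule. The extra row for a hypothetical "Ann", aged exactly 60, is made-up data added only to exercise the boundary: with > instead of >= she would not be flagged. For a plain True/False column the comparison alone gives the same result, which the assert inside the sketch confirms.

```python
import numpy as np
import pandas as pd

# Question's sample data plus a hypothetical 60-year-old to test the boundary
df = pd.DataFrame({"Name": ["John", "Mary", "Bob", "Ann"], "Age": [20, 65, 55, 60]})

df1 = df.copy()
# >= (not >) so that someone aged exactly 60 counts as a senior
df1["Senior"] = np.where(df1["Age"] >= 60, True, False)

# For a boolean column, np.where(cond, True, False) is equivalent to cond itself
assert (df1["Senior"] == (df1["Age"] >= 60)).all()
print(df1)
```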
tawab_shakeel

You can also use Series.ge:

df2 = df.copy()
df2['senior'] = df2['Age'].ge(60)

And now:

print(df2)

Output:

   Name  Age senior
0  John   20  False
1  Mary   65   True
2   Bob   55  False
U13-Forward

Found the answer I needed here: Convert a dictionary to a pandas dataframe

Code:

import pandas as pd

first_entry = True
for _, e in records.iterrows():
    entry = e.to_dict()
    entry["Senior"] = entry["Age"] >= 60
    df_entry = pd.DataFrame([entry], columns=list(entry))

    # output_path is the path to the csv; header is the list of column names to write
    # (e.g. ["Name", "Age", "Senior"]); the header row is written only for the first entry
    df_entry.to_csv(output_path, sep=',', index=False, columns=header, header=first_entry, mode='a')
    first_entry = False

Was hoping for a better way to do it, but this one works fine.
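One possible refinement, sketched under assumptions: instead of letting to_csv reopen output_path on every iteration, open the file once and pass the handle to DataFrame.to_csv, which accepts file objects (the pandas docs recommend opening them with newline=""). The sample records frame and the temporary seniors.csv path are hypothetical stand-ins for the real input, and the Senior value is a placeholder for the run-time string.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-ins for the real input frame and output path
records = pd.DataFrame({"Name": ["John", "Mary", "Bob"], "Age": [20, 65, 55]})
output_path = os.path.join(tempfile.mkdtemp(), "seniors.csv")

first_entry = True
# Open the output once; every to_csv call below writes to the same handle
with open(output_path, "w", newline="") as f:
    for _, e in records.iterrows():
        entry = e.to_dict()
        entry["Senior"] = entry["Age"] >= 60  # placeholder for the run-time value
        # Header row only for the first entry; later rows follow it in the same file
        pd.DataFrame([entry]).to_csv(f, header=first_entry, index=False)
        first_entry = False

print(pd.read_csv(output_path))
```

Each row is still turned into a one-row DataFrame, so only one row's worth of new data exists at a time, as in the original loop.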

Mohamad Moustafa