1

I want to create a new column and add it to a DataFrame on every iteration of a for loop in Python.

import pandas as pd

df_xyz = pd.DataFrame()
for j in range(0,3):
    for k in range(j+1,4):
        print(j,k)

In this case the loop runs 6 times, so it should create 6 new columns in the DataFrame, named "ABC1", "ABC2", ..., "ABC6". Each column should take its values from a NumPy array generated by the code inside the loop. My actual code involves an algorithm, but here I am showing only the part I need help with.

Edit 1:

Updated the code:

import numpy as np
import pandas as pd

z = np.array([1, 2, 4])
df_xyz = pd.DataFrame()
for j in range(0,3):
    for k in range(j+1,4):
        print(j,k)
        df_xyz = pd.DataFrame(z)  

This leaves me with only a single column instead of 6.

Akash
  • Many sources online, including on SO (Stack Overflow), show you how to add a new column to a data frame using existing data. For the column name, just keep a counter and concatenate its string equivalent to `ABC`. Where are you stuck with this? – Prune Aug 21 '19 at 19:09
  • Possible duplicate of [Adding new column to existing DataFrame in Python pandas](https://stackoverflow.com/questions/12555323/adding-new-column-to-existing-dataframe-in-python-pandas) – Tom Wojcik Aug 21 '19 at 19:10
  • 1
    I did check the old posts, but in my case it only shows me the last column, meaning it is overwriting the old columns. – Akash Aug 21 '19 at 19:11
  • You need to update your explanation. The code you provided has an empty DataFrame and two loops that print numbers; neither loop touches the DataFrame or adds a column. – Benoit Drogou Aug 21 '19 at 19:15
  • 1
    @TomWojcik Hi Tom, can you please tell me where exactly my scenario is covered in the given link? I want to create columns within a for loop. I know how to create a new column, but I am not sure how to keep adding columns to the DataFrame. – Akash Aug 21 '19 at 19:15
  • Please: 1. Give a sample of the DataFrame. 2. Explain what you actually want to do. Adding a column in a loop is not the goal itself; it is one way to get there, and probably not the only one. – Benoit Drogou Aug 21 '19 at 19:17
  • @Akash If you can create a new column, then that's exactly how you add one in the for loop. – Tom Wojcik Aug 21 '19 at 19:17
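
To make the point in the comments concrete, here is a minimal sketch of the difference between rebinding the DataFrame and assigning a column to it. The names `rebound` and `grown` are made up for illustration, and `z` stands in for whatever array the real loop computes.

```python
import numpy as np
import pandas as pd

z = np.array([1, 2, 4])  # stand-in for the array computed in the loop

# Rebinding: every pass builds a brand-new one-column frame
# and throws the previous one away.
rebound = pd.DataFrame()
for i in range(1, 4):
    rebound = pd.DataFrame(z)

# Column assignment: every pass adds a column to the same frame.
grown = pd.DataFrame()
for i in range(1, 4):
    grown[f"ABC{i}"] = z

print(rebound.shape)  # (3, 1)
print(grown.shape)    # (3, 3)
```

The first loop mirrors the line `df_xyz = pd.DataFrame(z)` from the question, which is why only one column survives.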

2 Answers

3

It really looks awful, but I think you are trying to do this:

In [1]:
import pandas as pd 
import numpy as np

z = np.array([1, 2, 4])
df_xyz = pd.DataFrame()
iterator = 1  # running column number: ABC1, ABC2, ...
for j in range(0, 3):
    for k in range(j + 1, 4):
        print(j, k)
        col_name = 'ABC' + str(iterator)
        df_xyz[col_name] = z  # a new name adds a new column
        iterator += 1
df_xyz

Out [1]:
    ABC1    ABC2    ABC3    ABC4    ABC5    ABC6
0   1       1       1       1       1       1
1   2       2       2       2       2       2
2   4       4       4       4       4       4

You can also do something like this:

In [2]:
import pandas as pd 
import numpy as np

my_cols = ['ABC1', 'ABC2', 'ABC3', 'ABC4', 'ABC5', 'ABC6']

z = np.array([1, 2, 4])
df_xyz = pd.DataFrame()
for j in range(0, 3):
    for k in range(j + 1, 4):
        print(j, k)
        col_name = my_cols.pop(0)  # take the next unused name from the list
        df_xyz[col_name] = z
df_xyz

Out [2]:
    ABC1    ABC2    ABC3    ABC4    ABC5    ABC6
0   1       1       1       1       1       1
1   2       2       2       2       2       2
2   4       4       4       4       4       4
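
Another option, sketched here under the assumption that every iteration yields an array of the same length, is to collect the columns in a dict and build the DataFrame once at the end. `itertools.combinations(range(4), 2)` yields the same six (j, k) pairs as the nested loops, in the same order; the dict name `columns` is just for illustration, and `z` is a placeholder for the array your algorithm produces.

```python
from itertools import combinations

import numpy as np
import pandas as pd

z = np.array([1, 2, 4])  # placeholder for the array computed per (j, k) pair

columns = {}
for n, (j, k) in enumerate(combinations(range(4), 2), start=1):
    # Replace z with the array computed for this (j, k) pair.
    columns[f"ABC{n}"] = z

df_xyz = pd.DataFrame(columns)
print(df_xyz.shape)  # (3, 6)
```

Building the frame in one go avoids growing it column by column, which tends to matter more as the number of columns increases.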
Benoit Drogou
  • The above code creates only 3 columns. However, I want it to create 6 columns, because the value of j also changes. – Akash Aug 21 '19 at 19:38
  • Just a small edit then: put both the j and k values in the column name. – Benoit Drogou Aug 21 '19 at 19:48
  • It only created 3 columns because there were only 3 distinct column names. When a column already exists, assigning to it updates that column and does not create another one. You cannot add a second column with the same name this way. – Benoit Drogou Aug 21 '19 at 20:23
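
A small sketch of what these comments describe, assuming the earlier version named columns from `k` alone (the names `by_k`, `by_jk` and the `ABC_j_k` pattern are made up for illustration):

```python
import numpy as np
import pandas as pd

z = np.array([1, 2, 4])
by_k = pd.DataFrame()   # column name built from k only
by_jk = pd.DataFrame()  # column name built from both j and k

for j in range(0, 3):
    for k in range(j + 1, 4):
        by_k[f"ABC{k}"] = z         # k repeats, so existing columns are overwritten
        by_jk[f"ABC_{j}_{k}"] = z   # every (j, k) pair gives a distinct name

print(list(by_k.columns))  # ['ABC1', 'ABC2', 'ABC3']
print(len(by_jk.columns))  # 6
```

Since `k` only takes the values 1, 2 and 3, names built from `k` alone collide, while names built from both indices stay unique.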
1
# Import pandas
import pandas as pd

# Create a new DataFrame
initial_data = {'First_name': ['Ram', 'Mohan', 'Tina', 'Jeetu', 'Meera'],
                'Last_name': ['Kumar', 'Sharma', 'Ali', 'Gandhi', 'Kumari'],
                'Marks': [12, 52, 36, 85, 23]}

df = pd.DataFrame(initial_data, columns=['First_name', 'Last_name', 'Marks'])

# Build the values for the new column in a loop
result = []
for value in df["Marks"]:
    # Check the invalid range first. A mark cannot be below 0 and
    # above 100 at the same time, so the test needs "or".
    if value < 0 or value > 100:
        result.append("Invalid")
    elif value >= 33:
        result.append("Pass")
    else:
        result.append("Fail")

# Assign the finished list as a new column
df["Result"] = result
print(df)
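
For comparison, the same kind of labeling can be done without an explicit loop. This is only a sketch using `numpy.select`, assuming marks outside 0 to 100 should be labeled "Invalid" and 33 is the pass mark; the frame here is a cut-down version with just the `Marks` column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Marks": [12, 52, 36, 85, 23]})

marks = df["Marks"]
conditions = [
    (marks < 0) | (marks > 100),  # checked first, so out-of-range marks win
    marks >= 33,
]
choices = ["Invalid", "Pass"]

# np.select picks the first matching condition per row, else the default.
df["Result"] = np.select(conditions, choices, default="Fail")
print(df)
```

With these marks the `Result` column comes out as Fail, Pass, Pass, Pass, Fail.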
nick