How to create a new column by iterating over the values of an existing column?

Question

I have a column in my dataframe looking like this:

I am trying to create a new column based on this ContextID column looking like this:

    ContextID     Name
    7289972   Gas_basics
    7289972   Gas_basics
    7289972   Gas_basics
    7289973   Plasma_basics
    7289973   Plasma_basics
    7304693   DC2
    7304693   DC2
    7304693   DC2

I tried the following code

ID = data.ContextID.tolist()
print(ID)

for id in ID:
    if (ID == '7289972'):
        data['Strategyname'] = 'Plasma_basics'
    elif(ID == '7289973'):
        data['Strategyname'] = 'Gas_basics'
    elif(ID == '7304693'):
        data['Strategyname'] = 'DC2'

But it is just creating a variable named id of type int and size 1 with value as 7304693.

In the end, I would like to have this newly generated column named as Strategyname added to the main dataframe named data.

Can anyone tell me what is the mistake that I am doing so that I can have a better understanding of my mistake and can help me overcome this problem?

Thanks

Edit 1:

I have 2095 unique ContextID each of them belonging to one of the 3: Gas_basics, Plasma_basics, DC2

Example:

contextid   strategyname
7289971         DC2
7289972     Plasma_basics
7289973      Gas_basics
7289997         DC2
7289998     Plasma_basics
7289999      Gas_basics
7289972     Plasma_basics
7289973      Gas_basics
7304693         DC2

Where is the mapping of what ContextID belongs to which Name? Is that in a table, dictionary, some sort of rule? — Scott Boston, Mar 05 '19 at 14:59

score 2 · Answer 1 · answered Mar 05 '19 at 14:44

If you have a DataFrame with the dictionary:

s = """
ContextID,Name
7289972,Gas_basics
7289973,Plasma_basics
7304693,DC_Only
"""
df = pd.read_csv(pd.compat.StringIO(s), sep=',')
df

you can convert it to python dict and then apply to your dataset using pandas map function (more info on map here: https://pandas.pydata.org/pandas-docs/version/0.23.3/generated/pandas.Series.map.html):

d = dict(df.values)
df['NewName'] = df['ContextID'].map(d)
df

Output:

    ContextID   Name    NewName
0   7289972 Gas_basics  Gas_basics
1   7289972 Gas_basics  Gas_basics
2   7289972 Gas_basics  Gas_basics
3   7289973 Plasma_basics   Plasma_basics
4   7289973 Plasma_basics   Plasma_basics
5   7304693 DC_Only DC_Only
6   7304693 DC_Only DC_Only
7   7304693 DC_Only DC_Only

score 1 · Accepted Answer · answered Mar 05 '19 at 14:28

1

Use np.select for a vectorized approach:

df['Name']=np.select([df.ContextID.eq(7289972),df.ContextID.eq(7289973),\
                  df.ContextID.eq(7304693)],['Plasma_basics','Gas_basics','DC_Only'])
print(df)

   ContextID           Name
0    7289972  Plasma_basics
1    7289972  Plasma_basics
2    7289972  Plasma_basics
3    7289973     Gas_basics
4    7289973     Gas_basics
5    7304693        DC_Only
6    7304693        DC_Only
7    7304693        DC_Only

answered Mar 05 '19 at 14:28

anky

74,114
11
41
70

Hey anky, thanks for the reply. I have like 2095 different `ContextID` in my dataset. Is there a way a loop can be applied so that I don't have to manually input all the `ContextID` ? Thanks – some_programmer Mar 05 '19 at 14:30
so do you have a mapping (the values to enter when condition is met) for those 2095 Ids? like a seperate dataframe of a dictionary? – anky Mar 05 '19 at 14:31
I have a separate CSV file as of now that contains the `ContextID` along with the `Name` – some_programmer Mar 05 '19 at 14:33
1

why not just use merge then? https://stackoverflow.com/questions/53645882/pandas-merging-101 , anyways, if you can post the csv wrt the example i will show you how to do it – anky Mar 05 '19 at 14:34
Hey anky, I have edited the question. Kindly let me know if that's what you wanted. Thanks – some_programmer Mar 05 '19 at 14:53
@KashyapMaheshwari exactly, i think you will find an answer in the other answers. both map and merge will work here. :) you can accept any one of them. If you run through any issues, feel free to let me know. Cheers..!! – anky Mar 05 '19 at 14:54
Hey anky, it worked. I joined both the tables. Thanks :) – some_programmer Mar 05 '19 at 15:44

score 0 · Answer 3 · answered Mar 05 '19 at 14:37

Checkout df.apply()

You can do something like the following:

def myfunc(x):
    if x==1:
        return 'hello'
    else:
        return 'world'
df = pd.DataFrame([1, 1, 1, 2, 2, 2], columns=['A'])
df['B'] = df['A'].apply(myfunc)

   A      B
0  1  hello
1  1  hello
2  1  hello
3  2  world
4  2  world
5  2  world

score 0 · Answer 4 · answered Mar 05 '19 at 14:47

you could use DataFrame.merge:

data="""
ContextID
7289972
7289972
7289972
7289973
7289973
7304693
7304693
7304693
"""
df1 = pd.read_csv(pd.compat.StringIO(data), sep='\s+',dtype={'ContextID' : str})

df2 = pd.DataFrame({'ContextID': ['7289972','7289973','7304693'],
                    'Name': ['Plasma_basics','Gas_basics','DC2']})
print(df1.merge(df2.drop_duplicates(), how='right',  on=['ContextID']))

output:

  ContextID           Name
0   7289972  Plasma_basics
1   7289972  Plasma_basics
2   7289972  Plasma_basics
3   7289973     Gas_basics
4   7289973     Gas_basics
5   7304693            DC2
6   7304693            DC2
7   7304693            DC2

you could use too a csv file or text to set your input:

data1 ="""
ContextID Name
7289972 Plasma_basics
7289973 Gas_basics
7304693 DC2
"""

df1 = pd.read_csv(pd.compat.StringIO(data1), sep='\s+',dtype={'ContextID' : str})

How to create a new column by iterating over the values of an existing column?

4 Answers4