There are many methods for creating new columns in Pandas (I may have missed some in my examples so please let me know if there are others and I will include here) and I wanted to figure out when is the best time to use each method. Obviously some methods are better in certain situations compared to others but I want to evaluate it from a holistic view looking at efficiency, readability, and usefulness.
I'm primarily concerned with the first three but included other ways simply to show it's possible with different approaches. Here's your sample dataframe
:
df = pd.DataFrame({'a':[1,2,3],'b':[4,5,6]})
Most commonly known way is to name a new column such as df['c']
and use apply
:
df['c'] = df['a'].apply(lambda x: x * 2)
df
a b c
0 1 4 2
1 2 5 4
2 3 6 6
Using assign
can accomplish the same thing:
df = df.assign(c = lambda x: x['a'] * 2)
df
a b c
0 1 4 2
1 2 5 4
2 3 6 6
Updated via @roganjosh:
df['c'] = df['a'] * 2
df
a b c
0 1 4 2
1 2 5 4
2 3 6 6
Using map
(definitely not as efficient as apply
):
df['c'] = df['a'].map(lambda x: x * 2)
df
a b c
0 1 4 2
1 2 5 4
2 3 6 6
Creating a new pd.series
and then concat
to bring it into the dataframe
:
c = pd.Series(df['a'] * 2).rename("c")
df = pd.concat([df,c], axis = 1)
df
a b c
0 1 4 2
1 2 5 4
2 3 6 6
Using join
:
df.join(c)
a b c
0 1 4 2
1 2 5 4
2 3 6 6