This is a very general question: I'm looking for approaches to the following situation.

I often find myself creating an extra column in a dataframe, and I want to use something like:

df['new_col'] = df['old_col_1']+df['old_col_2']

But unless the operation is incredibly simple, this gives a "TypeError: cannot convert the series to [class 'whatever']" and I have to use a clunkier method. Example:

import pandas as pd

df = pd.DataFrame({'Year':[2018,2017,2016,2017,2016,2018,2018],'Month':[1,1,1,2,2,2,3],
                   'Value':[521,352,32,125,662,123,621]})

I want a Date column, and I end up doing:

from datetime import datetime as dt
df['Date'] = None
for i in df.index:
    df.loc[i,'Date'] = dt(df.loc[i,'Year'],df.loc[i,'Month'],1)

In other situations, I find myself doing:

datelist = []
for i in df.index:
    datelist.append(dt(df.loc[i,'Year'],df.loc[i,'Month'],1))
df['Date'] = datelist

Obviously this is just an example; there are many situations in which I end up using either method. Am I right in thinking that these methods are not Pythonic, and what is a better way to generate slightly complicated columns based on other columns?

Shanteshwar Inde
Jim Eisenberg

1 Answer

I think the principle df['new_col'] = df['old_col_1']+df['old_col_2'] is good, because it is vectorized.

How to handle it depends on the data. For example, here it is possible to convert the columns to strings and apply to_datetime:

df['Date'] = pd.to_datetime(df['Year'].astype(str) + '-' + df['Month'].astype(str),
                            format='%Y-%m')
print(df)

   Year  Month  Value       Date
0  2018      1    521 2018-01-01
1  2017      1    352 2017-01-01
2  2016      1     32 2016-01-01
3  2017      2    125 2017-02-01
4  2016      2    662 2016-02-01
5  2018      2    123 2018-02-01
6  2018      3    621 2018-03-01
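
As an alternative sketch that avoids the string conversion, pd.to_datetime can also assemble dates from a DataFrame whose columns are named year, month and day. The intermediate frame name parts and the fixed day of 1 are assumptions for illustration, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({'Year': [2018, 2017, 2016, 2017, 2016, 2018, 2018],
                   'Month': [1, 1, 1, 2, 2, 2, 3],
                   'Value': [521, 352, 32, 125, 662, 123, 621]})

# Build a helper frame with the component names to_datetime expects;
# the scalar 1 is broadcast so every row gets the first day of the month.
parts = pd.DataFrame({'year': df['Year'], 'month': df['Month'], 'day': 1})

# Assemble the datetimes in one vectorized call.
df['Date'] = pd.to_datetime(parts)
```

This should give the same Date column as the string-based version, and it stays vectorized.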

General order of precedence for performance of various operations
For loops with pandas
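
When no vectorized form exists for an operation, a list comprehension over the zipped columns is usually less clunky than assigning row by row with .loc. A minimal sketch, assuming the same example frame as in the question:

```python
from datetime import datetime

import pandas as pd

df = pd.DataFrame({'Year': [2018, 2017, 2016, 2017, 2016, 2018, 2018],
                   'Month': [1, 1, 1, 2, 2, 2, 3],
                   'Value': [521, 352, 32, 125, 662, 123, 621]})

# tolist() yields plain Python ints, which datetime() accepts directly.
years = df['Year'].tolist()
months = df['Month'].tolist()

# Build the whole column in one pass instead of one .loc assignment per row.
df['Date'] = [datetime(y, m, 1) for y, m in zip(years, months)]
```

This is still a Python-level loop, so it is best kept for cases where a vectorized pandas function is not available.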

jezrael