How to access values in a pandas DataFrame without iterating over each row

Question

Let's say I have a pandas DataFrame with 1000 rows and 10 columns.

There are 5 integer columns labeled i1 through i5 and 5 string columns.

How can I create a new column called DIFF which is defined as

MAX(i1,i2,i3,i4,i5) - MIN(i1,i2,i3,i4,i5)

I had trouble using max and min because I wasn't accessing the values cleanly and kept getting tangled up with Series objects. In other examples I saw online, people were doing

mydf.iloc[x]['SOME_COL']

to get a cell's value, but here I don't want to iterate over rows; I just want to compute the new column for every row at once.

score 1 · Accepted Answer · answered Mar 01 '17 at 00:46

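Assuming that your numeric columns are the first 5, you can get the desired column by selecting them by position. Use .iloc for this, since .ix was deprecated in pandas 0.20 and removed in 1.0:

df.iloc[:, 0:5].max(axis=1) - df.iloc[:, 0:5].min(axis=1)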
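
As a rough sketch of how this might look end to end, the snippet below builds a small made-up frame and assigns the result to DIFF. The sample values and the string column names s1 and s2 are assumptions for illustration; only i1 through i5 come from the question.

```python
import pandas as pd

# Hypothetical sample data: five integer columns first, then string columns.
df = pd.DataFrame({
    "i1": [1, 10, 7],
    "i2": [2, 3, 7],
    "i3": [9, 4, 7],
    "i4": [4, 8, 7],
    "i5": [5, 6, 7],
    "s1": ["a", "b", "c"],
    "s2": ["x", "y", "z"],
})

# Select the first five columns by position.
ints = df.iloc[:, 0:5]

# Row-wise max minus row-wise min, computed for all rows at once.
df["DIFF"] = ints.max(axis=1) - ints.min(axis=1)
print(df["DIFF"].tolist())
```

Passing axis=1 is what makes max and min reduce across the columns of each row rather than down each column.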

czr

score 1 · Answer 2 · edited May 23 '17 at 11:53

If what you really want is max minus min across the numeric columns, you can select the numeric columns by dtype and then compute max - min on them, like this:

>> df
   i1  i2  i3  i4  i5 str_col_1 str_col_2
0   1   2   3   4   5         a         b
1   1   2   3   4   5         a         b
2   1   2   3   4   5         a         b
3   1   2   3   4   5         a         b
4   1   2   3   4   5         a         b

>> numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']
>> numeric_cols = df.select_dtypes(include=numerics)
>> numeric_cols.max(axis=1) - numeric_cols.min(axis=1)

0    4
1    4
2    4
3    4
4    4
dtype: int64
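
A shorter variant of the same idea, offered as a sketch rather than a drop-in replacement: select_dtypes also accepts the string "number", which covers all numeric dtypes without listing them one by one. The two-row frame below is made up to mirror the example above.

```python
import pandas as pd

# Small frame shaped like the example above (values are illustrative).
df = pd.DataFrame({
    "i1": [1, 1], "i2": [2, 2], "i3": [3, 3], "i4": [4, 4], "i5": [5, 5],
    "str_col_1": ["a", "a"], "str_col_2": ["b", "b"],
})

# "number" selects every numeric dtype (ints and floats of any width).
numeric_cols = df.select_dtypes(include="number")

# Compute the difference before adding DIFF to the frame: DIFF is itself
# numeric, so a later select_dtypes call would pick it up as an input.
diff = numeric_cols.max(axis=1) - numeric_cols.min(axis=1)
df["DIFF"] = diff
print(df["DIFF"].tolist())
```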

I can see that being very useful! In my case, the integer columns were static and known a priori. — Mark Ginsburg, Mar 01 '17 at 02:03
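
When the integer columns are fixed and known in advance, as in this comment, selecting them by name is one possible approach. The sketch below assumes the column names i1 through i5 from the question; add_diff and INT_COLS are hypothetical names introduced only for illustration.

```python
import pandas as pd

# Column names taken from the question; known ahead of time.
INT_COLS = ["i1", "i2", "i3", "i4", "i5"]

def add_diff(df, cols=INT_COLS):
    """Return a copy of df with a DIFF column (row max minus row min over cols)."""
    out = df.copy()
    out["DIFF"] = out[cols].max(axis=1) - out[cols].min(axis=1)
    return out

# Illustrative data only; the column order does not matter here.
sample = pd.DataFrame({
    "name": ["a", "b"],
    "i1": [3, 0], "i2": [8, 0], "i3": [5, 0], "i4": [1, 0], "i5": [4, 2],
})
result = add_diff(sample)
print(result["DIFF"].tolist())
```

Selecting by name keeps working if the columns are reordered or if other numeric columns are added later, which positional or dtype-based selection would not guarantee.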
