Numpy:
import numpy as np
nparr = np.array([[1, 5],[2,6], [3, 7]])
print(nparr)
print(nparr[0]) #first choose the row
print(nparr[0][1]) #second choose the column
gives the output as expected:
[[1 5]
 [2 6]
 [3 7]]
[1 5]
5
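For comparison, NumPy also accepts both indices inside a single subscript as a tuple, and the order is still row first, column first. A short sketch using the same nparr:

```python
import numpy as np

nparr = np.array([[1, 5], [2, 6], [3, 7]])

# One subscript with a tuple: row index first, then column index.
assert nparr[0, 1] == nparr[0][1] == 5

# A slice in the row position keeps every row, so this selects column 1.
col = nparr[:, 1]
print(col)  # [5 6 7]
```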
Pandas:
import pandas as pd
df = pd.DataFrame({
'a': [1, 2, 3],
'b': [5, 6, 7]
})
print(df)
print(df['a']) #first choose the column !!!
print(df['a'][1]) #second choose the row !!!
gives the following output:
   a  b
0  1  5
1  2  6
2  3  7
0    1
1    2
2    3
Name: a, dtype: int64
2
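Strictly speaking, the second subscript in df['a'][1] is a lookup on the row labels of the Series returned by df['a']; with the default RangeIndex (0, 1, 2) those labels happen to coincide with positions. A minimal sketch, assuming a hypothetical index of [10, 20, 30], makes the difference visible:

```python
import pandas as pd

# Same data as above, but with a hypothetical non-default row index.
df2 = pd.DataFrame({'a': [1, 2, 3], 'b': [5, 6, 7]}, index=[10, 20, 30])

s = df2['a']        # selecting a column returns a Series
print(s[20])        # label-based lookup on the Series index -> 2
print(s.iloc[1])    # explicitly positional lookup -> 2
```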
What is the fundamental reason for making the default ordering of "indexes" in a Pandas DataFrame column-first? What benefit do we get in exchange for this loss of consistency/intuitiveness?
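One possible reading (an interpretation, not an official rationale): the pandas documentation describes a DataFrame as a dict-like container for Series objects, so the bare [] follows dict semantics, where the key is a column label, rather than NumPy's row-first convention. The sketch below compares the dict used to build df with the DataFrame itself:

```python
import pandas as pd

data = {'a': [1, 2, 3], 'b': [5, 6, 7]}
df = pd.DataFrame(data)

# A single-key [] lookup mirrors the dict: the key is a column name.
print(data['a'])           # [1, 2, 3]
print(df['a'].tolist())    # [1, 2, 3]

# Membership tests and iteration also follow the column labels, like dict keys.
print('a' in data, 'a' in df)   # True True
print(list(data), list(df))     # ['a', 'b'] ['a', 'b']
```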
Of course, if I use the iloc indexer, I can write it similarly to NumPy array indexing:
print(df)
print(df.iloc[0]) # first choose the row
print(df.iloc[0].iloc[1]) # second choose the column (positionally)
gives:
   a  b
0  1  5
1  2  6
2  3  7
a    1
b    5
Name: 0, dtype: int64
5
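It may help to note that when a row and a column are passed to a single indexer, pandas uses the same row-first order as NumPy; only the bare [] is column-first. A short sketch with the question's df:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [5, 6, 7]})

# Every two-argument indexer takes the row first and the column second.
print(df.iloc[0, 1])    # by position: row 0, column 1 -> 5
print(df.loc[0, 'b'])   # by label: row label 0, column label 'b' -> 5
print(df.iat[0, 1])     # scalar access by position -> 5
print(df.at[0, 'b'])    # scalar access by label -> 5

# A slice in the row slot selects a whole column, as with nparr[:, 1].
print(df.iloc[:, 1].tolist())   # [5, 6, 7]
```

Unlike the chained df.iloc[0][1], these single-call forms do not build an intermediate Series for the row.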