First, OP misunderstood the rows and columns in his/her dataframe.
But the acutal output considers rows that are found in both dataframes.(the only common row element 'y')
OP thought the label y
is for row. However, y
is a column name.
df1 = pd.DataFrame(
{"x":[1, 2, 3, 4, 5], # <-- looks like row x but actually col x
"y":[3, 4, 5, 6, 7]}, # <-- looks like row y but actually col y
index=['a', 'b', 'c', 'd', 'e'])
print(df1)
\col x y
index or row\
a 1 3 | a
b 2 4 v x
c 3 5 r i
d 4 6 o s
e 5 7 w 0
-> column
a x i s 1
It is very easy to be misled since in the dictionary, it looks like y
and x
are two rows.
If you generate df1
from a list of list, it should be more intuitive:
df1 = pd.DataFrame([[1,3],
[2,4],
[3,5],
[4,6],
[5,7]],
index=['a', 'b', 'c', 'd', 'e'], columns=["x", "y"])
So back to the problem, concat
is a shorthand for concatenate (means to link together in a series or chain on this way [source]) Performing concat
along axis 0 means to linking two objects along axis 0.
1
1 <-- series 1
1
^ ^ ^
| | | 1
c a a 1
o l x 1
n o i gives you 2
c n s 2
a g 0 2
t | |
| V V
v
2
2 <--- series 2
2
So... think you have the feeling now. What about sum
function in pandas? What does sum(axis=0)
means?
Suppose data looks like
1 2
1 2
1 2
Maybe...summing along axis 0, you may guess. Yes!!
^ ^ ^
| | |
s a a
u l x
m o i gives you two values 3 6 !
| n s
v g 0
| |
V V
What about dropna
? Suppose you have data
1 2 NaN
NaN 3 5
2 4 6
and you only want to keep
2
3
4
On the documentation, it says Return object with labels on given axis omitted where alternately any or all of the data are missing
Should you put dropna(axis=0)
or dropna(axis=1)
? Think about it and try it out with
df = pd.DataFrame([[1, 2, np.nan],
[np.nan, 3, 5],
[2, 4, 6]])
# df.dropna(axis=0) or df.dropna(axis=1) ?
Hint: think about the word along.