
How do I find how many rows are returned in a subset of a Pandas DataFrame when I'm selecting a column?

When I subset a pandas DataFrame with .loc and specify a column, I get a pandas object (a Series) if the subset has more than one row. If the subset has only one row, I get the bare value instead, and I can't get the length of that.

>>> import pandas as pd
>>> df1 = pd.DataFrame({'A':['A1','A2','A1'],'B':['B1','B2','B3']})
>>> df2 = df1.set_index('A')
>>> df3 = df1.iloc[:2,].set_index('A')
>>> df2
     B
A
A1  B1
A2  B2
A1  B3
>>> df3
     B
A
A1  B1
A2  B2
>>> df2.loc['A1','B'].shape
(2,)
>>> df3.loc['A1','B'].shape
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'shape'

This is because pandas returns a Series if the label matches more than one row, and a scalar if it matches exactly one row.

>>> df2.loc['A1','B']
A
A1    B1
A1    B3
Name: B, dtype: object
>>> df3.loc['A1','B']
'B1'
jpp
KClem

2 Answers


Wrap the row label in square brackets so that it is passed as a list of index labels:

print(df3.loc[['A1'], 'B'].shape)
# (1,)

Passing a list of labels tells pandas to return a pd.Series, even when only one row matches.
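
To make the difference concrete, here is a minimal sketch that rebuilds df2 and df3 from the question and counts the matching rows in both. The helper name count_rows is made up for illustration and is not part of pandas.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A1', 'A2', 'A1'], 'B': ['B1', 'B2', 'B3']})
df2 = df1.set_index('A')            # 'A1' appears twice in the index
df3 = df1.iloc[:2].set_index('A')   # 'A1' appears once in the index


def count_rows(df, label, column):
    """Hypothetical helper: number of rows whose index equals `label`."""
    # Wrapping the label in a list makes .loc return a Series
    # even when only one row matches, so len() always works.
    return len(df.loc[[label], column])


n_df2 = count_rows(df2, 'A1', 'B')
n_df3 = count_rows(df3, 'A1', 'B')
print(n_df2, n_df3)  # 2 1
```

Because the result is a Series in both cases, len(), .shape and .size can all be used on it.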

jpp

Ah... pandas selecting by label sometimes returns a Series and sometimes returns a DataFrame.

The key is to pass the filter criteria as a list:

>>> df3.loc[['A1'],'B'].size
1
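
A related sketch, in case the label might not be in the index at all: as far as I know, recent pandas versions raise a KeyError when a list passed to .loc contains a missing label, whereas a boolean mask over the index simply gives back an empty Series. The label 'A9' below is invented for the example.

```python
import pandas as pd

df3 = pd.DataFrame({'A': ['A1', 'A2'], 'B': ['B1', 'B2']}).set_index('A')

# A boolean mask over the index yields a Series every time,
# whether zero, one, or many rows match.
one = df3.loc[df3.index == 'A1', 'B']
none = df3.loc[df3.index == 'A9', 'B']   # 'A9' is a made-up label, not in the index

print(one.size, none.size)  # 1 0
```

The trade-off is that the mask compares against every index entry, so the list form is probably the better default when the label is known to exist.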
KClem