I am working with an Excel file which has a column called 'HEIGHT'.

I would like to return the number of values in this column.

There are blank values in this column, so I would only like the count of actual numbers.

I have tried `df['HEIGHT']`, but it returns all the rows, even those that don't have a value.

I also would like to know how to delete all the rows that don't have a value in the 'HEIGHT' column.

blkpingu
Jhangir Awan
  • What do you mean by 'don't have'? Are you getting `NaN` as inputs, or blanks? – Celius Stingher Jan 21 '20 at 18:46
  • Does this answer your question? [Remove NaN/NULL columns in a Pandas dataframe?](https://stackoverflow.com/questions/10857924/remove-nan-null-columns-in-a-pandas-dataframe) – blkpingu Jan 21 '20 at 18:47
  • @CeliusStingher I am getting NaN as inputs. I guess I could find the number of NaN values and subtract it from the length of the array to get the total number of values, but that seems inefficient. – Jhangir Awan Jan 21 '20 at 18:50
  • I am not upvoting your question due to not providing a minimum reproducible example. – Celius Stingher Jan 21 '20 at 18:54

1 Answer

I decided to address two different situations: one in which you are getting NaN as values in the height column, and another in which you get blank strings.

import pandas as pd
import numpy as np

Situation 1:

data = {'Height': [100, 110, 104, np.nan, 200, np.nan], 'Name': ['Franky', 'Coby', 'Robin', 'Kanjuro', 'Tom', 'Ace']}
df = pd.DataFrame(data)

Solution 1:

# Drop only the rows where 'Height' is NaN; NaN in other columns is ignored
df = df.dropna(subset=['Height'])
values = df['Height'].tolist()
print(values)

Situation 2:

data = {'Height': [100, 110, 104, '', 200, ''], 'Name': ['Franky', 'Coby', 'Robin', 'Kanjuro', 'Tom', 'Ace']}
df = pd.DataFrame(data)

Solution 2:

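# Convert to numbers; entries that cannot be parsed (such as '') become NaN
df['Height'] = pd.to_numeric(df['Height'], errors='coerce')
# Drop only the rows where 'Height' is NaN
df = df.dropna(subset=['Height'])
values = df['Height'].tolist()
print(values)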

Both outputs are:

[100.0, 110.0, 104.0, 200.0]
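
To get only the number of actual values, without dropping any rows, something like the following should work. This is a minimal sketch on made-up sample data: the column name 'HEIGHT' follows the question, and the variable names `n_values` and `n_values_alt` are only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column name 'HEIGHT' follows the question.
df = pd.DataFrame({
    'HEIGHT': [100, 110, 104, np.nan, 200, np.nan],
    'Name': ['Franky', 'Coby', 'Robin', 'Kanjuro', 'Tom', 'Ace'],
})

# Series.count() skips NaN, so this is the number of actual values.
n_values = df['HEIGHT'].count()

# Equivalent: sum the boolean mask of non-missing entries.
n_values_alt = df['HEIGHT'].notna().sum()

print(n_values)  # number of non-missing heights
print(len(df))   # total number of rows, for comparison
```

If the blanks are empty strings rather than NaN, `count()` will include them, so convert the column with `pd.to_numeric(..., errors='coerce')` first, as in Solution 2.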
Celius Stingher
  • I just tried this on my data frames, unfortunately it just deleted the whole data frame because there are NaN values in other columns. But I just wanted to delete the rows that have NaN values in the "HEIGHT" column. – Jhangir Awan Jan 21 '20 at 19:02
  • Sure, you just need to use the parameter `subset=['Height']`. I have edited it into the answer as well. – Celius Stingher Jan 21 '20 at 19:04
  • Thanks for this, I just realized I could also do `df = df[np.isfinite(df['HEIGHT'])]` – Jhangir Awan Jan 21 '20 at 19:06
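
For reference, here is a minimal sketch of the boolean-mask approach from the last comment, on made-up sample data (the 'Other' column and the variable names are hypothetical). `Series.notna()` is shown alongside `np.isfinite` as an alternative that does not appear in the thread. The two differ in that `np.isfinite` also filters out infinite values and expects a numeric column.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with NaN in more than one column.
df = pd.DataFrame({
    'HEIGHT': [100.0, np.nan, 104.0, np.inf],
    'Other': [np.nan, 1.0, np.nan, 2.0],
})

# Mask from the comment: keeps rows whose HEIGHT is neither NaN nor infinite.
finite_rows = df[np.isfinite(df['HEIGHT'])]

# Alternative mask: keeps rows whose HEIGHT is not NaN (infinite values stay).
present_rows = df[df['HEIGHT'].notna()]

print(len(finite_rows))
print(len(present_rows))
```

In both cases, rows with NaN in 'Other' are kept, because the mask is built from the 'HEIGHT' column only.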