I have data as follows:
import pandas as pd
url_cities="https://population.un.org/wup/Download/Files/WUP2018-F12-Cities_Over_300K.xls"
df_cities = pd.read_excel(url_cities)
print(df_cities.iloc[0:20,])
The column names can be found in row 15, but I would like this row number to be automatically determined. I thought the best way would be to take the first row for which the values are non-Nan for less than 10 items.
I combined this answer, to find this answer to do the following:
amount_Nan = df_cities.shape[1] - df_cities.count(axis=1)
# OR df.isnull().sum(axis=1).tolist()
print(amount_Nan)
col_names_index = next(i for i in amount_Nan if i < 3)
print(col_names_index)
df_cities.columns = df_cities.iloc[col_names_index]
The problem is that col_names_index
keeps returning 0
, while it should be 15
. I think it is because amount_Nan
returns rows and columns because of which next(i for i in amount_Nan if i < 3)
works differently than expected.
The thing is that I do not really understand why. Can anyone help?