UPDATED
Problem 1: I have a data set where many of the values are NaN. Running main.loc[main.isna().sum(axis=1) >= 2] outputs:
ID  GNDR  COUNTRY  ...  BIKE  CAR   PBLC
 1     0      NaN  ...   NaN  NaN    NaN
 1     0      NaN  ...   NaN  NaN    NaN
16     1       UK  ...   123    0  10232
Surely, rows 0 and 1 should be dropped?
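For reproducibility, here is a toy frame (invented values, mimicking the columns above) and what the expression returns on its own:

```python
import numpy as np
import pandas as pd

# Toy stand-in for my merged data (column names/values invented).
main = pd.DataFrame({
    "ID": ["1", "1", "16"],
    "GNDR": [0, 0, 1],
    "COUNTRY": [np.nan, np.nan, "UK"],
    "BIKE": [np.nan, np.nan, 123.0],
    "CAR": [np.nan, np.nan, 0.0],
})

# Count NaNs per row, then select rows with at least two of them.
filtered = main.loc[main.isna().sum(axis=1) >= 2]
print(filtered)
```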
Problem 2: For example, if an ID is greater than 1, as shown above, it means that this person has entered data 16 times. Thus, I want to average these rows, so that people who entered data only once do not show up as outliers to my perceptron later on. My thought was to iteratively average all rows with ID greater than 1 while loading the data into my DataFrame.
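A sketch of the averaging I have in mind, on toy data (column names and values are my own invention), collapsing each person's repeated entries into one averaged row via a groupby:

```python
import pandas as pd

# Toy data: ID "16" has two entries, ID "1" has one.
df = pd.DataFrame({
    "ID": ["16", "16", "1"],
    "BIKE": [100.0, 146.0, 5.0],
    "CAR": [0.0, 2.0, 1.0],
})

# One row per person: numeric columns averaged across that person's entries.
averaged = df.groupby("ID", as_index=False).mean(numeric_only=True)
print(averaged)
```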
SAMPLE CODE:
import pandas

# df_1 and colnames_df_2 are defined earlier in my script.
df_2 = pandas.read_csv('logs.csv', names=colnames_df_2, skiprows=[0])
df_2['ID'] = df_2['ID'].apply(str)
main = df_1.merge(df_2, how='left', on='msno')
main.loc[main.isna().sum(axis=1) >= 2]
print(main)
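For clarity, what I expected that line to achieve is something like the following sketch (toy data, invented values): keep only rows with fewer than two NaNs, assigning the result back.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the merged frame (values invented).
main = pd.DataFrame({
    "ID": ["1", "1", "16"],
    "COUNTRY": [np.nan, np.nan, "UK"],
    "BIKE": [np.nan, np.nan, 123.0],
    "CAR": [np.nan, np.nan, 0.0],
})

# Keep only rows with fewer than two NaNs; the selection must be
# assigned back, since .loc on its own does not modify main.
main = main.loc[main.isna().sum(axis=1) < 2]
print(main)
```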