I was wondering whether we can count how often each of the digits 0-9 occurs as the leftmost digit of the index
of a pandas DataFrame such as this one:
A B C
0 -56.343656 NaN -418.540483
10 -87.577880 -16.061497 NaN
20 NaN -15.337254 NaN
40 -67.462841 NaN -431.924830
50 -63.377158 -28.260790 NaN
60 NaN -22.996095 NaN
130 11.569845 NaN -307.034737
180 11.398947 -1.793530 NaN
I've extracted the index labels of the rows that contain NaN, zero-padded to three digits:
000
010
020
040
050
060
130
180
and tried to store the extracted index labels in a CSV file. I then tried to put them into a DataFrame again, based on the principle that the leftmost digit can be in [0-7], the second-leftmost one in [0-59] and the last one in [0-9999], and to store that in a CSV file so that I can process just the 'Section'
column, which represents the leftmost digit.
My script is as follows:
import numpy as np
import pandas as pd
df = pd.read_csv(r'D:\SOF.TXT', header=None)  # raw string keeps the backslash literal
id_set = df[df.index % 4 == 0].astype('int').values
A = df[df.index % 4 == 1].values
B = df[df.index % 4 == 2].values
C = df[df.index % 4 == 3].values
data = {'A': A[:,0], 'B': B[:,0], 'C': C[:,0]}
#main_data contains all the data
main_data = pd.DataFrame(data, columns=['A','B','C'], index = id_set[:,0])
main_data[np.isinf(main_data)] = np.nan # convert inf to nan
main_data_nan = main_data[main_data.isnull().any(axis=1)] # extract sub data frame
print(main_data_nan)
# zero-pad the index labels to 3 digits
new_index = [str(x).zfill(3) for x in main_data_nan.index]
main_data_nan.index = new_index
# write all rows that contain NaN values to a .csv file
main_data_nan.to_csv('nan_data.csv', na_rep='NaN') # export
# write only column C, together with the index (id_set), to a .csv file
main_data_nan['C'].to_csv('nan_datatemp.csv', na_rep='NaN')
# print every index label (id_set) of the rows that contain NaN values
for i in range(len(main_data_nan)):
    print(main_data_nan.index[i])
dff = pd.read_csv(r"D:\nan_datatemp.csv")  # raw string, otherwise "\n" is read as a newline
cycle, section, cell = [], [], []
for i in range(9999):
    for j in range(8):
        for k in range(60):
            cycle.append(i)
            section.append(j)
            cell.append(k)
dfff = {'Section':section, 'Cell':cell, 'Cycle':cycle}
# default RangeIndex; index=id_set[:,0] would need exactly len(cycle) entries
dffff = pd.DataFrame(dfff, columns=['Section','Cell', 'Cycle'])
dffff.to_csv('exit_id_det.csv', encoding='utf-8', index=False)
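For what it's worth, the triple loop above can probably be written more compactly with itertools.product. This is only a sketch: the names n_cycles, n_sections, n_cells and grid are made up for illustration, and n_cycles is set to 3 so that the example stays small (the script above uses 9999).

```python
from itertools import product

import pandas as pd

# Hypothetical sizes: the script above uses 9999 cycles, 8 sections, 60 cells.
n_cycles, n_sections, n_cells = 3, 8, 60

# product() iterates in the same order as the nested loops:
# cycle is the outer loop, section the middle one, cell the inner one.
rows = list(product(range(n_cycles), range(n_sections), range(n_cells)))

# Build the frame, then reorder the columns to Section, Cell, Cycle.
grid = pd.DataFrame(rows, columns=["Cycle", "Section", "Cell"])
grid = grid[["Section", "Cell", "Cycle"]]

print(grid.head())
print(len(grid))
```

The frame keeps the default RangeIndex, so its length only depends on the three sizes.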
I'm not sure whether the answer here could solve my problem by applying (df==X).sum()
to the leftmost digit of the index, like:
(df==0).sum()
(df==1).sum()
(df==2).sum()
(df==3).sum()
(df==4).sum()
(df==5).sum()
(df==6).sum()
(df==7).sum()
or even by using main_data_nan.isnull().sum().sum()
to compute the percentage frequency of occurrence of each digit.
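To make clear what these calls return: (df==X).sum() compares the cell values with X and does not look at the index, while isnull().sum() counts NaN cells per column. A small sketch on the sample rows shown at the top (the names nan_per_column, nan_total and nan_share are only for illustration):

```python
import numpy as np
import pandas as pd

# Sample rows copied from the frame shown at the top of the question.
main_data_nan = pd.DataFrame(
    {
        "A": [-56.343656, -87.577880, np.nan, -67.462841,
              -63.377158, np.nan, 11.569845, 11.398947],
        "B": [np.nan, -16.061497, -15.337254, np.nan,
              -28.260790, -22.996095, np.nan, -1.793530],
        "C": [-418.540483, np.nan, np.nan, -431.924830,
              np.nan, np.nan, -307.034737, np.nan],
    },
    index=[0, 10, 20, 40, 50, 60, 130, 180],
)

# Number of NaN cells in each column.
nan_per_column = main_data_nan.isnull().sum()

# Total number of NaN cells in the whole frame.
nan_total = nan_per_column.sum()

# Share of each column in the total NaN count, in percent.
nan_share = 100 * nan_per_column / nan_total

print(nan_per_column)
print(nan_total)
print(nan_share)
```

So these counts describe the NaN cells, not the leading digits of the index labels.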
My desired result should look like:
index ----> frequency ---> percentage
000 ----> 0: 6 times ----> 0: 75% in total
010 ----> 1: 2 times ----> 1: 25% in total
020 ----> 2: 0 times ----> 2: 0% in total
040 ----> 3: 0 times ----> 3: 0% in total
050 ----> 4: 0 times ----> 4: 0% in total
060 ----> 5: 0 times ----> 5: 0% in total
130 ----> 6: 0 times ----> 6: 0% in total
180 ----> 7: 0 times ----> 7: 0% in total
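One possible way to get such a table, as a rough sketch rather than a definitive solution: take the first character of each zero-padded label and count it with value_counts. It assumes the labels are already zero-padded strings as listed above; the names labels, leading, counts and summary are made up for illustration.

```python
import pandas as pd

# Zero-padded index labels copied from the list above.
labels = ["000", "010", "020", "040", "050", "060", "130", "180"]

# Leftmost character of each label, converted to an integer digit.
leading = pd.Series(labels).str[0].astype(int)

# Count each digit and keep the digits 0-7 even when they never occur.
counts = leading.value_counts().reindex(range(8), fill_value=0)

# Combine the counts with their share of all labels, in percent.
summary = pd.DataFrame({
    "frequency": counts,
    "percentage": 100 * counts / counts.sum(),
})
summary.index.name = "leading_digit"

print(summary)
```

On the real data, main_data_nan.index would take the place of the hard-coded labels list.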
Below is a sample of my dataset: dataset sample DL link