I have read the following discussion and am thinking of applying it to my code to get the index of the rows where a column matches a condition. I have the following file, and I want to extract the rows where the 'Jabatan' column is '-' and the 'Jumlah Lembar Saham' column is not '-'. Here is my code:
import re
import pandas as pd

input_csv_file = "./CSV/Officers_and_Shareholders.csv"
COLUMNS = ['Nama', 'Jabatan', 'Alamat', 'Klasifikasi Saham', 'Jumlah Lembar Saham', 'Total']
df = pd.read_csv(input_csv_file, skiprows=11, on_bad_lines='skip', names=COLUMNS)
df.fillna('', inplace=True)
NAME = 'Nama'
NUMBER_OF_SHARES = "Jumlah Lembar Saham"
TOTAL = "Total"
POSITION = "Jabatan"
pattern_shareholders = re.compile(r'[A-Z]+\s[]+\s{}[A-Z]+[,]')
shareholders_df = df[~df['Nama'].str.startswith("NIK:") & (df[POSITION] != "-")]
shareholders_df = df[~df['Nama'].str.startswith("NPWP:") & (df[POSITION] != "-")]
shareholders_df = df[~df['Nama'].str.startswith("TTL:") & (df[POSITION] != "-")]
shareholders_df = df[~df['Nama'].str.startswith("Nomor SK") & (df[POSITION] != "-")]
shareholders_df = df[~df['Nama'].str.startswith("Tanggal SK") & (df[POSITION] != "-")]
shareholders_df = df[df[POSITION] == True].index.tolist()
shareholders_list = df[NAME].tolist()
shareholders_string = ' '.join(shareholders_list)
matches = pattern_shareholders.findall(shareholders_string)
print(matches)
But the code above returns every name under the 'Nama' column, such as the following:
['ALIF SASETYO,', 'ARIEF HERMAWAN,', 'ARLAN SEPTIA ANANDA RASAM,', 'CHAIRAL TANJUNG,', 'FUAD RIZAL,', 'R AGUS HARYOTO PURNOMO,', 'PT CTCORP INFRASTRUKTUR D INDONESIA,', 'I E S M PT INTRERPORT PATIMBAN AGUNG,', 'PT PATIMBAN MAJU BERSAMA,', 'PT TERMINAL PETIKEMAS SURABAYA,', 'YUKKI NUGRAHAWAN HANAFI,']
Ideally, if the conditions are met, the returned value should contain only the following:
['PT CTCORP INFRASTRUKTUR D INDONESIA,', 'I E S M PT INTRERPORT PATIMBAN AGUNG,', 'PT PATIMBAN MAJU BERSAMA,', 'PT TERMINAL PETIKEMAS SURABAYA,']
The 'Nama' values listed above are the only ones for which 'Jabatan' is '-' and 'Jumlah Lembar Saham' is not '-'. Is there a way to do this?
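For reference, the two conditions described above can be sketched as a single boolean mask. This is only a minimal illustration, not a confirmed solution for the real file: the sample rows are hypothetical (only the names come from the output above; the 'Jabatan' and 'Jumlah Lembar Saham' values are made up), and `select_shareholders` is a helper name introduced here. Each comparison gets its own parentheses because `&` binds more tightly than `==` and `!=` in Python. Stripping whitespace before comparing is an assumption about how the CSV cells might look.

```python
import pandas as pd

# Hypothetical sample data: the names appear in the question, but the
# 'Jabatan' and 'Jumlah Lembar Saham' values are invented for illustration.
df = pd.DataFrame({
    "Nama": [
        "ALIF SASETYO,",
        "PT PATIMBAN MAJU BERSAMA,",
        "NIK: 123",
        "PT TERMINAL PETIKEMAS SURABAYA,",
    ],
    "Jabatan": ["DIREKTUR", "-", "-", "-"],
    "Jumlah Lembar Saham": ["-", "1000", "-", "2500"],
})


def select_shareholders(frame):
    """Return rows where 'Jabatan' is '-' and 'Jumlah Lembar Saham' is not '-'."""
    # Assumption: cells may carry stray whitespace, so strip before comparing.
    position = frame["Jabatan"].astype(str).str.strip()
    shares = frame["Jumlah Lembar Saham"].astype(str).str.strip()
    # Parenthesize each comparison; `&` has higher precedence than `==` / `!=`.
    mask = (position == "-") & (shares != "-")
    return frame[mask]


shareholders_df = select_shareholders(df)
shareholder_rows = shareholders_df.index.tolist()    # index labels of matching rows
shareholder_names = shareholders_df["Nama"].tolist()  # names of matching rows
print(shareholder_rows)
print(shareholder_names)
```

With this shape, the regex (or any other post-processing) would be applied to `shareholder_names` rather than to the full `df['Nama']` column, so only the filtered rows are searched.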