
I have read the following discussion and am thinking of using it in my code to get the index of rows where a column matches a condition. I have the following file, and I want to extract the rows where the 'Jabatan' column is '-' and the 'Jumlah Lembar Saham' column is not '-'. Here is my code:

import re
import pandas as pd

input_csv_file = "./CSV/Officers_and_Shareholders.csv"
COLUMNS = ['Nama', 'Jabatan', 'Alamat', 'Klasifikasi Saham', 'Jumlah Lembar Saham', 'Total']
df = pd.read_csv(input_csv_file, skiprows=11, on_bad_lines='skip', names=COLUMNS)
df.fillna('', inplace=True)

NAME = 'Nama'
NUMBER_OF_SHARES = "Jumlah Lembar Saham"
TOTAL = "Total"
POSITION = "Jabatan"

pattern_shareholders = re.compile(r'[A-Z]+\s[]+\s{}[A-Z]+[,]')
shareholders_df = df[(~df['Nama'].str.startswith("NIK:") & df[POSITION] != "-")]
shareholders_df = df[(~df['Nama'].str.startswith("NPWP:") & df[POSITION] != "-")]
shareholders_df = df[(~df['Nama'].str.startswith("TTL:") & df[POSITION] != "-")]
shareholders_df = df[(~df['Nama'].str.startswith("Nomor SK") & df[POSITION] != "-")]
shareholders_df = df[(~df['Nama'].str.startswith("Tanggal SK") & df[POSITION] != "-")]
shareholders_df = df[df[POSITION] == True].index.tolist()
shareholders_list = df[NAME].tolist()
shareholders_string = ' '.join(shareholders_list)
matches = pattern_shareholders.findall(shareholders_string)

print(matches)

But the code above returns every name under the 'Nama' column:

['ALIF SASETYO,', 'ARIEF HERMAWAN,', 'ARLAN SEPTIA ANANDA RASAM,', 'CHAIRAL TANJUNG,', 'FUAD RIZAL,', 'R AGUS HARYOTO PURNOMO,', 'PT CTCORP INFRASTRUKTUR D INDONESIA,', 'I E S M PT INTRERPORT PATIMBAN AGUNG,', 'PT PATIMBAN MAJU BERSAMA,', 'PT TERMINAL PETIKEMAS SURABAYA,', 'YUKKI NUGRAHAWAN HANAFI,']

Ideally, only the names that meet the conditions should be returned:

['PT CTCORP INFRASTRUKTUR D INDONESIA,', 'I E S M PT INTRERPORT PATIMBAN AGUNG,', 'PT PATIMBAN MAJU BERSAMA,', 'PT TERMINAL PETIKEMAS SURABAYA,']

These are the only 'Nama' values whose rows have 'Jabatan' equal to '-' and 'Jumlah Lembar Saham' not equal to '-'. Is there a way to do this?

htm_01

1 Answer


Sounds like you need df.loc

df.loc[(df['col1'] == value) & (df['col2'] < value)]
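A minimal, self-contained sketch of that pattern, using a small made-up DataFrame. The column names col1/col2 and the values value1/value2 are placeholders for illustration, not anything from the question's file. It also shows how to get the index labels of the matching rows, which is what the question originally asked for.

```python
import pandas as pd

# Hypothetical example data; 'col1' and 'col2' are placeholder column names.
df = pd.DataFrame({
    "col1": ["a", "b", "a", "a"],
    "col2": [5, 1, 2, 9],
})

value1 = "a"
value2 = 6

# Each comparison gets its own parentheses before being combined with &.
mask = (df["col1"] == value1) & (df["col2"] < value2)

# Rows where both conditions hold
filtered = df.loc[mask]
print(filtered)

# Index labels of the matching rows
matching_index = df.index[mask].tolist()
print(matching_index)
```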

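Wrap each comparison in its own parentheses: in Python, & binds more tightly than == and !=, so an expression like ~df['Nama'].str.startswith("NIK:") & df[POSITION] != "-" is evaluated as (~df['Nama'].str.startswith("NIK:") & df[POSITION]) != "-", which is not the filter you intended. So in your case: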

print(df.loc[(df['Jabatan'] == '-') & (df['Jumlah Lembar Saham'] != '-'), 'Nama'].tolist())
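Applied to the question's full setup, a sketch might look like the following. It assumes the CSV has been loaded and passed through fillna('') as in the question, so the filtered cells are strings; to stay self-contained it builds a small hypothetical DataFrame with placeholder names instead of reading ./CSV/Officers_and_Shareholders.csv. The metadata rows the question tries to drop (those whose 'Nama' starts with 'NIK:', 'NPWP:', and so on) are excluded with one combined mask, because reassigning shareholders_df from df on every line throws away each previous filter.

```python
import pandas as pd

NAME = "Nama"
NUMBER_OF_SHARES = "Jumlah Lembar Saham"
POSITION = "Jabatan"

# Prefixes of the metadata rows the question wants to skip
METADATA_PREFIXES = ("NIK:", "NPWP:", "TTL:", "Nomor SK", "Tanggal SK")

# Hypothetical stand-in for the CSV after df.fillna(''); all values are placeholders.
df = pd.DataFrame({
    NAME: ["PERSON A,", "NIK: 123", "PT EXAMPLE ONE,", "NPWP: 456", "PT EXAMPLE TWO,"],
    POSITION: ["DIREKTUR", "-", "-", "", "-"],
    NUMBER_OF_SHARES: ["-", "", "1000", "", "-"],
})

# True for rows whose 'Nama' starts with any metadata prefix
# (Python's str.startswith accepts a tuple of prefixes).
# astype(bool) makes sure ~ below negates element-wise.
is_metadata = df[NAME].map(lambda s: s.startswith(METADATA_PREFIXES)).astype(bool)

# Shareholders: no position ('-') but a share count other than '-'
is_shareholder = (df[POSITION] == "-") & (df[NUMBER_OF_SHARES] != "-")

mask = ~is_metadata & is_shareholder

shareholder_index = df.index[mask].tolist()
shareholder_names = df.loc[mask, NAME].tolist()
print(shareholder_index)
print(shareholder_names)
```

Note that after fillna(''), an empty share cell also satisfies != '-' (the "NIK: 123" row above passes is_shareholder and is removed only by is_metadata). If empty cells should not count as shareholders, add & (df[NUMBER_OF_SHARES] != '') to the condition.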
Brandon Johnson