How to select rows based on column value of csv file in python

Question

I want to select only rows that have fc_id == 2, and then delete those having duplicates. This is my input file

I am stuck on the first step. After that, I also need an output file containing the final data: only rows with fc_id == 2 and no duplicates.

I tried this:

df = pd.read_csv(r'test.csv')
df2 = df[df["fc_id"]==2]

def condi(df2):
    df3[x] = np.where(df(df2)==2, 1, 0)
    return x
var = condi(df2)
#print(var)

with open('test.csv', 'r') as in_file, open('out_test.csv', 'w') as out_file:
    seen = set()
    if var == 1:
         for line in in_file:
            if line in seen: continue

            seen.add(line)
            out_file.write(line)

I am getting an error and when I tried to print(var) it said "'DataFrame' object is not callable".

`out = pd.read_csv(r'test.csv').query('fc_id == 2').drop_duplicates()` — mozway, Aug 15 '23 at 08:26

score 1 · Accepted Answer · answered Aug 15 '23 at 08:15

Like this:

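import pandas as pd
df = pd.read_csv(r'test.csv')
# Keep rows with fc_id == 2, then drop duplicate rows (avoids inplace=True on a slice)
df2 = df[df['fc_id'] == 2].drop_duplicates()
df2.to_csv('out_test.csv', index=False)  # write the result to the output file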
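
As for the error in the question: `df(df2)` puts parentheses after a DataFrame, which Python reads as a function call, hence "'DataFrame' object is not callable". Here is a minimal sketch that reproduces it and shows the square-bracket form instead. The sample data and the `value` column are made up for illustration; only `fc_id` comes from the question.

```python
import pandas as pd

# Hypothetical sample data; only the fc_id column name comes from the question
df = pd.DataFrame({"fc_id": [1, 2, 2], "value": ["a", "b", "b"]})

try:
    df(df)  # parentheses try to call the DataFrame like a function
except TypeError as exc:
    error_message = str(exc)  # the "not callable" message from the question

# Square brackets with a boolean mask select rows instead
mask = df["fc_id"] == 2
selected = df[mask]
```

So the `np.where` helper is probably not needed at all here; the boolean mask already does the filtering.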

gtomer

Got it, thanks! – Fanatic Aug 15 '23 at 08:41

score 1 · Answer 2 · answered Aug 15 '23 at 08:17

To select the rows of a DataFrame where a column equals a given value: df = df[df['column_name'] == some_value]

In your case:

df = df[df["fc_id"]==2]

For removing duplicates, you can then use

result_df = df.drop_duplicates(keep='first')
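
Putting the two steps together with the output step the question asks for, a rough end-to-end sketch could look like this. The CSV contents and the `name` and `score` columns are invented so the example runs on its own, and it writes to an in-memory buffer rather than a real file; with actual data you would presumably pass 'test.csv' and 'out_test.csv' instead.

```python
import io

import pandas as pd

# Made-up stand-in for test.csv; only the fc_id column is from the question
csv_text = """fc_id,name,score
1,alpha,10
2,beta,20
2,beta,20
2,gamma,30
3,delta,40
"""

df = pd.read_csv(io.StringIO(csv_text))

# Step 1: keep only rows where fc_id equals 2
df = df[df["fc_id"] == 2]

# Step 2: drop rows that are exact duplicates, keeping the first occurrence
result_df = df.drop_duplicates(keep="first")

# Step 3: write the result without the index column
out_buffer = io.StringIO()
result_df.to_csv(out_buffer, index=False)
output_text = out_buffer.getvalue()
```

If "duplicate" should mean "same value in some columns" rather than "identical row", `drop_duplicates` also takes a `subset` argument listing the columns to compare.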

PatioFurnitureIsCool

Got it, thanks! – Fanatic Aug 15 '23 at 08:42
