
I have a dataset where I have to drop rows that match any of several values in a column. I tried this, but I do not know how to do it with multiple values:

import pandas as pd
df = pd.read_csv("data.csv")
new_df = df[df.location == 'New York']
new_df.count()

I also tried another method, but again I do not know how to do it with multiple values:

import pandas as pd
df = pd.read_csv("data.csv")
df.drop(df[df['location'] == 'New York'].index, inplace=True)

I want to delete the rows with the values New York, Boston, and Austin, and keep the rows for all other locations.

Also, I want to replace the values of the location column with numbers: if San Francisco, change the value to 1; if Miami, change it to 2; and so on, so that every value in location is replaced.

  • Does this answer your question? [How to filter Pandas dataframe using 'in' and 'not in' like in SQL](https://stackoverflow.com/questions/19960077/how-to-filter-pandas-dataframe-using-in-and-not-in-like-in-sql) – mcskinner Apr 19 '20 at 00:59

3 Answers


You can use the query method with a variable holding all the cities you want to filter out:

import numpy as np
import pandas as pd

np.random.seed(0)
cities = ['New York', 'Chicago', 'Miami']
data = pd.DataFrame(dict(cities=np.random.choice(cities, 10),
                         values=np.random.choice(10, 10)))

data.cities.unique()  # array(['New York', 'Chicago', 'Miami'], dtype=object)
exclude = ['New York', 'Chicago']  # renamed from `filter` to avoid shadowing the builtin
data_filtered = data.query('cities not in @exclude').copy()
data_filtered.cities.unique()  # array(['Miami'], dtype=object)

To replace the values, you can set them manually:

data_filtered.loc[data_filtered.cities == 'Miami', 'values'] = 2
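If every location ultimately needs its own numeric code, you can also replace them all in one step with a dict and Series.map. A minimal sketch, where city_codes is a hypothetical mapping (not part of the original answer):

city_codes = {'San Francisco': 1, 'Miami': 2}  # hypothetical city -> number mapping
data_filtered['cities'] = data_filtered['cities'].map(city_codes)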
jcaliz

I don't quite follow what you mean by dropping rows with multiple columns, but to check for multiple values you could use: new_df = df[df.location.isin(['New York', 'Boston'])]. Note that Python's plain in operator does not work element-wise on a Series, which is why isin is needed here.
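Since the question asks to delete those rows rather than keep them, here is a minimal sketch that inverts the mask with ~ (assuming the same location column as in the question):

import pandas as pd

df = pd.read_csv("data.csv")
# Keep only the rows whose location is NOT in the list
new_df = df[~df.location.isin(['New York', 'Boston', 'Austin'])]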

Ralvi Isufaj

You can try:

# Drop the rows with location "New York", "Boston", "Austin" (1)
df = df[~df["location"].isin(["New York", "Boston", "Austin"])]

# Replace locations with numbers: (2)
loc_map = {"San Francisco": 1, "Miami": 2, ...}
df["location"] = df["location"].map(loc_map)

For step (2), if you have many values, you can build loc_map automatically:

loc_map = {loc: i + 1 for i, loc in enumerate(df.location.unique())}
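For example, if the remaining unique locations were San Francisco and Miami (an illustrative assumption, in that order), this would produce:

loc_map  # {'San Francisco': 1, 'Miami': 2}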

Hope this helps.

Hoa Nguyen