
Sorry, I'm a newbie at this, but I'm trying to filter a DataFrame based on the values in one column. I have a set of usernames, and I want to filter usernames with:

1. duplicate usernames
2. any of the following special characters: [ : ; | = , + * ? < >
3. excess spaces (e.g. "john doe ")

I'm not sure how to set the parameters, or even how to get started.
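A minimal sketch of criterion 1 (duplicate usernames), assuming the usernames live in a column named `username` (a hypothetical name) and using a small made-up DataFrame in place of real data. `Series.duplicated(keep=False)` flags every occurrence of a repeated value, so the same mask can either pull the duplicates out or drop them:

```python
import pandas as pd

# Hypothetical sample data; in practice df would come from pd.read_csv(...)
df = pd.DataFrame({"username": ["alice", "bob", "alice", "carol", "bob"]})

# keep=False marks every occurrence of a repeated value, not only the later copies
is_dup = df["username"].duplicated(keep=False)

duplicates = df[is_dup]     # rows whose username appears more than once
unique_only = df[~is_dup]   # rows whose username appears exactly once
```

With the default `keep="first"`, only the second and later copies are flagged, and `df.drop_duplicates(subset="username")` keeps one copy of each name instead of discarding them all. Note that "john doe" and "john doe " count as different values unless the column is normalised first (for example with `.str.strip()`).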

Hixx
  • Not a duplicate at all. The question is about different techniques of selection. – Leonid Mednikov May 23 '19 at 17:57
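  • Ideally you should have googled and found the answer easily. However, you can take the following code snippet:

    ```python
    import re

    import pandas as pd

    # One character class holding the forbidden characters. "[" is escaped so it is
    # matched literally, and the class is closed with "]" (an unterminated "[..." raises re.error)
    FORBIDDEN = re.compile(r'[\[:;|=,+*?<>]')

    def regex_filter(val):
        # Keep a row only if its value is not a string or contains none of the forbidden characters
        if isinstance(val, str):
            return FORBIDDEN.search(val) is None
        return True

    df = pd.read_csv("data.csv")
    df_filtered = df[df['user_id'].apply(regex_filter)]
    ```

    – Priya Jain May 23 '19 at 18:57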
  • Not able to add this as code; you can copy the code and use it. – Priya Jain May 23 '19 at 18:58
  • Check out all the available `str` methods in pandas: https://pandas.pydata.org/pandas-docs/stable/reference/series.html#api-series-str – acushner May 23 '19 at 19:07
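Following the pointer to pandas `str` methods, here is a vectorized sketch of criteria 2 and 3 that avoids a row-by-row `apply`. The column name `username` and the sample data are hypothetical, and "excess spaces" is interpreted here (an assumption) as leading or trailing whitespace, or two or more whitespace characters in a row:

```python
import pandas as pd

# Hypothetical sample data; "username" is an assumed column name
df = pd.DataFrame({"username": ["john doe", "john doe ", "jane;doe", "a<b", "  max", "ok_user"]})

# Forbidden characters from the question inside one character class;
# "[" is escaped, while | + * ? need no escaping inside [...]
special = r"[\[:;|=,+*?<>]"

# na=False treats missing usernames as "no match" instead of producing NaN
has_special = df["username"].str.contains(special, regex=True, na=False)

# Assumed meaning of "excess": leading space, trailing space, or 2+ whitespace chars in a row
has_excess_space = df["username"].str.contains(r"^\s|\s$|\s{2,}", regex=True, na=False)

# Keep only rows that fail neither check
clean = df[~(has_special | has_excess_space)]
```

Dropping the `~` in the last line (`df[has_special | has_excess_space]`) returns the offending rows instead of the clean ones.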

0 Answers