-1

My DataFrame has values like Ø§Ù„ÙØ¬ÙŠØ±Ø© in several columns. How can I remove them? I am reading the data from an Excel file, so handling this at read time would be ideal.

I also have values like Battery ÁÁÁ that I want to become Battery. How can I delete these non-English characters but keep the rest of the content?

Derik002

2 Answers

1

You can use regex to remove designated characters from your strings:

import re
import pandas as pd

records = [{'name': 'Foo Ø§Ù„ÙØ¬ÙŠØ±Ø©'}, {'name': 'Battery ÁÁÁ'}]
df = pd.DataFrame.from_records(records)

# Allow alphanumeric characters and spaces (add additional characters as needed).
# Spell out A-Za-z: the range A-z would also match [ \ ] ^ _ and the backtick.
pattern = re.compile('[^A-Za-z0-9 ]+')

def clean_text(string):
    # Delete every run of disallowed characters, then trim leftover whitespace
    return pattern.sub('', string).strip()

# Apply to your df
df['clean_name'] = df['name'].apply(clean_text)

                name clean_name
0  Foo Ø§Ù„ÙØ¬ÙŠØ±Ø©       Foo
1        Battery ÁÁÁ   Battery

For more solutions, see this SO question: Python, remove all non-alphabet chars from string
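
Since the question mentions several columns, here is a minimal sketch of the same regex idea applied to every text column at once, using the vectorized `.str.replace` instead of `apply`. The column names `name`, `city` and `qty` are hypothetical, and it assumes the affected columns contain only strings. The cleanup runs after the file has been read; it is not done during reading.

```python
import pandas as pd

# Hypothetical sample data: two text columns and one numeric column
df = pd.DataFrame({
    "name": ["Foo Ø§Ù„ÙØ¬ÙŠØ±Ø©", "Battery ÁÁÁ"],
    "city": ["Dubai", "Ø§Ù„ÙØ¬ÙŠØ±Ø©"],
    "qty": [1, 2],
})

# Pick the columns that hold strings; numeric columns are left untouched
text_cols = [c for c in df.columns if pd.api.types.is_string_dtype(df[c])]

for col in text_cols:
    df[col] = (
        df[col]
        .str.replace(r"[^A-Za-z0-9 ]+", "", regex=True)  # drop disallowed runs
        .str.strip()                                     # trim leftover spaces
    )
```

A value that consisted only of unwanted characters ends up as an empty string, which you may want to replace with a missing value afterwards.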

Yaakov Bressler
0

You can slice each value with a lambda function, or use Python's split method:

df[column_name] = df[column_name].apply(lambda value: value[start:stop])
# df['location'] = df['location'].apply(lambda location: location[0:4])

Split Method

# Split on a space and keep the first part (an empty separator '' raises ValueError)
df[column_name] = df[column_name].apply(lambda value: value.split(' ')[0])
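
A runnable sketch of the split approach, assuming the unwanted characters always come after the first space. The column names `name` and `first_word` are hypothetical. Note that this keeps only the first word, so a value with several valid words would be truncated.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"name": ["Battery ÁÁÁ", "Foo Ø§Ù„"]})

# Split each value on whitespace and keep the first piece
df["first_word"] = df["name"].str.split().str[0]
```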
DaveL17