I have a dataset of tweets that is mainly in English but also contains several tweets in Indian languages (such as Punjabi, Hindi, Tamil, etc.). I want to keep only the English tweets and remove the rows in other languages. I tried the approach from [https://stackoverflow.com/questions/67786493/pandas-dataframe-filter-out-rows-with-non-english-text] and it worked on a sample dataset. However, when I ran it on my full dataset, it raised this error:
LangDetectException: No features in text.
I have also checked another question [https://stackoverflow.com/questions/69804094/drop-non-english-rows-pandasand], where the accepted answer discusses this error and says that empty rows may be the cause, so I have already cleaned my dataset to remove all empty rows.
Simple code that worked on the sample data but not on the original data:
from langdetect import detect
import pandas as pd
df = pd.read_csv('Sample.csv')
df_new = df[df.text.apply(detect).eq('en')]
print('New df is: ', df_new)
How can I check which row is producing the error?
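One way to narrow this down might be to wrap the detector so that a failure is recorded instead of raised. This is only a minimal sketch under assumptions: `safe_detect` and `failing_rows` are hypothetical helper names, `detect_fn` stands for `langdetect.detect`, and my guess is that rows with no letters at all (only digits, emojis, links, or punctuation) may trigger the error even though they are not empty.

```python
def safe_detect(text, detect_fn):
    """Return the detected language code, or None if detection fails."""
    try:
        return detect_fn(text)
    except Exception:
        # Deliberately broad: this also catches non-string values such as NaN,
        # not only the detector's own exception.
        return None


def failing_rows(texts, detect_fn):
    """Return the positions of all texts for which detection fails."""
    return [i for i, text in enumerate(texts)
            if safe_detect(text, detect_fn) is None]


# Possible usage with the DataFrame from the question (assumes the text
# column is called `text`, as in the code above):
#   from langdetect import detect
#   bad = failing_rows(df.text.tolist(), detect)
#   print(df.iloc[bad])   # inspect the rows that raise the error
#   langs = df.text.apply(lambda t: safe_detect(t, detect))
#   df_new = df[langs.eq('en')]
```

Returning `None` rather than re-raising means a single bad row no longer stops `apply`, and the failing rows can be inspected afterwards.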
Thanks in advance!