I have tried a number of things and just can't seem to get something that works. Basically I have an XLSX file that has 3 columns (Name, Email, Phone Number) and then a number of rows. The phone numbers start with different country codes "+1.xxxxxxxxxx" "+90.xxxxxxxxxx" "+34." etc. The first thing I want to do is delete all rows that don't start with "+1." and would like to further filter by deleting obviously fake and/or incomplete phone number, for instance, sometimes people use '+1.5555555555' or don't give me complete numbers, so I would get '+1.12345678' (not a full 10 digit number, 13 if you count '+1.'). Then after all that is done, write it to .csv.
My code thus far is as follows:
import pandas as pd
xl_file = pd.ExcelFile('testexcel.xlsx')
df = xl_file.parse('Sheet 1', index_col='Name', na_values=['NA'])
#df.drop(df.columns[[0]], axis=1, inplace=True)
df = df[df['Phone'] != '+1.*']
df.to_csv('testingpandas.csv')
The #df.drop... component was just me playing with df.dop and deleting columns as a whole. The next line is my attempt at using this answer to maybe get the result I wanted, but I just can't quite find what is right. And the '*' at the end was what I thought was a wildcard.
Would it be best to filter the '+1's and then write something that then deletes rows without a total string count of 13 in that column?