
I'm trying to write a script (see the code below) that checks whether any of the values in the 'Mobile Phone Number' column is longer than 11 digits. If one is, the script should print that value's index and delete the entire record at that index from the data frame. However, this line does not behave as I expect: `if len(data['Mobile Phone Number']) > 11:` never evaluates to true, even though the condition should be met. There are two phone numbers longer than 11 digits that I need to delete.

import pandas as pd

data = {
    'Name': [
        'Tom',
        'Joseph',
        'Krish',
        'John'
    ],
    'Mobile Phone Number': [
        13805647925,
        145792860326480,
        184629730518469,
        18218706491
    ]
}

df = pd.DataFrame(data)

print(df)

for i in range(len(data)):
    if len(data['Mobile Phone Number']) > 11:
        print('Number at index ', i, 'is incorrect')
        data = data.drop(['Mobile Phone Number'][i], axis=1)
    else:
        print('\nNo length of > 11 found in Mobile Phone Numbers')

And this is the output of the above code:

     Name  Mobile Phone Number
0     Tom          13805647925
1  Joseph      145792860326480
2   Krish      184629730518469
3    John          18218706491

No length of > 11 found in Mobile Phone Numbers

No length of > 11 found in Mobile Phone Numbers
accdias
hello
  • `len(data['Mobile Phone Number'])` returns how many phone numbers are in your column, not their lengths. – Daweo Sep 03 '21 at 13:25
  • Your sample may be misleading: if you have a phone number with a leading 0 (as is the case in my country), you will lose that 0 because the dtype of your column is 'int'. – Corralien Sep 03 '21 at 13:44
  • Also, you are operating on your dictionary (`data`), and I guess you should be using your data frame (`df`) instead. – accdias Sep 03 '21 at 14:02
  • Are the mobile phone numbers supposed to be strings or numbers in your DataFrame? – wwii Sep 03 '21 at 14:10
  • @wwii, numbers. – hello Sep 03 '21 at 14:28
  • Does [determine length (or number of digits) of every row in column](https://stackoverflow.com/questions/39668138/determine-length-or-number-of-digits-of-every-row-in-column) answer your question? – wwii Sep 03 '21 at 14:32
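To illustrate the first two comments, here is a minimal sketch using the same sample data: `len()` on the column returns the number of rows, while converting each value to a string with `astype(str)` and calling `.str.len()` gives one digit count per row. The number `'07912345678'` is a hypothetical example, not from the question, used only to show that an integer column cannot keep a leading zero while a string column can.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'Joseph', 'Krish', 'John'],
    'Mobile Phone Number': [13805647925, 145792860326480,
                            184629730518469, 18218706491],
})

# len() on a Series counts its elements, i.e. the number of rows
row_count = len(df['Mobile Phone Number'])
print(row_count)  # 4

# One digit count per row: convert each number to str, then measure it
digit_counts = df['Mobile Phone Number'].astype(str).str.len()
print(digit_counts.tolist())  # [11, 15, 15, 11]

# Storing a number as int drops its leading zero (hypothetical sample number)
print(int('07912345678'))  # 7912345678

# Keeping the column as strings preserves the leading zero
phones_as_text = pd.Series(['07912345678'], dtype=str)
print(phones_as_text.str.len().tolist())  # [11]
```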

4 Answers


For the following DataFrame as input:

import pandas as pd

df = pd.DataFrame({
    'Name': [
        'Tom',
        'Joseph',
        'Krish',
        'John'
    ],
    'Mobile Phone Number': [
        13805647925,
        145792860326480,
        184629730518469,
        18218706491
    ]
})

#      Name  Mobile Phone Number
# 0     Tom          13805647925
# 1  Joseph      145792860326480
# 2   Krish      184629730518469
# 3    John          18218706491

You can try this:

df = df[df['Mobile Phone Number'].apply(lambda x: len(str(x)) <= 11)]
print(df)

To have this output:

   Name  Mobile Phone Number
0   Tom          13805647925
3  John          18218706491
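The same filter can also be written without `apply`, using the vectorized `.str.len()` string accessor. This is a sketch assuming the sample `df` above; the mask name `too_long` is hypothetical, and `~` inverts it so that only the rows with 11 or fewer digits are kept.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'Joseph', 'Krish', 'John'],
    'Mobile Phone Number': [13805647925, 145792860326480,
                            184629730518469, 18218706491],
})

# True for every number with more than 11 digits
too_long = df['Mobile Phone Number'].astype(str).str.len() > 11

# Keep only the rows where the mask is False
df = df[~too_long]
print(df)
```

Naming the mask after the rows to remove, then inverting it, keeps the "> 11" condition from the question visible in the code.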

Edit: if you want to show an error when a number has more than 11 digits, you can try this:

if any(df['Mobile Phone Number'].apply(lambda x: len(str(x)) > 11)):
    print("Error! you have number > 11")

Second edit: if you want to show an error message and then remove the numbers with more than 11 digits, use the code below:

import pandas as pd

df = pd.DataFrame({
    'Name': [
        'Tom',
        'Joseph',
        'Krish',
        'John'
    ],
    'Mobile Phone Number': [
        13805647925,
        145792860326480,
        184629730518469,
        18218706491
    ]
})

print(df)

if any(df['Mobile Phone Number'].apply(lambda x: len(str(x)) > 11)):
    print("\n Error! you have number > 11 \n")
    df = df[df['Mobile Phone Number'].apply(lambda x: len(str(x)) <= 11)]

print(df)

output:

     Name  Mobile Phone Number
0     Tom          13805647925
1  Joseph      145792860326480
2   Krish      184629730518469
3    John          18218706491


 Error! you have number > 11 


   Name  Mobile Phone Number
0   Tom          13805647925
3  John          18218706491
I'mahdi
  • Hi @user1740577, this is working! However, could you explain why the condition in lambda is <= 11 instead of > 11? – hello Sep 03 '21 at 13:46
  • @hello, you want to delete the numbers longer than 11 digits, which means you want to keep the ones with length `<= 11`. That condition is `True` for the rows to keep, and indexing with the boolean result keeps only the rows where it is `True`. – I'mahdi Sep 03 '21 at 13:48
  • @hello, if this is correct, please `upvote` and read this link: https://meta.stackexchange.com/questions/5234/how-does-accepting-an-answer-work – I'mahdi Sep 03 '21 at 13:49
  • what if I want to print the error message as well if > 11? – hello Sep 03 '21 at 14:16
  • @hello, I edited the answer and added this line: `if any(df['Mobile Phone Number'].apply(lambda x: len(str(x)) > 11))`. Is this what you were looking for? – I'mahdi Sep 03 '21 at 14:27
  • Hi @user1740577, thank you for the edit! Did it print the error message for you? It didn't work for me – hello Sep 03 '21 at 15:24
  • @hello, maybe you ran the check on the df after the long numbers had already been deleted; see this link: https://onecompiler.com/python/3xadxemez – I'mahdi Sep 03 '21 at 16:21
  • this looks good! Is it also possible to remove the numbers which have length > 11 after the check in the 'if' statement? – hello Sep 03 '21 at 17:00
  • @hello, you're welcome, dude. If this answer helped you, please `upvote` it, and if it is correct, read this link: https://meta.stackexchange.com/questions/5234/how-does-accepting-an-answer-work. I will answer you tomorrow. I'm going to bed now :)))) – I'mahdi Sep 03 '21 at 17:34
  • @hello, I edited the answer and added a `second edit`; see the new edit, and if it answers your question, please `accept` the answer. – I'mahdi Sep 04 '21 at 05:32

You can try this:

mobile_longer_than_11 = df[df["Mobile Phone Number"].astype(str)
                           .apply(len).gt(11)].index

print(df.loc[df.index.difference(mobile_longer_than_11)])

Output:

   Name  Mobile Phone Number
0   Tom          13805647925
3  John          18218706491
ashkangh

This is a combination of previous answers to give the results expected by OP. Credit goes to the other authors.

import pandas as pd

df = pd.DataFrame({
    'Name': [
        'Tom',
        'Joseph',
        'Krish',
        'John'
    ],
    'Mobile Phone Number': [
        13805647925,
        145792860326480,
        184629730518469,
        18218706491
    ]
})

invalid_phones = df['Mobile Phone Number'].astype(str).apply(len).gt(11)

if invalid_phones.any():
    for i in df[invalid_phones].index:
        print(f'Number at index {i} is incorrect')
else:
    print('No length of > 11 found in Mobile Phone Numbers')

The code above will result in the following output:

Number at index 1 is incorrect
Number at index 2 is incorrect

To remove the invalid phones from df you can use:

df = df.loc[df.index.difference(df[invalid_phones].index)]

or:

df = df.drop(df[invalid_phones].index)

or, to modify df in place:

df.drop(df[invalid_phones].index, inplace=True)

That will result in the following:

print(df)
   Name  Mobile Phone Number
0   Tom          13805647925
3  John          18218706491
accdias
  • Hi @accdias, thanks for the update! Is it also possible to remove the incorrect ones after checking it? – hello Sep 03 '21 at 15:06
  • Sure! I will update the answer and append that to the code. – accdias Sep 03 '21 at 15:19
  • did you get a warning message for the line of code to remove invalid phones? - 'UserWarning: Boolean Series key will be reindexed to match DataFrame index.' – hello Sep 03 '21 at 16:25
  • No warnings with the sample data you provided. – accdias Sep 03 '21 at 16:29
  • ok no worries. I shall post a new question on StackOverflow for the error message I guess. Thanks for the help! – hello Sep 03 '21 at 16:50
  • There is a thread for that already: check [here](https://stackoverflow.com/questions/41710789/boolean-series-key-will-be-reindexed-to-match-dataframe-index). – accdias Sep 03 '21 at 16:57
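One common way to trigger that warning (a guess at what happened here, not something confirmed in the thread) is reusing a boolean mask after rows have already been dropped: the mask still carries all four original index labels while `df` has only two, so pandas reindexes the mask and emits the `UserWarning`. A minimal sketch of the safe pattern, assuming the sample data above, is to recompute the mask from the current `df`; the names `stale_mask` and `fresh_mask` are hypothetical.

```python
import warnings
import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'Joseph', 'Krish', 'John'],
    'Mobile Phone Number': [13805647925, 145792860326480,
                            184629730518469, 18218706491],
})

stale_mask = df['Mobile Phone Number'].astype(str).apply(len).gt(11)
df = df.drop(df[stale_mask].index)

# Reusing stale_mask: its index is [0, 1, 2, 3] but df's is now [0, 3],
# so pandas reindexes the mask and emits a UserWarning
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    df[stale_mask]
print([str(w.message) for w in caught])

# Safe: recompute the mask from the current df so both indexes match
fresh_mask = df['Mobile Phone Number'].astype(str).apply(len).gt(11)
print(df[fresh_mask].empty)  # True
```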

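I believe in your case you can just compare the numbers: since the column holds positive integers, a value has more than 11 digits exactly when it is at least 10^11 (1e11).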

mask = df['Mobile Phone Number'] >= 1e11

if mask.any():
    for i in df[mask].index:
        print('Number at index ', i, 'is incorrect')
else:
    print('\nNo length of > 11 found in Mobile Phone Numbers')
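As a quick sanity check, assuming positive integers without leading zeros, the numeric threshold flags the same rows as the string-length test used in the other answers. This sketch uses the integer `10**11` rather than the float `1e11`, a minor choice that keeps the comparison in integer arithmetic; the names `by_value` and `by_length` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'Joseph', 'Krish', 'John'],
    'Mobile Phone Number': [13805647925, 145792860326480,
                            184629730518469, 18218706491],
})

numbers = df['Mobile Phone Number']

# Numeric test: 12 or more digits means the value is at least 10**11
by_value = numbers >= 10**11

# String-length test from the other answers, for comparison
by_length = numbers.astype(str).str.len() > 11

print(by_value.equals(by_length))          # True
print(by_value[by_value].index.tolist())   # [1, 2]
```

The numeric comparison avoids converting every value to a string, but it only works while the column stays an integer dtype; once numbers are stored as strings (for example, to keep leading zeros), the length-based check is the one to use.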
Alexander Volkovsky