I have a column in a DataFrame `df` which contains strings like these:

Webs
https://www.mhouse.com/107462464135489/posts/please-lets-be-guidedun-is-where-the-code/142970213918047/
https://www.msed.com/IKONINIBWANASEEDMARCH2020.html
https://www.msed.com/
https://carrice.com/jen/stat/1241025420562178050?lang=en

...

I would like to determine the count and the percentage of digits in each string; so, for instance:

Count      Percentage
30         (and the percentage compared to the length of the string)
4          ...
0          ...
19         ...

If I am not wrong, I'd use a combination of `isdigit()` to count the digits in each string and `len()` to get the length of the string, and then compute the percentage from the two.
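The idea described above could be sketched roughly as follows. This is only an illustration of the `isdigit()` plus `len()` approach, not a tested solution from the post: the helper name `count_digits` and the output column names are made up here, and the sample rows are the four URLs shown earlier.

```python
import pandas as pd

# Sample rows copied from the question; "Webs" is the column name used there.
df = pd.DataFrame({"Webs": [
    "https://www.mhouse.com/107462464135489/posts/please-lets-be-guidedun-is-where-the-code/142970213918047/",
    "https://www.msed.com/IKONINIBWANASEEDMARCH2020.html",
    "https://www.msed.com/",
    "https://carrice.com/jen/stat/1241025420562178050?lang=en",
]})

def count_digits(text):
    """Hypothetical helper: number of characters in text for which isdigit() is true."""
    return sum(ch.isdigit() for ch in text)

# Apply the helper row by row, then divide by the string length.
df["Count"] = df["Webs"].apply(count_digits)
df["Percentage"] = df["Count"] / df["Webs"].apply(len) * 100

print(df[["Count", "Percentage"]])
```

Note that `apply` with a Python function runs once per row, so on a large column a vectorised string method is likely to be faster.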

E_net4
LdM

1 Answer

You can count the digits in each string using `Series.str.count` with a regular expression, and you can get the length of each string with `Series.str.len()`. With those two values, the percentage is a straightforward division:

df["digit_count"] = df["Webs"].str.count(r"\d")  # raw string avoids an invalid-escape warning
df["total_characters"] = df["Webs"].str.len()
df["digit_percentage"] = df["digit_count"] / df["total_characters"] * 100

print(df)
                                                Webs  digit_count  total_characters  digit_percentage
0  https://www.mhouse.com/107462464135489/posts/p...           30               103         29.126214
1  https://www.msed.com/IKONINIBWANASEEDMARCH2020...            4                51          7.843137
2                              https://www.msed.com/            0                21          0.000000
3  https://carrice.com/jen/stat/12410254205621780...           19                56         33.928571
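One case the sample data does not cover is an empty or missing string. The sketch below uses a hypothetical series (not from the question) to show how the same three steps might behave there, and one possible way to report 0% for empty strings while leaving missing values as missing.

```python
import pandas as pd

# Hypothetical data: a normal string, an empty string, and a missing value.
s = pd.Series(["abc123", "", None], name="Webs")

digit_count = s.str.count(r"\d")
total_characters = s.str.len()

# An empty string gives 0 / 0, which pandas reports as NaN rather than raising.
digit_percentage = digit_count / total_characters * 100

# Optional: show 0.0 for empty strings, but keep NaN where the value was missing.
digit_percentage_filled = digit_percentage.where(total_characters != 0, 0.0)

print(digit_percentage_filled)
```

Whether an empty string should count as 0% or as undefined depends on the analysis, so treat the last step as one option rather than a recommendation.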
Cameron Riddell
  • Didn't know I could do `count('\d')`. Does this mean I can use any regex to count matches? I'll have to try it out. – Joe Ferndz Feb 16 '21 at 20:07
  • Yep! `count` works with any regex, so you can simply count the pattern matches, which is more readable than combining `.str.findall(...)` with `.str.len()`. – Cameron Riddell Feb 16 '21 at 20:20
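As a small illustration of the point made in the comments, the sketch below counts a different pattern (uppercase ASCII letters, chosen here only as an example) in two of the URLs from the question. For comparison, it also takes the longer route of collecting the matches with `str.findall` and measuring each list with `str.len()`; that pairing is an assumption about what the longer alternative would look like.

```python
import pandas as pd

# Two of the URLs from the question.
s = pd.Series([
    "https://www.msed.com/IKONINIBWANASEEDMARCH2020.html",
    "https://www.msed.com/",
])

# Direct: count matches of an arbitrary regex in each string.
upper_count = s.str.count(r"[A-Z]")

# Longer route: build a list of matches per row, then take each list's length.
upper_count_alt = s.str.findall(r"[A-Z]").str.len()

print(upper_count.tolist(), upper_count_alt.tolist())
```

Both routes should give the same numbers; `str.count` simply skips building the intermediate lists.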