Remove digits from a list of strings in pandas column

Question

I have this pandas dataframe

0  Tokens 
1: 'rice', 'XXX', '250g'
2: 'beer', 'XXX', '750cc'

All the tokens in a row ('rice', 'XXX' and '250g') are in the same list of strings, in the same column.

I want to remove the digits, but because they are attached to other characters (as in '250g'), I have not been able to remove them.

I have tried this code:

def remove_digits(tokens):
    """
    Remove digits from a string
    """
    return [''.join([i for i in tokens if not i.isdigit()])]

df["Tokens"] = df.Tokens.apply(remove_digits)
df.head()

but it only joined the strings, and I clearly do not want to do that.

My desired output:

0  Tokens
1: 'rice', 'XXX', 'g'
2: 'beer', 'XXX', 'cc'

What is `Tokens` here? Could you provide the sentences to construct the df? — Norhther, Jul 11 '21 at 20:03
I think this answers your question by using regular expressions: https://stackoverflow.com/questions/40178364/using-regex-to-remove-digits-from-string — braulio, Jul 11 '21 at 20:20
In your attempted solution, you are passing a list (`tokens`) to your function; you then need to loop over each character of each string `i` before applying `isdigit()` — braulio, Jul 11 '21 at 20:24

Alex · Answer 1 · 2021-07-12T13:30:30.273

This is possible using pandas methods, which are vectorised and so more efficient than looping.

import pandas as pd

df = pd.DataFrame({"Tokens": [["rice", "XXX", "250g"], ["beer", "XXX", "750cc"]]})

col = "Tokens"
df[col] = (
    df[col]
    .explode()
    .str.replace(r"\d+", "", regex=True)  # raw string so \d is not treated as a string escape
    .groupby(level=0)
    .agg(list)
)
#             Tokens
# 0   [rice, XXX, g]
# 1  [beer, XXX, cc]

Here we use:

pandas.Series.explode to convert the Series of lists into rows
pandas.Series.str.replace to replace occurrences of \d+ (one or more digits) with "" (nothing)
pandas.Series.groupby to group the Series by index (level=0) and put them back into lists (.agg(list))
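
To make the three steps easier to follow, here is a minimal sketch that runs them one at a time on the same example data. The intermediate names (exploded, cleaned, result) are only illustrative and are not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({"Tokens": [["rice", "XXX", "250g"], ["beer", "XXX", "750cc"]]})

# Step 1: one row per token; each token keeps the index label of its original row.
exploded = df["Tokens"].explode()
print(exploded.index.tolist())  # [0, 0, 0, 1, 1, 1]

# Step 2: delete every run of digits inside each token.
cleaned = exploded.str.replace(r"\d+", "", regex=True)

# Step 3: group by the original index label and collect the tokens back into lists.
result = cleaned.groupby(level=0).agg(list)
print(result.tolist())  # [['rice', 'XXX', 'g'], ['beer', 'XXX', 'cc']]
```

One thing to watch for: explode turns an empty list into a single NaN row, so a row that started as [] would come back as [nan] rather than [] with this approach.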

score 0 · Accepted Answer · answered Jul 11 '21 at 20:15

Here's a simple solution -

import pandas as pd

df = pd.DataFrame({'Tokens':[['rice', 'XXX', '250g'], 
                             ['beer', 'XXX', '750cc']]})

def remove_digits_from_string(s):
    return ''.join([x for x in s if not x.isdigit()])

def remove_digits(l):
    return [remove_digits_from_string(s) for s in l]

df["Tokens"] = df.Tokens.apply(remove_digits)
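
As a variation on the same idea, the per-string helper could be swapped for a precompiled regular expression. This is only a sketch: the name DIGITS is made up for illustration, and the pattern is a different technique from the isdigit() check used above.

```python
import re

import pandas as pd

# Hypothetical module-level pattern: one or more consecutive digits.
DIGITS = re.compile(r"\d+")

def remove_digits(tokens):
    # Clean each string separately so the list keeps one entry per token.
    return [DIGITS.sub("", token) for token in tokens]

df = pd.DataFrame({"Tokens": [["rice", "XXX", "250g"], ["beer", "XXX", "750cc"]]})
df["Tokens"] = df["Tokens"].apply(remove_digits)
print(df["Tokens"].tolist())  # [['rice', 'XXX', 'g'], ['beer', 'XXX', 'cc']]
```

The two versions can differ on unusual input: str.isdigit() is also true for characters such as a superscript '²', while \d only matches decimal digits, so pick whichever matches your data.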

Carmoreno · Answer 3 · 2021-07-11T21:52:52.597

You can use to_list + re.sub in order to update your original dataframe.

import re

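# Assumes df has the default 0..n-1 index, so the enumerate counter matches the row labels.
for index, lst in enumerate(df['Tokens'].to_list()):
  lst = [re.sub(r'\d+', '', i) for i in lst]
  # .at sets a single cell, so the whole list is stored as one value
  df.at[index, 'Tokens'] = lst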

print(df)

Output:

    Tokens
0   [rice, XXX, g]
1   [beer, XXX, cc]

edited Jul 11 '21 at 21:52

answered Jul 11 '21 at 21:42