pandas convert numbers to words in every rows and one specific column

Question

UPDATE

df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
df.iloc[:,3].replace(r'(?<!\S)\d+(?!\S)', lambda x: p.number_to_words(x.group()), regex=True, inplace=True)
df.iloc[:,3].head(2)
0    15
1    89
Name: D, dtype: int64

df = df.astype(str)
df.iloc[:,3].replace(r'(?<!\S)\d+(?!\S)', lambda x: p.number_to_words(x.group()), regex=True, inplace=True)

df.iloc[:,3].head(2)
0    <function <lambda> at 0x7fd8a6b4db18>
1    <function <lambda> at 0x7fd8a6b4db18>
Name: D, dtype: object

I got a pandas data frame and some of the rows contains numbers in some columns. I want to use the inflect library to replace only the numbers with corresponding word representation.

I think df.replace is a good fit. But how can I specify that only numbers (all the numbers which are separated by white spaces) should be replaced and pass it as argument to inflect ?.

p = inflect.engine()
df.replace(r' (\d+) ', p.number_to_words($1), regex=True, inplace=True)

Similarly, I have second dataframe, where I want to do it for a specific column, column with index 4. The column contains just 4 digit numbers only (year). How can I do it ?.

Try `df.replace(r'(?<!\S)\d+(?!\S)', lambda x: p.number_to_words(x.group()), regex=True, inplace=True)` — Wiktor Stribiżew, Apr 24 '18 at 07:27
@WiktorStribiżew it can not be used if i want to replace in a column which contains just numbers (for example my forth column contains years, and I want to replace it with words) — Shew, Apr 24 '18 at 07:44
Do you mean you have `int`s? Then try [casting them to string, `.astype(str)`](https://stackoverflow.com/questions/17950374/converting-a-column-within-pandas-dataframe-from-int-to-string). — Wiktor Stribiżew, Apr 24 '18 at 07:56
@WiktorStribiżew I want to convert it to equivalent words not just data type conversion. I want 1974 to be replaced by one thousand seventy four. I tried df.iloc[:,3].replace(r'(?=\^)\d+(?=\$)', lambda x: p.number_to_words(x.group()), regex=True, inplace=True) — Shew, Apr 24 '18 at 08:02
`r'(?=\^)\d+(?=\$)'` is wrong. It must be `r'(?<!\S)\d+(?!\S)'`. What I mean, regex only works with strings, so you need to pass a string to the `re.sub` that is used behind the scenes. — Wiktor Stribiżew, Apr 24 '18 at 08:02
@WiktorStribiżew Now I get some thing like this at 0x7fd8a6b4db18> instead of the words. — Shew, Apr 24 '18 at 08:12
Try `df.iloc[:,3].astype(str).apply(lambda row: re.sub(r'(?<!\S)\d+(?!\S)', lambda x: p.number_to_words(x.group()), row))` — Wiktor Stribiżew, Apr 24 '18 at 08:22

score 1 · Accepted Answer · answered Apr 24 '18 at 08:28

Import re library, make sure your column is of type string, and use (?<!\S)\d+(?!\S) to match sequences of digits that are between start/end of string and whitespace chars. If you want to only match whole entries that are all digits, you may use ^\d+$ regex.

df.iloc[:,3].astype(str).apply(lambda row: re.sub(r'(?<!\S)\d+(?!\S)', lambda x: p.number_to_words(x.group()), row))

First, the column is cast to string with .astype(str). Then, (?<!\S)\d+(?!\S) matches in each row and the number is sent to the .number_to_words() method.

pandas convert numbers to words in every rows and one specific column

1 Answers1