2

UPDATE

df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
df.iloc[:,3].replace(r'(?<!\S)\d+(?!\S)', lambda x: p.number_to_words(x.group()), regex=True, inplace=True)
df.iloc[:,3].head(2)
0    15
1    89
Name: D, dtype: int64

df = df.astype(str)
df.iloc[:,3].replace(r'(?<!\S)\d+(?!\S)', lambda x: p.number_to_words(x.group()), regex=True, inplace=True)

df.iloc[:,3].head(2)
0    <function <lambda> at 0x7fd8a6b4db18>
1    <function <lambda> at 0x7fd8a6b4db18>
Name: D, dtype: object

I got a pandas data frame and some of the rows contains numbers in some columns. I want to use the inflect library to replace only the numbers with corresponding word representation.

I think df.replace is a good fit. But how can I specify that only numbers (all the numbers which are separated by white spaces) should be replaced and pass it as argument to inflect ?.

p = inflect.engine()
df.replace(r' (\d+) ', p.number_to_words($1), regex=True, inplace=True)

Similarly, I have second dataframe, where I want to do it for a specific column, column with index 4. The column contains just 4 digit numbers only (year). How can I do it ?.

Shew
  • 1,557
  • 1
  • 21
  • 36
  • Try `df.replace(r'(?<!\S)\d+(?!\S)', lambda x: p.number_to_words(x.group()), regex=True, inplace=True)` – Wiktor Stribiżew Apr 24 '18 at 07:27
  • @WiktorStribiżew it can not be used if i want to replace in a column which contains just numbers (for example my forth column contains years, and I want to replace it with words) – Shew Apr 24 '18 at 07:44
  • Do you mean you have `int`s? Then try [casting them to string, `.astype(str)`](https://stackoverflow.com/questions/17950374/converting-a-column-within-pandas-dataframe-from-int-to-string). – Wiktor Stribiżew Apr 24 '18 at 07:56
  • @WiktorStribiżew I want to convert it to equivalent words not just data type conversion. I want 1974 to be replaced by one thousand seventy four. I tried df.iloc[:,3].replace(r'(?=\^)\d+(?=\$)', lambda x: p.number_to_words(x.group()), regex=True, inplace=True) – Shew Apr 24 '18 at 08:02
  • `r'(?=\^)\d+(?=\$)'` is wrong. It must be `r'(?<!\S)\d+(?!\S)'`. What I mean, regex only works with strings, so you need to pass a string to the `re.sub` that is used behind the scenes. – Wiktor Stribiżew Apr 24 '18 at 08:02
  • @WiktorStribiżew. Oh Ok got it. Updated the question also – Shew Apr 24 '18 at 08:06
  • @WiktorStribiżew Now I get some thing like this at 0x7fd8a6b4db18> instead of the words. – Shew Apr 24 '18 at 08:12
  • Try `df.iloc[:,3].astype(str).apply(lambda row: re.sub(r'(?<!\S)\d+(?!\S)', lambda x: p.number_to_words(x.group()), row))` – Wiktor Stribiżew Apr 24 '18 at 08:22

1 Answers1

1

Import re library, make sure your column is of type string, and use (?<!\S)\d+(?!\S) to match sequences of digits that are between start/end of string and whitespace chars. If you want to only match whole entries that are all digits, you may use ^\d+$ regex.

df.iloc[:,3].astype(str).apply(lambda row: re.sub(r'(?<!\S)\d+(?!\S)', lambda x: p.number_to_words(x.group()), row))

First, the column is cast to string with .astype(str). Then, (?<!\S)\d+(?!\S) matches in each row and the number is sent to the .number_to_words() method.

Wiktor Stribiżew
  • 607,720
  • 39
  • 448
  • 563