Well, you're already stripping out whitespace:
df_final['Name'].replace(r'\s+|\\n', ' ', regex = True, inplace = True)
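- To match a newline (`\n`), you don't need that double slash as long as you're using a raw string literal (the `r''`). In a raw string, `\\n` matches a literal backslash followed by `n`, not a newline, and `\s+` already matches newlines anyway.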
- Do you really want to replace `\n` with a space? I'd imagine you probably want it removed entirely. (Your example doesn't show the newlines, so it's hard to tell.)
- Spaces are not recommended around the `=` of keyword arguments (this is a PEP 8 convention). Your code will still run just fine if you break it, but other programmers, at least, will have a harder time reading your code.
- `inplace` is also not exactly recommended, and may even be deprecated in future. It seems like it would be more memory efficient, but in reality it often creates a copy under the hood anyway.
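To make the point about the double slash concrete, here is a small sketch using Python's `re` module, which is what pandas uses when you pass `regex=True`. The sample strings are made up for illustration and are not taken from your data.

```python
import re

# Hypothetical sample strings, not taken from the question's data.
real_newline = "Jane\nDoe"      # contains an actual newline character
literal_chars = "Jane\\nDoe"    # contains a backslash followed by "n"

# In a raw string, \\n is an escaped backslash plus "n": it matches those
# two literal characters and leaves a real newline untouched.
unchanged = re.sub(r'\\n', ' ', real_newline)
literal_fixed = re.sub(r'\\n', ' ', literal_chars)

# \n matches the newline itself, and \s+ already covers it as well.
newline_fixed = re.sub(r'\n', ' ', real_newline)
whitespace_fixed = re.sub(r'\s+', ' ', real_newline)

print(unchanged == real_newline)   # True: nothing was replaced
print(literal_fixed)               # Jane Doe
print(newline_fixed)               # Jane Doe
print(whitespace_fixed)            # Jane Doe
```

So unless your strings really contain a backslash followed by an `n`, the `|\\n` part of the pattern isn't doing anything.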
Assuming `full_name` in your code is the series (column) of names, this will remove all digits, then also clear all whitespace (spaces and/or newlines) from the left and right, leaving you with just the first and last name:
df_final['Name'] = full_name.replace(r'\d+', '', regex=True).str.strip()
(That's an immediate fix, but depending on how the original data is formatted, I suspect there's probably a way to scrape your data into a dataframe that avoids this ahead of time.)
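Here is a minimal, self-contained run of the digit-stripping one-liner. The values in `full_name` are invented for the demo, since I can't see your real data, so treat this as a sketch rather than a guarantee of how your column will come out.

```python
import pandas as pd

# Hypothetical scraped values; the question's real data may look different.
full_name = pd.Series(["12 Jane Doe\n", " John Smith 7", "003\nAna Lima  "])

# Drop every run of digits, then trim whitespace (spaces and newlines)
# from both ends of each string.
cleaned = full_name.replace(r'\d+', '', regex=True).str.strip()

print(cleaned.tolist())  # ['Jane Doe', 'John Smith', 'Ana Lima']
```

Note that `.str.strip()` only trims the ends, so if digits or newlines can appear between the first and last name, the whitespace left in the middle would need separate handling.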