dataframe string type cannot use replace method

Question

import pandas as pd

df = pd.DataFrame({'a': ['asdf']}, dtype="string")
df["a"].replace({"a":"b"}, regex=True)

not changed

df = pd.DataFrame({'a': ['asdf']}, dtype="object")
df["a"].replace({"a":"b"}, regex=True)

changed

I want to replace part of a string value with something else, but when I use the string dtype the replace method has no effect. How can I change string-dtype data? Should I use the object dtype instead?

I tried the first code and it worked; changed the a to b. pandas version 1.3 — sammywemmy, Aug 26 '21 at 06:48
afternoon_drinker, you need to use `.str` method on string `dtype`, see the answer for more details. — Karn Kumar, Aug 26 '21 at 08:30

Karn Kumar · Accepted Answer · 2021-08-26T10:19:25.777

If you check df.dtypes, the difference is evident: with dtype="string" the column uses pandas' dedicated string dtype instead of object, so you need to apply pandas.Series.str.replace to get your result.

However, when you choose dtype="object", both the dtype and the column data remain object, so you don't need the .str accessor and Series.replace works directly.

Please check the source code, which explains it well:

For calling .str.{method} on a Series or Index, it is necessary to first initialize the :class:StringMethods object, and then call the method.

>>> df = pd.DataFrame({'a': ['asdf']}, dtype="string")
>>> df
      a
0  asdf

>>> df.dtypes
a    string
dtype: object

>>> df["a"].str.replace("a", "b", regex=True)
0    bsdf
Name: a, dtype: string

>>> df = pd.DataFrame({'a': ['asdf']}, dtype="object")
>>> df.dtypes
a    object
dtype: object

>>> df["a"].replace({"a": "b"}, regex=True)
0    bsdf
Name: a, dtype: object
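
As a side-by-side check of the two sessions above, here is a minimal sketch that applies .str.replace to both dtypes. The helper name replace_a_with_b is made up for illustration and is not part of pandas; the point is only that the .str accessor works on either dtype and keeps the dtype of the column.

```python
import pandas as pd


def replace_a_with_b(dtype):
    """Build the one-row frame from the question and replace "a" with "b".

    Hypothetical helper for illustration, not a pandas function.
    """
    df = pd.DataFrame({"a": ["asdf"]}, dtype=dtype)
    # .str is the StringMethods accessor; .str.replace does substring replacement.
    return df["a"].str.replace("a", "b", regex=True)


as_string = replace_a_with_b("string")
as_object = replace_a_with_b("object")

# Both results contain "bsdf"; only the dtype of the result differs.
print(as_string.tolist(), as_string.dtype)
print(as_object.tolist(), as_object.dtype)
```

Passing regex explicitly is deliberate here, since the default of that parameter in Series.str.replace changed between pandas major versions.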

dtype:

Borrowed from @HYRY.

See the linked answer, which is the source of inspiration for the explanation below.

From the pandas docs: all dtypes can now be converted to StringDtype.

The dtype object comes from NumPy; it describes the type of the elements in an ndarray. Every element in an ndarray must have the same size in bytes. For int64 and float64, that is 8 bytes. But for strings, the length is not fixed. So instead of saving the bytes of the strings in the ndarray directly, pandas uses an object ndarray, which saves pointers to objects; because of this, the dtype of this kind of ndarray is object.

Here is an example:

The int64 array contains 4 int64 values.
The object array contains 4 pointers to 3 string objects.
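
The same picture can be reproduced with a small NumPy sketch. The sample values and variable names below are illustrative assumptions, not taken from the original figure.

```python
import numpy as np

# Fixed-width dtype: every element occupies the same number of bytes.
ints = np.array([1, 2, 3, 4], dtype=np.int64)
print(ints.dtype, ints.itemsize)  # itemsize is 8 bytes for int64

# Object dtype: the array stores references to Python objects.
shared = "pandas"
words = np.array([shared, "numpy", shared, "string"], dtype=object)
print(words.dtype)

# Positions 0 and 2 refer to the very same str object.
print(words[0] is words[2])

# Four slots, but only three distinct objects behind them.
distinct_objects = len({id(w) for w in words})
print(len(words), distinct_objects)
```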

Note:

The object dtype has a much broader scope. An object column can hold not only strings, but also any other data that pandas doesn't understand.
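
To make the note concrete, here is a short sketch of an object column holding mixed values. The sample data is made up for illustration.

```python
import pandas as pd

# An object column accepts arbitrary Python values, not just strings.
mixed = pd.Series(["asdf", 1], dtype="object")
element_types = [type(v).__name__ for v in mixed]
print(element_types)

# String methods on object dtype cannot process the non-string value,
# so the result holds a missing value in that position.
upper = mixed.str.upper()
print(upper.tolist())
```

This is one practical reason to prefer the dedicated string dtype when a column is meant to hold only text.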

thank you. Which type is better to use for string operations? — afternoon_drinker, Aug 26 '21 at 09:26

score 0 · Answer 2 · answered Aug 26 '21 at 08:15

For the string type you can do this:

df = pd.DataFrame({'a': ['asdf']}, dtype="string")
df["a"].str.replace("a","b")
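
For literal (non-regex) replacement, or for several substitutions at once, one possible approach is sketched below. The substitutions mapping is a hypothetical example; note that the replacements are applied one after another, so a later one can act on the output of an earlier one.

```python
import pandas as pd

df = pd.DataFrame({"a": ["asdf"]}, dtype="string")

# regex=False treats the pattern as a literal substring.
literal = df["a"].str.replace("a", "b", regex=False)
print(literal.tolist())

# Hypothetical mapping of several literal substitutions, applied in order.
substitutions = {"a": "b", "d": "x"}
result = df["a"]
for old, new in substitutions.items():
    result = result.str.replace(old, new, regex=False)
print(result.tolist())
```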

Babak Fi Foo

thank you. how to do in case of df["a"].replace({"a":"b"}, regex=False) – afternoon_drinker Aug 26 '21 at 08:37
