I have a DataFrame in which a few columns contain Japanese text. I want to pad the values in those columns to a fixed expected length.
Dataframe:
from io import StringIO

import pandas as pd

StringData = StringIO(
"""agency_code,Name
亜草 太郎32,パンダーサン
亜草 太郎3223,2
"""
)
df_orig_data = pd.read_csv(StringData, sep=",")
The expected length is 15 for both columns. When I run this:
print(
df_orig_data["agency_code"]
.astype(str)
.str.pad(width=15, side="right", fillchar="0")
)
I get:
0 亜草 太郎3200000000
1 亜草 太郎3223000000
So str.pad counts each double-byte character as a single character and adds too many zeroes.
What I actually need is:
0 亜草 太郎320000
1 亜草 太郎322300
亜草 太郎32 - width 11 (4 double-byte characters and 3 single-byte characters) + 4 zeroes = 亜草 太郎320000
亜草 太郎3223 - width 13 (4 double-byte characters and 5 single-byte characters) + 2 zeroes = 亜草 太郎322300
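To make the counting rule concrete, here is a minimal sketch of the arithmetic in plain Python. It assumes that the "double-byte" characters are exactly those that `unicodedata.east_asian_width` reports as wide (`"W"`) or full-width (`"F"`), which may not match every encoding-based definition. The helper names `display_width` and `pad_to_width` are made up for illustration and are not part of pandas.

```python
import unicodedata


def display_width(text: str) -> int:
    """Count wide/full-width characters as 2 and every other character as 1."""
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


def pad_to_width(text: str, width: int = 15, fillchar: str = "0") -> str:
    """Right-pad text with fillchar until its display width reaches width."""
    missing = max(width - display_width(text), 0)
    return text + fillchar * missing


for value in ["亜草 太郎32", "亜草 太郎3223"]:
    # Prints the computed width followed by the padded value.
    print(display_width(value), pad_to_width(value))
```

If this assumption holds for the data, something like `df_orig_data["agency_code"].astype(str).map(pad_to_width)` might apply the same rule to the column, but I have not confirmed that this is the right way to do it in pandas.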
Issue:
I am not sure how to make the padding account for these Japanese characters when they are mixed with ordinary letters and digits.