I'm new to Python and I'm trying to troubleshoot an error.
I have a DataFrame (reprex):
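import pandas as pd

df = pd.DataFrame({
    "Id": [1895650, 1895650, 1896355, 1897675, 1897843,
           2178737, 2178737, 2178737, 2178737, 2178750],
    "ServiceSubCodeKey": [2, 4, 2, 9, 2, 3, 4, 7, 1, 699],
    "PrintDate": ["2018-07-27", "2018-08-13", "2018-08-10", "2018-08-13", "2018-08-10",
                  "2019-06-14", "2019-06-14", "2019-06-14", "2019-06-14", "2019-06-14"],
})
print(df)
#         Id  ServiceSubCodeKey   PrintDate
# 0  1895650                  2  2018-07-27
# 1  1895650                  4  2018-08-13
# 2  1896355                  2  2018-08-10
# 3  1897675                  9  2018-08-13
# 4  1897843                  2  2018-08-10
# 5  2178737                  3  2019-06-14
# 6  2178737                  4  2019-06-14
# 7  2178737                  7  2019-06-14
# 8  2178737                  1  2019-06-14
# 9  2178750                699  2019-06-14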
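columns = (
    # dtype=int keeps the digits as 0/1 (pandas 2.x defaults get_dummies to bool,
    # which astype(str) would turn into 'True'/'False')
    pd.get_dummies(df["ServiceSubCodeKey"], dtype=int)
    .reindex(range(df.ServiceSubCodeKey.min(),
                   df.ServiceSubCodeKey.max() + 1), axis=1, fill_value=0)
    # now there is one column for every subcode from min to max
    .astype(str)
)
# join each row's '0'/'1' digits into one string and read it as an integer
codes = pd.Series(
    [int(''.join(row)) for row in columns.itertuples(index=False)],
    index=df.index)
codes = (
    codes.groupby(df.Id).transform('sum').astype('str')
    .str.pad(width=columns.shape[1], fillchar='0')  # restore leading 0's dropped by int()
    .str.rstrip('0')  # this will remove trailing 0's
)
print(codes)
df = df.assign(one_hot_ssc=codes)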
This fails with:
OverflowError: int too large to convert to float
While troubleshooting, I found that the error is raised in this part:
codes = pd.Series(
    [int(''.join(row)) for row in columns.itertuples(index=False)],
    index=df.index)
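That line turns each row of 0/1 columns into a single integer with one digit per subcode between the minimum and the maximum, so a range of 1 to 699 produces a 699-digit number. An int64 column holds at most 19 digits, so pandas cannot keep these values as integers, and the error message suggests they end up being converted to float. A float64 tops out around 1.8e308, which would explain why a 699-digit value fails while a smaller range (for example 1 to 60) converts without an error but silently loses precision. The sketch below only checks these limits in plain Python; it does not reproduce pandas' internal conversion path, which is an assumption here.

```python
# Largest value an int64 (NumPy/pandas integer column) can hold: 19 digits.
INT64_MAX = 2**63 - 1

# A one-hot row over subcodes 1..60 read as an int has 60 digits;
# over subcodes 1..699 it has 699 digits.
sixty_digits = int("1" + "0" * 59)
six_ninety_nine_digits = int("1" + "0" * 698)

print(sixty_digits > INT64_MAX)                  # True: already too big for int64
print(int(float(sixty_digits)) == sixty_digits)  # False: float keeps only ~15-17 significant digits

try:
    float(six_ninety_nine_digits)  # float64 maximum is about 1.8e308 (309 digits)
except OverflowError as exc:
    print("OverflowError:", exc)
```

So even when the code "works" for a 60-digit value, the summed codes are likely corrupted by the float round-trip.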
If I change the last ServiceSubCodeKey from 699 to 60 or lower, the error goes away. How can I fix this? I want it to work even when a subcode has 5 digits, so I'm looking for a permanent solution.
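One possible fix is to never convert the 0/1 string to a number at all and instead build the string for each Id directly. This is a sketch, assuming the goal is the same string the original code produces: character i is '1' when subcode min + i appears for that Id, with trailing zeros stripped. The helper name `encode` is hypothetical. Because it collects the codes in a set, a subcode repeated within one Id still shows as '1', whereas the original `sum` would put a '2' in that position.

```python
import pandas as pd

# Sample data from the question (PrintDate kept as plain strings).
df = pd.DataFrame({
    "Id": [1895650, 1895650, 1896355, 1897675, 1897843,
           2178737, 2178737, 2178737, 2178737, 2178750],
    "ServiceSubCodeKey": [2, 4, 2, 9, 2, 3, 4, 7, 1, 699],
    "PrintDate": ["2018-07-27", "2018-08-13", "2018-08-10", "2018-08-13",
                  "2018-08-10", "2019-06-14", "2019-06-14", "2019-06-14",
                  "2019-06-14", "2019-06-14"],
})

# Position 0 of every string is the smallest subcode in the whole frame,
# mirroring reindex(range(min, max + 1)) in the original code.
lo = int(df["ServiceSubCodeKey"].min())


def encode(keys: pd.Series) -> str:
    """Hypothetical helper: build one Id's 0/1 string without going through int."""
    positions = {int(k) - lo for k in keys}  # duplicate subcodes collapse to one '1'
    chars = ["0"] * (max(positions) + 1)     # stop at the last '1', so no trailing zeros
    for p in positions:
        chars[p] = "1"
    return "".join(chars)


# One string per Id, then map it back onto every row with that Id.
per_id = df.groupby("Id")["ServiceSubCodeKey"].apply(encode)
df = df.assign(one_hot_ssc=df["Id"].map(per_id))

print(df[["Id", "ServiceSubCodeKey", "one_hot_ssc"]].head(9))
print(len(df.loc[9, "one_hot_ssc"]))  # length of the string for subcode 699
```

Since the codes stay strings, their length is limited only by memory, so 5-digit subcodes should work. This version also avoids the dense `get_dummies` matrix, which for 5-digit subcodes could need close to 100,000 columns per row.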