My current approach uses regular expressions:
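import re
data = re.sub(r'[^0-9a-zA-Z\s.,]', '', string=html).lower()  # keep ASCII letters, digits, whitespace, '.' and ','
data = re.sub(r'<[^>]*>', '', string=html)  # strip HTML tags
data = re.sub(r'[^ ㄱ-ㅣ가-힣]+', '', string=html)  # keep only spaces and Hangul (digits are removed too)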
However, numbers can disappear from the result, and long runs of whitespace are left behind.
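To make the goal concrete, here is a minimal sketch of the output I am after. It assumes the steps are chained (each one works on the previous result rather than on the raw `html`), that digits should be kept alongside Hangul and ASCII letters, and that whitespace runs should collapse to a single space. The function name `clean_html_text` and the sample string are made up for illustration.

```python
import re

def clean_html_text(html: str) -> str:
    """Hypothetical helper: strip tags, keep Hangul/ASCII letters/digits, tidy spaces."""
    # Replace each tag with a space so neighbouring words do not fuse together.
    text = re.sub(r'<[^>]*>', ' ', html)
    # Replace anything outside the allowed set with a space; digits are kept.
    text = re.sub(r'[^0-9a-zA-Zㄱ-ㅣ가-힣\s.,]', ' ', text)
    # Collapse runs of whitespace into one space and trim the ends.
    return re.sub(r'\s+', ' ', text).strip()

sample = '<p>가격은   1500원,  <b>Total</b> 3 items!</p>'  # made-up input
print(clean_html_text(sample))  # 가격은 1500원, Total 3 items
```

This is still regex-only, so it does not decode entities such as `&amp;` or skip the contents of `<script>` and `<style>` blocks.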
I would appreciate any recommendations if there is a better way.