While preparing a text file for preprocessing, I am not able to split it cleanly into words. This is my code:
```python
import re

# Read the whole text file into a single string
with open("pg5200.txt", mode="r", encoding="utf-8") as f:
    text = f.read()

# Split the text on runs of non-word characters
words = re.split(r'\W+', text)
print(words[:100])
```
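The same behaviour can be reproduced without the file. This is a minimal sketch that assumes the text begins with a non-word character: `re.split` returns an empty string `''` as the first element whenever the pattern matches at the very start of the string. The sample string is hypothetical; it begins with a byte order mark (U+FEFF), a character that some UTF-8 files start with and that `\W` matches.

```python
import re

# Hypothetical sample: starts with a byte order mark (U+FEFF), a non-word character
sample = "\ufeffHello, world"

# Because \W+ matches at position 0, re.split puts an empty string first
parts = re.split(r'\W+', sample)

print(parts)           # ['', 'Hello', 'world']
print(repr(parts[0]))  # ''
```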
After running the code above, the problem is that the resulting list has an extra blank element at the beginning.
Why does this extra element occur, and how can I remove it?
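Assuming the cause is a separator at the very start (or end) of the text, here are a few possible ways to get rid of the empty element. The sample string is hypothetical. Option 3 only helps if the leading character really is a byte order mark. In that case, opening the file with `encoding="utf-8-sig"` instead of `"utf-8"` would apply it directly.

```python
import re

# Hypothetical sample; in the question, text comes from pg5200.txt.
# It starts with a byte order mark (U+FEFF) and ends with punctuation.
text = "\ufeffHello, world! This is a test."

# Option 1: keep re.split, then drop the empty strings produced by
# separators at the very start or end of the text
words_split = [w for w in re.split(r'\W+', text) if w]

# Option 2: match runs of word characters directly instead of splitting
words_findall = re.findall(r'\w+', text)

# Option 3 (only helps if the leading character is a BOM):
# the "utf-8-sig" codec drops a leading BOM while decoding
raw = "\ufeffHello, world".encode("utf-8")
decoded = raw.decode("utf-8-sig")

print(words_split)    # ['Hello', 'world', 'This', 'is', 'a', 'test']
print(words_findall)  # ['Hello', 'world', 'This', 'is', 'a', 'test']
print(repr(decoded))  # 'Hello, world'
```

Options 1 and 2 also remove the trailing empty string that `re.split` produces when the text ends with punctuation, which option 3 alone would not.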
Thank you.