I am trying to get a list of the files and directories present under a specified URL. The URL I'm using belongs to an online dictionary: www.shabdkosh.com/kn/browse/. My code is as follows:
import os

html_files = []
for root, dirs, files in os.walk("www.shabdkosh.com/kn/browse"):
    for file in files:
        # Files in shabdkosh have a digit as name to represent the page number
        if file.isdigit():
            html_files.append(os.path.join(root, file))
When I print the contents of html_files, I get:
www.shabdkosh.com/kn/browse/3/1
www.shabdkosh.com/kn/browse/a/1
www.shabdkosh.com/kn/browse/a/10
www.shabdkosh.com/kn/browse/a/2
...
This is fine, but other URLs should also have been retrieved. The URLs containing Kannada letters are not displayed (Kannada is an Indian language), even though they exist.
For example, URLs like
www.shabdkosh.com/kn/browse/ಅ/
are not displayed, even though they lie under the path "www.shabdkosh.com/kn/browse" passed as the parameter to os.walk. So, how do I get os.walk to return the URLs with the Kannada letters?
I even tried including the following lines at the top of my Python file:
#!/usr/bin/env python
# -*- coding: ascii -*-
But no luck. Any help is appreciated.
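For anyone who wants to reproduce this without the site itself, here is a minimal sketch that builds a small temporary tree with one ASCII-named and one Kannada-named directory and walks it. The helper name collect_page_files and the directory layout are made up for illustration and are not the real site structure. The sketch passes a unicode path to os.walk, which in Python 2 makes os.walk return unicode names as well; I have not confirmed that this is the relevant difference in my case.

```python
# -*- coding: utf-8 -*-
import os
import shutil
import tempfile


def collect_page_files(top):
    """Return paths under `top` whose file name consists only of digits."""
    html_files = []
    for root, dirs, files in os.walk(top):
        for name in files:
            if name.isdigit():
                html_files.append(os.path.join(root, name))
    return html_files


# Hypothetical stand-in for the mirrored site: a temporary tree with one
# ASCII-named and one Kannada-named directory, each holding a page file "1".
demo_root = tempfile.mkdtemp()
browse = os.path.join(demo_root, u"browse")
for letter in (u"a", u"\u0c85"):  # u"\u0c85" is the Kannada letter ಅ
    os.makedirs(os.path.join(browse, letter))
    open(os.path.join(browse, letter, u"1"), "w").close()

# A unicode path goes in, so the names that come back are unicode too.
found = collect_page_files(browse)
found_letters = sorted(os.path.basename(os.path.dirname(p)) for p in found)

# Clean up the temporary tree.
shutil.rmtree(demo_root)
```

Here found_letters holds the names of the directories in which a page file was found, so it shows directly whether the Kannada-named directory was visited.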
P.S. Sorry if it bothers you that I'm using the old Python 2.7.