The pandas.read_csv function is very flexible and has recently begun to support URL inputs, as described here:
df = pd.read_csv('http://www.somefile.csv')
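For context, the first parameter of read_csv is named filepath_or_buffer, and besides paths and URLs it also accepts any object with a read() method. A minimal offline sketch, in which an in-memory buffer with made-up CSV text stands in for the body of a remote file:

```python
import io

import pandas as pd

# An in-memory text buffer stands in for the downloaded body of a remote CSV.
buffer = io.StringIO("a,b\n1,2\n3,4\n")

# read_csv accepts file-like objects as well as paths and URLs.
df = pd.read_csv(buffer)
print(df.shape)
```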
I am attempting to find in the source code where this case is handled. Here is what I know so far:
1) read_csv is a fairly generic wrapper generated by _make_parser_function within io/parsers.py.
2) The function produced by _make_parser_function delegates the reading of the data to a function _read(filepath_or_buffer, kwds) that is defined elsewhere in io/parsers.py.
3) This function _read(filepath_or_buffer, kwds) creates a TextFileReader and returns the result of TextFileReader.read(). However, it appears that TextFileReader is responsible for, well, only text files. It provides functionality for handling various types of compression, but I see nothing checking for URL input.
4) On the other hand, io/html.py contains a function _read(obj) that is clearly meant to access a URL and return the result of an HTTP query.
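To make steps 1 to 3 concrete, here is a heavily simplified model of that call chain using only the standard library. All names here (SimpleTextReader, make_parser_function, the extension-to-opener table) are hypothetical stand-ins loosely modelled on the description above, not pandas' actual code. The point it illustrates: a reader of this shape opens paths and decompresses by extension, but a string like 'http://...' would simply be handed to open() as a local path.

```python
import bz2
import csv
import gzip
import io

# Hypothetical, simplified model of the call chain in steps 1-3.
# None of these names are pandas' real internals.

# Map file extensions to decompressing openers (assumed convention).
_OPENERS = {".gz": gzip.open, ".bz2": bz2.open}


class SimpleTextReader:
    """Stand-in for TextFileReader: handles compression, not URLs."""

    def __init__(self, filepath_or_buffer, sep=","):
        self.sep = sep
        if hasattr(filepath_or_buffer, "read"):
            # Already a file-like object: read from it directly.
            self._handle = filepath_or_buffer
            self._owns_handle = False
        else:
            # Choose an opener from the extension; plain open() otherwise.
            # Note: nothing here checks whether the string is a URL.
            opener = open
            for ext, func in _OPENERS.items():
                if filepath_or_buffer.endswith(ext):
                    opener = func
                    break
            self._handle = opener(filepath_or_buffer, "rt", newline="")
            self._owns_handle = True

    def read(self):
        try:
            return list(csv.reader(self._handle, delimiter=self.sep))
        finally:
            # Only close handles this reader opened itself.
            if self._owns_handle:
                self._handle.close()


def _read(filepath_or_buffer, kwds):
    # Step 3: build a reader and return the result of its read() method.
    reader = SimpleTextReader(filepath_or_buffer, **kwds)
    return reader.read()


def make_parser_function(name, default_sep):
    # Steps 1-2: generate a thin wrapper that forwards everything to _read.
    def parser_f(filepath_or_buffer, sep=default_sep):
        return _read(filepath_or_buffer, {"sep": sep})

    parser_f.__name__ = name
    return parser_f


read_csv = make_parser_function("read_csv", ",")
read_table = make_parser_function("read_table", "\t")
```

For example, read_csv(io.StringIO("a,b\n1,2\n")) returns [['a', 'b'], ['1', '2']]; the two generated wrappers differ only in their default separator.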
It seems to me that the simple solution to this problem is to check whether the input string is a URL and, if so, dispatch to the html module; however, I cannot find where this happens when tracing through read_csv. Could anybody point me in the right direction?
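The check-and-dispatch idea proposed above could look roughly like the following sketch. It is not taken from pandas: is_url, open_source and REMOTE_SCHEMES are hypothetical names, the set of schemes and the UTF-8 decoding are assumptions, and instead of going through the html module it downloads directly with urllib.request and hands an in-memory text buffer to a path-or-buffer reader.

```python
import io
import urllib.request
from urllib.parse import urlparse

# Assumed set of schemes to treat as remote input (hypothetical choice).
REMOTE_SCHEMES = frozenset({"http", "https", "ftp"})


def is_url(obj, schemes=REMOTE_SCHEMES):
    """Return True if obj is a string whose URL scheme is in `schemes`."""
    if not isinstance(obj, str):
        return False
    return urlparse(obj).scheme in schemes


def open_source(filepath_or_buffer, schemes=REMOTE_SCHEMES, encoding="utf-8"):
    """Return something a path-or-buffer reader can consume.

    URLs are downloaded with urllib and wrapped in an in-memory text
    buffer (decoded with the assumed `encoding`); local paths and
    existing buffers are returned unchanged.
    """
    if is_url(filepath_or_buffer, schemes):
        with urllib.request.urlopen(filepath_or_buffer) as response:
            text = response.read().decode(encoding)
        return io.StringIO(text)
    return filepath_or_buffer
```

Usage would then be along the lines of pd.read_csv(open_source("http://www.somefile.csv")). Doing the check once, before any reader is built, keeps the reader itself unaware of where the bytes came from.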