Just to condense the fruitful comment stack into a self-contained answer.
if-elif
slightly condensed and converted to function
import os
import pandas as pd
def read_any(file):
if file.endswith('.csv', 'tsv') :
df = pd.read_csv(file)
elif file.endswith('.json'):
df = pd.read_json(file)
elif file.endswith('.xml'):
df = pd.read_xml(file)
elif file.endswith('.xls','xlsx'):
df = pd.read_excel(file)
elif file.endswith('.hdf'):
df = pd.read_hdf(file)
elif file.endswith('.sql'):
df = pd.read_sql(file)
else:
raise ValueError(f'Unsupported filetype: {file}')
return df
if __name__ == '__main__':
# here or wherever it is used
file_path = input("enter the path to the file you want to open")
df = read_any(file_path)
print(df.head(100))
Hacky (?) dictionary for functions
Now to the possible refactoring of the if-elif
stack into a dictionary lookup. I don't find it hacky, but it's not really necessary. It's more of a stylistic choice:
import os.path
READER_MAP = {
'xlsx': pd.read_excel,
'xls': pd.read_excel,
'xml': pd.read_xml, # .. and so on
'sql': my_read_sql # special case
}
def my_read_sql(con_w_sql, sep='#'):
"""Made-up function that reads SQL from CON in a single call
Args:
con_w_sql: connection string + query or table name,
joined by separator string sep
sep: separator
Example:
>>> my_read_sql('postgres:///db_name#table_name')
"""
con, sql = special_path.split(sep)
return pd.read_sql(sql, con)
And with all that, read_any
would shorten to:
def read_any(file):
_, ext = os.path.splitext(file)
try:
reader = READER_MAP[ext]
except KeyError:
raise ValueError(f'Unsupported filetype: {ext}')
return reader(file)
I don't really like that one has to make up a non-standard convention to concatenate sql
(or table names) with con
(connection string). I would keep separate user-facing functions for file-like and DB-like reading.
Conclusion
Ok, now having written that out, I would recommend sticking to the if-elif
solution. It handles special cases with much less fuzz, and can be refactored to call my_read_sql
-type special handlers with the same line count as READER_MAP
, but without that variable.
Also: the .endswith
stack is more flexible with respect to double extensions (e.g. .csv.gz
), which would be trickier to get right with the READER_MAP
approach.