0

I've been assigned a task which involves merging the pd.read_csv() and pd.read_excel() functions into one function called ingest(). I've been trying to use regular expressions so that if the file name contains ".csv" it will call read_csv(), and otherwise it will read the file as an Excel file.

This is my code so far

    rexf = re.compile((r'.csv'))
    mo = rexf.search(dataframe)
    if mo == True:
        df = pd.read_csv(dataframe)
    else:
        df = pd.read_excel(dataframe)
    return df

I then call this function with a file called "Smoking.csv". This file works when I use pd.read_csv() directly, but here it goes berserk and gives me this error:

xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found

Does anyone know why this may be, and how I can get the function to behave as intended? Thanks.

  • Check what `mo` is. It is either a `re.Match` object or `None`; neither of those will ever be `== True`, so you will never use `pd.read_csv` and it will always use `pd.read_excel` – juanpa.arrivillaga Oct 14 '20 at 10:53

2 Answers

1

I would avoid using regex for this. It will work, but if you're dealing with paths you should use a tool for handling paths - like pathlib:

    from pathlib import Path

    import pandas as pd


    def ingest(filename):
        path = Path(filename)
        # Dispatch on the file extension rather than searching the whole name
        if path.suffix == ".csv":
            df = pd.read_csv(path)
        else:
            df = pd.read_excel(path)
        return df
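
To see what the `path.suffix` check is comparing against, here is a small standard-library sketch. The file names are made up for illustration; only "Smoking.csv" comes from the question.

```python
from pathlib import Path

# Hypothetical file names, chosen to show the edge cases
names = ["Smoking.csv", "data/Smoking.CSV", "archive.csv.bak", "no_extension"]

# .suffix is the last dot-separated part of the final path component,
# including the leading dot, or "" when there is none
suffixes = [Path(name).suffix for name in names]

for name, suffix in zip(names, suffixes):
    print(f"{name!r} -> {suffix!r}")
```

Note that the comparison is case-sensitive, so "Smoking.CSV" would not match `".csv"` unless you lower-case the suffix first, and a name like "archive.csv.bak" is not treated as a CSV even though it contains ".csv".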

The reason your regex version doesn't work is the `if` statement. `rexf.search()` returns either a match object or `None`, and neither of those ever compares equal to `True`, so the `else` branch always runs. Writing `if mo:` instead would work. But again... pathlib!
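
The point about match objects can be checked directly with a few lines. This is only a sketch: the pattern `\.csv$` (escaped dot, anchored at the end of the name) is my assumption about what the regex was meant to do, not the pattern from the question.

```python
import re

# Assumed intent: the name ends in a literal ".csv"
rexf = re.compile(r"\.csv$")

mo_hit = rexf.search("Smoking.csv")    # a match object
mo_miss = rexf.search("Smoking.xlsx")  # None

print(mo_hit == True)   # False: a match object never equals True
print(bool(mo_hit))     # True: but it is truthy, so `if mo_hit:` works
print(bool(mo_miss))    # False: None is falsy
```

So `if mo:` (or the more explicit `if mo is not None:`) takes the CSV branch when the pattern matches, while `if mo == True:` never does.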

PirateNinjas
1

A function like this one should do it:

    import pandas as pd


    def ingest(file_name):
        if file_name.endswith('.csv'):
            df = pd.read_csv(file_name)
        else:
            df = pd.read_excel(file_name)
        return df
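
One possible variation, if you would rather get a clear error than have every non-CSV file fall through to `pd.read_excel`: look the extension up in a table. This is a sketch, not part of either answer. The `READERS` mapping, the `reader_name` helper and the list of Excel extensions are my own assumptions, so adjust them to the files you actually expect.

```python
from pathlib import Path

# Hypothetical mapping from lower-cased extension to the name of a pandas reader
READERS = {
    ".csv": "read_csv",
    ".xls": "read_excel",
    ".xlsx": "read_excel",
}


def reader_name(file_name):
    """Return the name of the pandas function to use for file_name."""
    suffix = Path(file_name).suffix.lower()  # case-insensitive comparison
    if suffix not in READERS:
        raise ValueError(f"Unsupported file extension: {suffix!r}")
    return READERS[suffix]


def ingest(file_name):
    # Imported here so reader_name() can be tried without pandas installed
    import pandas as pd

    return getattr(pd, reader_name(file_name))(file_name)
```

With this, `ingest("Smoking.csv")` calls `pd.read_csv`, and a name like "notes.txt" raises a `ValueError` naming the extension instead of a confusing xlrd error.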