I have a .csv file that I want to open and ultimately load into a pandas DataFrame. The file has some junk text above the actual table, whose header row starts with the string Sample_ID. I wrote code that does the job in multiple steps, and I am now wondering whether there is a more elegant way to do it. Here is my code:
import pandas as pd
import re
from io import StringIO
with open('SampleSheet.csv') as f:
    ## read in the .csv file as a string
    step1 = f.read()
## subset the step1 string
# define where my df should start
start = 'Sample_ID'
step2 = step1[step1.index(start):]
## read in step2 as a pandas dataframe with StringIO
step3 = pd.read_csv(StringIO(step2))
I was wondering if there is a way to slice directly with f.read(), so that I would already have one step less.
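To illustrate the kind of shortcut I have in mind, here is a minimal sketch that drops lines while reading instead of slicing the full string afterwards. It assumes Sample_ID is the first field on the header line, and the function name read_from_marker is just a hypothetical helper for this example.

```python
import pandas as pd
from io import StringIO
from itertools import dropwhile


def read_from_marker(path, start='Sample_ID'):
    """Read the table that begins at the first line starting with `start`.

    Assumption: the header line begins with `start` (no leading whitespace).
    """
    with open(path) as f:
        # Skip every line until one starts with the marker, keep the rest.
        kept = dropwhile(lambda line: not line.startswith(start), f)
        text = ''.join(kept)
    if not text:
        raise ValueError(f'no line starts with {start!r} in {path}')
    return pd.read_csv(StringIO(text))
```

This still goes through StringIO, but the read and the subsetting happen in a single pass over the file.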
I also tried pd.read_csv() with skiprows, but I am having a hard time determining the row number of the line that starts with Sample_ID.
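A rough sketch of how the row number could be found first and then passed as an integer to skiprows is below. It assumes the junk section contains no quoted fields with embedded newlines, and find_header_row and read_with_skiprows are hypothetical names used only for illustration.

```python
import pandas as pd


def find_header_row(path, start='Sample_ID'):
    """Return the 0-based line number of the first line starting with `start`."""
    with open(path) as f:
        for row_number, line in enumerate(f):
            if line.startswith(start):
                return row_number
    raise ValueError(f'no line starts with {start!r} in {path}')


def read_with_skiprows(path, start='Sample_ID'):
    # An integer skiprows tells pandas how many lines to skip at the top,
    # so the header line becomes the first line that is parsed.
    return pd.read_csv(path, skiprows=find_header_row(path, start))
```

This reads the file twice (once to locate the header, once to parse), which may or may not count as more elegant than the StringIO version.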