I have OHLC data in a .csv file with the stock name is repeated in the header rows, like this:
M6A=F, M6A=F,M6A=F, M6A=F, M6A=F
Open, High, Low, Close, Volume
I am using pandas read_csv to get it, and parse all (and only) the 'M6A=F' columns to FastAPI. So far nothing I do will get all the columns. I either get the first column if I filter with "usecols=" or the last column if I filter with "names=".
I don't want to load the entire .csv file then dump unwanted data due to speed of use, so need to filter before extracting the data.
Here is my code example:
symbol = ['M6A=F']
df = pd.read_csv('myOHCLVdata.csv', skipinitialspace=True, usecols=lambda x: x in symbol)
def parse_csv(df):
res = df.to_json(orient="records")
parsed = json.loads(res)
return parsed
@app.get("/test")
def historic():
return parse_csv(df)
What I have done so far:
I checked the documentation for pandas.read_csv
and it says "names=" will not allow duplicates.
I use lambdas
in the above code to prevent the symbol hanging FastAPI if it does not match a column.
My understanding from other stackoverflow questions on this is that mangle_dupe_cols=True
should be incrementing the duplicates with M6A=F.1, M6A=F.2, M6A=F.3 etc...
when pandas reads it into a dataframe, but that isn't happening and I tried setting it to false, but it says it is not implemented yet.
Other answers I found in this stackoverflow solution don't seem to tally with what is happening in my code, since I am only getting the first column returned, or the last column with the others over-written. (I included FastAPI code here as it might be related to the issue or a workaround).