0

I have date wise stock csv files like below.

EQ070717.CSV
EQ070716.CSV
EQ070715.CSV
[...]

They have stock data in this format:

SC_NAME,OPEN,HIGH,LOW,CLOSE
ABB,1457.70,1469.95,1443.80,1452.90,
AEGI,189.00,193.00,187.40,188.70
HDFC,1650.00,1650.00,1617.05,1629.20
[...]

How can i convert them to stock specific csv files which can be loaded as pandas datafframe. I could do it in .net, but just wanted to know if there is any straightforward way available in python/pandas.


Edit: Adding expected output

Create individual stock files based on stock name:

ABB.csv
AEGI.csv
HDFC.csv

For each stock pull in stock data from all files and add to that stock csv: For example stock ABB, read stock data from each date wise csv, and add that info to a new line in csv ABB.csv. Date value can be picked from file name or file modified date property also.

DATE, OPEN,HIGH,LOW,CLOSE
070717, 1457.70,1469.95,1443.80,1452.90
070716, 1456.70,1461.95,1441.80,1450.90
070715, 1455.70,1456.95,1441.80,1449.90
SarkarG
  • 687
  • 1
  • 8
  • 30

3 Answers3

1

I think you need glob for select all files, create list of DataFrames dfs in list comprehension and then use concat for one big DataFrame from all CSVs:

import glob

files = glob.glob('files/*.CSV')
dfs = [pd.read_csv(fp) for fp in files]
df = pd.concat(dfs, ignore_index=True)

If necessary filenames in output DataFrame:

files = glob.glob('files/*.CSV')
dfs = [pd.read_csv(fp) for fp in files]
#win solution - double split
keys = [x.split('\\')[-1].split('.')[0] for x in files]
df = pd.concat(dfs, keys=keys)
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
1

adding to @jezrael solution. as user wants each stock specific csv file.

for stock_name in df.SC_NAME.unique():
    df[df['SC_NAME']==stock_name].to_csv(path_to_dir+stock_name+'.csv')
0

My approach would be to set up a sqlite database with a single table. Just three columns, market_date, symbol, and csv_string (maybe a col for line# in the file if you want relative positions preserved). Read all the files and load the data into the table line by line. Create an index on the table on the symbol column. Then create a cursor for read symbol, csv_string from stock_table order by symbol, market_date. Use itertools.groupby to let you know when you have looped over all of a symbol so you can close the last file and open the next.

Of course if you have little enough data that it can all fit into memory you just insert tuples into a list, sort the list and use groupby to loop over it to make your files.

verisimilidude
  • 475
  • 6
  • 9