python pandas - how to convert date wise stock csv to stock wise csv files

Question

I have date wise stock csv files like below.

EQ070717.CSV
EQ070716.CSV
EQ070715.CSV
[...]

They have stock data in this format:

SC_NAME,OPEN,HIGH,LOW,CLOSE
ABB,1457.70,1469.95,1443.80,1452.90,
AEGI,189.00,193.00,187.40,188.70
HDFC,1650.00,1650.00,1617.05,1629.20
[...]

How can i convert them to stock specific csv files which can be loaded as pandas datafframe. I could do it in .net, but just wanted to know if there is any straightforward way available in python/pandas.

Edit: Adding expected output

Create individual stock files based on stock name:

ABB.csv
AEGI.csv
HDFC.csv

For each stock pull in stock data from all files and add to that stock csv: For example stock ABB, read stock data from each date wise csv, and add that info to a new line in csv ABB.csv. Date value can be picked from file name or file modified date property also.

DATE, OPEN,HIGH,LOW,CLOSE
070717, 1457.70,1469.95,1443.80,1452.90
070716, 1456.70,1461.95,1441.80,1450.90
070715, 1455.70,1456.95,1441.80,1449.90

Do u need to load all file at once, and add a column for date when combine all csv file, or just read the csv file? — Phung Duy Phong, Jul 11 '17 at 04:39
For splitting the dataframe you might want to refer to this [question](https://stackoverflow.com/questions/19790790/splitting-dataframe-into-multiple-dataframes/43998102#43998102). If you consider my answer, you can use the dictionary keys as filename prefixes — Quickbeam2k1, Jul 11 '17 at 06:27

jezrael · Answer 1 · 2017-07-11T04:55:50.720

1

I think you need glob for select all files, create list of DataFrames dfs in list comprehension and then use concat for one big DataFrame from all CSVs:

import glob

files = glob.glob('files/*.CSV')
dfs = [pd.read_csv(fp) for fp in files]
df = pd.concat(dfs, ignore_index=True)

If necessary filenames in output DataFrame:

files = glob.glob('files/*.CSV')
dfs = [pd.read_csv(fp) for fp in files]
#win solution - double split
keys = [x.split('\\')[-1].split('.')[0] for x in files]
df = pd.concat(dfs, keys=keys)

edited Jul 11 '17 at 04:55

answered Jul 11 '17 at 04:39

jezrael

822,522
95
1,334
1,252

1

I prefer `pathlib.Path` for path operations. Then you can use `fp.name` – Maarten Fabré Jul 11 '17 at 08:59

score 1 · Answer 2 · answered Jul 11 '17 at 05:15

1

adding to @jezrael solution. as user wants each stock specific csv file.

for stock_name in df.SC_NAME.unique():
    df[df['SC_NAME']==stock_name].to_csv(path_to_dir+stock_name+'.csv')

answered Jul 11 '17 at 05:15

Achyuta nanda sahoo

455
3
15

score 0 · Answer 3 · answered Jul 11 '17 at 05:37

My approach would be to set up a sqlite database with a single table. Just three columns, market_date, symbol, and csv_string (maybe a col for line# in the file if you want relative positions preserved). Read all the files and load the data into the table line by line. Create an index on the table on the symbol column. Then create a cursor for read symbol, csv_string from stock_table order by symbol, market_date. Use itertools.groupby to let you know when you have looped over all of a symbol so you can close the last file and open the next.

Of course if you have little enough data that it can all fit into memory you just insert tuples into a list, sort the list and use groupby to loop over it to make your files.

python pandas - how to convert date wise stock csv to stock wise csv files

3 Answers3