
A (a CSV file) records stock prices crawled from a stock website every day:

Stock_num     Stock_name       20201201
0001           appl             201.3
0002           goog            1001.5

B (a CSV file) holds the newest data crawled from the same stock website:

Stock_num     Stock_name       20201202
0001           appl               208.3 
0002           goog               999.8

I want to use pandas to write B into A in the format below (without reading A):

Stock_num     Stock_name       20201201    20201202
0001           appl             201.3         208.3 
0002           goog            1001.5         999.8

I tried the following: `B.to_csv('A.csv', index=False, mode='a', encoding='utf_8_sig')`

But what I got is:

Stock_num     Stock_name       20201201
0001           appl               201.3
0002           goog              1001.5
Stock_num     Stock_name       20201202
0001           appl               208.3 
0002           goog               999.8

I know I can read A.csv, concat it with B.csv, and then write the output. But is there a way to write B.csv into A.csv without reading A.csv?

RyanWan
  • Use `A = A.merge(B)`. – Mayank Porwal Dec 02 '20 at 06:09
  • In this way, shouldn't I read A.csv into pandas first? – RyanWan Dec 02 '20 at 06:16
  • Yes, use `A = pd.read_csv('A.csv')` and `B = pd.read_csv('B.csv')`. Then, `A = A.merge(B)`. – Mayank Porwal Dec 02 '20 at 06:17
  • @MayankPorwal I don't think that is his question. For example, what if the A file is 10GB and the B file is 10GB and the user only has 16GB on his computer. Reading between the lines, I think he is trying to append to an existing file, so that he doesn't have to read it in. If memory is the issue, I would recommend using `dask`. You can not append with `to_csv` like you can with `to_sql` or something. Also, you can only append row-wise with `to_sql`. You might want to try out the `csv` module rather than using pandas, but I am not sure as that could just create other problems. – David Erickson Dec 02 '20 at 06:21
  • @DavidErickson OK, I will try the `csv` module first. Thanks a lot. – RyanWan Dec 02 '20 at 06:22
  • 1
    @DavidErickson Reopened the question. – Mayank Porwal Dec 02 '20 at 06:23
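The `csv`-module approach suggested in the comments can be sketched as a streaming merge: read A and B row by row and write a combined file, so neither file is ever held fully in memory. This is only a sketch — it assumes both files list the stocks in the same order, and the file names (A.csv, B.csv, C.csv) are hypothetical:

```python
import csv

def append_column(a_path, b_path, out_path):
    """Stream A and B in parallel, appending B's last column to each row of A."""
    with open(a_path, newline='') as fa, \
         open(b_path, newline='') as fb, \
         open(out_path, 'w', newline='') as fout:
        writer = csv.writer(fout)
        for row_a, row_b in zip(csv.reader(fa), csv.reader(fb)):
            # row_b[-1] is the new date column: the header on the first
            # row ('20201202'), then the prices on the data rows
            writer.writerow(row_a + [row_b[-1]])

# Tiny demo with the data from the question:
with open('A.csv', 'w') as f:
    f.write('Stock_num,Stock_name,20201201\n0001,appl,201.3\n0002,goog,1001.5\n')
with open('B.csv', 'w') as f:
    f.write('Stock_num,Stock_name,20201202\n0001,appl,208.3\n0002,goog,999.8\n')
append_column('A.csv', 'B.csv', 'C.csv')
```

Note that this still reads A — just streamed, one row at a time — and writes a new file rather than modifying A in place. Because CSV is a row-oriented format, truly appending a column in place is not possible; every row has to be rewritten.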

1 Answer


See if dask helps. You are still reading in both files, but dask can be much faster and more memory-efficient because it processes the CSVs in chunks instead of loading them whole:

import dask.dataframe as dd

A = dd.read_csv('Desktop/A.csv')  # lazy, chunked reads
B = dd.read_csv('Desktop/B.csv')
df = A.merge(B)                   # joins on the shared columns (Stock_num, Stock_name)
df = df.compute()                 # materializes the result as a pandas DataFrame
df.to_csv('C.csv', index=False)
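For comparison, here is a minimal plain-pandas sketch of the same merge, assuming both files fit in memory (the sample files below reproduce the data from the question; the paths are hypothetical):

```python
import pandas as pd

# Sample files with the data from the question
with open('A.csv', 'w') as f:
    f.write('Stock_num,Stock_name,20201201\n0001,appl,201.3\n0002,goog,1001.5\n')
with open('B.csv', 'w') as f:
    f.write('Stock_num,Stock_name,20201202\n0001,appl,208.3\n0002,goog,999.8\n')

# dtype=str keeps the leading zeros in Stock_num
A = pd.read_csv('A.csv', dtype={'Stock_num': str})
B = pd.read_csv('B.csv', dtype={'Stock_num': str})

# merge joins on the shared columns (Stock_num, Stock_name),
# so the two date columns end up side by side
df = A.merge(B)
df.to_csv('C.csv', index=False)
```

Note that `merge` (a join on the shared key columns) gives the desired side-by-side layout, whereas `concat` would stack the rows the way `mode='a'` did.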
David Erickson