
A (a CSV file) records stock prices crawled from a stock website every day:

Stock_num     Stock_name       20201201
0001           appl             201.3
0002           goog            1001.5

B (a CSV file) holds the newest data crawled from the same stock website:

Stock_num     Stock_name       20201202
0001           appl               208.3 
0002           goog               999.8

I want to use pandas to write B into A in the format below (without reading A):

Stock_num     Stock_name       20201201    20201202
0001           appl             201.3         208.3 
0002           goog            1001.5         999.8

I tried the following: `B.to_csv('A.csv', index=False, mode='a', encoding='utf_8_sig')`

But what I got is:

Stock_num     Stock_name       20201201
0001           appl               201.3
0002           goog              1001.5
Stock_num     Stock_name       20201202
0001           appl               208.3 
0002           goog               999.8

I know I can read A.csv, concat it with B.csv, and then write the output. But is there a way to write B.csv into A.csv without reading A.csv?

RyanWan
  • Use `A = A.merge(B)`. – Mayank Porwal Dec 02 '20 at 06:09
  • In this way, shouldn't I read A.csv into pandas first? – RyanWan Dec 02 '20 at 06:16
  • Yes, use `A = pd.read_csv('A.csv')` and `B = pd.read_csv('B.csv')`. Then, `A = A.merge(B)`. – Mayank Porwal Dec 02 '20 at 06:17
  • @MayankPorwal I don't think that is his question. For example, what if the A file is 10GB and the B file is 10GB and the user only has 16GB on his computer. Reading between the lines, I think he is trying to append to an existing file, so that he doesn't have to read it in. If memory is the issue, I would recommend using `dask`. You can not append with `to_csv` like you can with `to_sql` or something. Also, you can only append row-wise with `to_sql`. You might want to try out the `csv` module rather than using pandas, but I am not sure as that could just create other problems. – David Erickson Dec 02 '20 at 06:21
  • @DavidErickson OK, I will try the `csv` module first. Thanks a lot. – RyanWan Dec 02 '20 at 06:22
  • 1
    @DavidErickson Reopened the question. – Mayank Porwal Dec 02 '20 at 06:23
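The `csv`-module approach suggested in the comments can be sketched as a streaming merge: read A and B row by row and write a combined file, so neither file is ever held fully in memory. This is only a sketch — it assumes both files list the stocks in the same order, and the file names (A.csv, B.csv, C.csv) are hypothetical:

```python
import csv

def append_column(a_path, b_path, out_path):
    """Stream A and B in parallel, appending B's last column to each row of A."""
    with open(a_path, newline='') as fa, \
         open(b_path, newline='') as fb, \
         open(out_path, 'w', newline='') as fout:
        writer = csv.writer(fout)
        for row_a, row_b in zip(csv.reader(fa), csv.reader(fb)):
            # row_b[-1] is the new date column: the header on the first
            # row ('20201202'), then the prices on the data rows
            writer.writerow(row_a + [row_b[-1]])

# Tiny demo with the data from the question:
with open('A.csv', 'w') as f:
    f.write('Stock_num,Stock_name,20201201\n0001,appl,201.3\n0002,goog,1001.5\n')
with open('B.csv', 'w') as f:
    f.write('Stock_num,Stock_name,20201202\n0001,appl,208.3\n0002,goog,999.8\n')
append_column('A.csv', 'B.csv', 'C.csv')
```

Note that this still reads A — just streamed, one row at a time — and writes a new file rather than modifying A in place. Because CSV is a row-oriented format, truly appending a column in place is not possible; every row has to be rewritten.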

1 Answer


See if dask helps. You are still reading in both files, but dask can be much faster and more memory-efficient because it processes the CSVs in chunks instead of loading them whole:

import dask.dataframe as dd

A = dd.read_csv('Desktop/A.csv')  # lazy, chunked reads
B = dd.read_csv('Desktop/B.csv')
df = A.merge(B)                   # joins on the shared columns (Stock_num, Stock_name)
df = df.compute()                 # materializes the result as a pandas DataFrame
df.to_csv('C.csv', index=False)
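For comparison, here is a minimal plain-pandas sketch of the same merge, assuming both files fit in memory (the sample files below reproduce the data from the question; the paths are hypothetical):

```python
import pandas as pd

# Sample files with the data from the question
with open('A.csv', 'w') as f:
    f.write('Stock_num,Stock_name,20201201\n0001,appl,201.3\n0002,goog,1001.5\n')
with open('B.csv', 'w') as f:
    f.write('Stock_num,Stock_name,20201202\n0001,appl,208.3\n0002,goog,999.8\n')

# dtype=str keeps the leading zeros in Stock_num
A = pd.read_csv('A.csv', dtype={'Stock_num': str})
B = pd.read_csv('B.csv', dtype={'Stock_num': str})

# merge joins on the shared columns (Stock_num, Stock_name),
# so the two date columns end up side by side
df = A.merge(B)
df.to_csv('C.csv', index=False)
```

Note that `merge` (a join on the shared key columns) gives the desired side-by-side layout, whereas `concat` would stack the rows the way `mode='a'` did.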
David Erickson