I'm trying to read, merge and append a large amount of from a CSV. The basics are all working correctly. However, I am overwriting my result set and haven't been able to correct it.
Data in both files is very straightforward:
# Small data set
A,B,C
1,2,101
3,4,102
9,10,103
# Large data set(used in chunk below)
A,B,C
1,2,1000
3,4,2000
9,10,3000
Sample Code
import pandas as pd
# Read CSVs
inventory_1 = pd.read_csv("file1.csv")
# Create new DF to hold the merge results
bucket = pd.DataFrame(columns=list("ABC"))
# Use chunk to read in the large file, merge and append the data
for chunk in pd.read_csv("file2.csv",chunksize=2):
chunk_merge = pd.merge(
inventory_1, chunk,
left_on=['A'],
right_on=['A'],
how='left')
result = bucket.append(chunk_merge)
print(result)
What happens is the merge will work correctly on the data in the chunk, but the previous results are overwritten in the result. So, in the example above I get:
# 1st Loop
A B B_x B_y C C_x C_y
0 1 NaN 2.0 2.0 NaN 1000.0 101.0
1 3 NaN 4.0 4.0 NaN 2000.0 102.0
2 9 NaN 10.0 NaN NaN 3000.0 NaN
# 2nd Loop
A B B_x B_y C C_x C_y
0 1 NaN 2.0 NaN NaN 1000.0 NaN
1 3 NaN 4.0 NaN NaN 2000.0 NaN
2 9 NaN 10.0 10.0 NaN 3000.0 103.0
The answer I need is:
A B_x C_x B_y C_y
0 1 2 1000 2 101
1 3 4 2000 4 102
2 9 10 3000 10 103
I feel like the answer is staring me in the face but I can't see it. Any help would be appreciated.