9

I am trying to merge two excel files using the following code and encountering the error of ValueError: array is too big; arr.size * arr.dtype.itemsize is larger than the maximum possible size.

import pandas as pd

file1 = pd.read_excel("file1.xlsx")
file2 = pd.read_excel("file2.xlsx")

file3 = file1.merge(file2, on="Input E-mail", how="outer")

file3.to_excel("merged1.xlsx")

File size is ~100MB+~100MB, Available Ram is 9GB (of 16GB)

unutbu
  • 842,883
  • 184
  • 1,785
  • 1,677
Nivas
  • 91
  • 1
  • 1
  • 2

1 Answers1

1

Your resulting dataframe can be much larger than your two input ones. Simple example:

import pandas as pd

values = pd.DataFrame({"id": [1,1,1,1], "value": ["a", "b", "c", "d"]})

users = pd.DataFrame({"id": [1,1,1], "users": ["Amy", "Bob", "Dan"]})

big_table = pd.merge(users, values, how="outer")

print big_table

Result:

     id  users    value
0     1   Amy       a
1     1   Amy       b
2     1   Amy       c
3     1   Amy       d
4     1   Bob       a
5     1   Bob       b
6     1   Bob       c
7     1   Bob       d
8     1   Dan       a
9     1   Dan       b
10    1   Dan       c
11    1   Dan       d
Akavall
  • 82,592
  • 51
  • 207
  • 251
  • Thanks for the response. This is intended and I am looking for the Result exactly what you described but since the file size is 100MB it is giving an error of "array is too big". Can you help me to solve this? – Nivas Apr 04 '17 at 19:35
  • @user3783174 If you need the whole merged dataframe, the only way to solve this issue is to get machine with more RAM. However, do you need it all? Maybe you can filter the rows that you are most interested in before merging? – Akavall Apr 04 '17 at 19:40
  • Unfortunately Yes. I need all the data. Can you suggest me the required RAM? – Nivas Apr 04 '17 at 19:45
  • I don't know. It would really depend on how your input files are structured. And even if I new that, it would be pretty hard to figure out. – Akavall Apr 04 '17 at 19:50
  • I just tried the same in 64 bit and got an error as "MemoryError". Could you suggest an alternate way to deal this? – Nivas Apr 04 '17 at 19:54
  • No promises, but if you are on a unix system, you could try to merge the table in command line: http://stackoverflow.com/questions/13300271/merge-join-two-tables-fast-linux-command-line. – Akavall Apr 04 '17 at 19:57
  • However, I would rethink if you need a table that big. Even if you can create it, how will it be used? – Akavall Apr 04 '17 at 19:59