I have a similar question to this one ...

Merge error : negative length vectors are not allowed

However, I am merging two files (3 columns each, 1.3 billion rows each) by one column and get a similar error:

Negative length vectors are not allowed.

The suggested answer there is that there isn't enough memory. However, I'm running this on a system with 3 TB of memory (of which the reported peak usage was 247 GB). Is this still likely to be a memory issue, or is something else at play? Would it be worth reducing these data frames and then merging them?

Thanks for any advice.

Best wishes, Natalie

  • Is the column an exact match? Are there duplicates? Do you have an idea of the number of records after the merge? R will only report a memory error for the last vector it tried to allocate, not the total memory used by R so the error message rarely gives the whole picture. It's really impossible to say for sure what's going on without a reproducible example which would be difficult to create in this case. – MrFlick Aug 30 '17 at 14:50
  • There are other similar questions that suggest it could be due to duplicates in the column passed to `by` during `merge`. https://stackoverflow.com/q/42479854/8382207 – Sagar Aug 30 '17 at 14:50
  • Duplicate values in the ID could cause the size of your merged data.frame to explode. Consider IDs duplicated 4 times in both datasets. Each such ID would then produce 4 × 4 = 16 rows in the merged data.frame, so the result would have 4 times as many rows as either original. – lmo Aug 30 '17 at 14:55
  • @NatalieStephenson - I didn't check the question you shared in your post. It's the same one I linked in my comment above. My bad. – Sagar Aug 30 '17 at 14:56
  • @lmo thank you for your suggestions ... my comment was too large, so I've added it as an answer. – NatalieStephenson Aug 31 '17 at 09:03
  • Given your example, maybe you want to merge on Sample and Component? These variables together appear to compose a unique ID, at least in the data that you presented. – lmo Aug 31 '17 at 11:25
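
To make the duplicate-key point from the comments concrete, here is a minimal sketch of estimating the merged row count before calling `merge`. The toy data frames `x` and `y` and their column names are hypothetical, not the data from the question. For an inner join, each shared key contributes (rows in `x`) × (rows in `y`) rows. As far as I know, the "negative length vectors" error usually shows up when a requested length overflows R's integer limit, so the estimate is compared against `.Machine$integer.max`; treat that link as an assumption rather than a diagnosis.

```r
# Toy data (hypothetical): every id appears 4 times in both tables
x <- data.frame(id = rep(c("a", "b"), each = 4), vx = 1:8)
y <- data.frame(id = rep(c("a", "b"), each = 4), vy = 8:1)

# Number of rows per key in each table
nx <- table(x$id)
ny <- table(y$id)

# Only keys present in both tables survive an inner join;
# each contributes nx * ny rows. Doubles avoid integer overflow here.
shared <- intersect(names(nx), names(ny))
expected_rows <- sum(as.numeric(nx[shared]) * as.numeric(ny[shared]))

# On toy data the estimate can be checked against the real merge
merged <- merge(x, y, by = "id")
stopifnot(nrow(merged) == expected_rows)

# TRUE would mean the result cannot be indexed with a 32-bit integer
too_big <- expected_rows > .Machine$integer.max
```

On the real data only the `table()` counts and the `expected_rows` line would be needed; the `merge` call is there just to confirm the estimate on a small case.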

1 Answer

@lmo @Sagar @MrFlick There are duplicates in the column that I'm merging by. Each sample has alterations in multiple different components, so the data would look something like:

Sample    Component    Value
a         x            -1
a         y             1
b         x             0
...

I'm adding in survival data to this that refers solely to the sample, so I'd ideally like to have

Sample    Component    Value    Survival
a         x            -1       0.8
a         y             1       0.8
b         x             0       0.2

I thought merge would offer this ... am I wrong in assuming this? Will adding the survival data to this using merge by the sample name cause data entries to further duplicate? I'm not sure how best to try to perform this.

I'll try to test with a shortened datafile to see if it (a) works and (b) expands the data. Any other suggestions?
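
A minimal sketch of that small-scale test, using toy versions of the tables above. The object names `alterations` and `survival` are hypothetical, and it assumes the survival table has exactly one row per sample. Under that assumption the merge should keep the row count unchanged; `anyDuplicated` checks the assumption first. The last lines show a `match`-based lookup as an alternative that never adds rows.

```r
# Toy versions of the two tables (hypothetical names)
alterations <- data.frame(
  Sample    = c("a", "a", "b"),
  Component = c("x", "y", "x"),
  Value     = c(-1, 1, 0)
)
survival <- data.frame(
  Sample   = c("a", "b"),
  Survival = c(0.8, 0.2)
)

# Rows are only duplicated if Sample repeats in the survival table
stopifnot(anyDuplicated(survival$Sample) == 0)

# Left join: keep every alteration row, attach its sample's survival
merged <- merge(alterations, survival, by = "Sample", all.x = TRUE)
same_size <- nrow(merged) == nrow(alterations)

# Alternative: look up survival by position, which cannot expand the data
idx <- match(alterations$Sample, survival$Sample)
alterations$Survival <- survival$Survival[idx]
```

If `anyDuplicated` fails on the real survival table, the repeated samples would need to be resolved before either approach gives one survival value per row.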

  • Please include this as part of your original post. People get upset when this type of content is included as an answer. – lmo Aug 31 '17 at 11:20