I need to use the aggregate function on an 18 GB dataset of numerical and categorical data in CSV format (with more than 60 million records in some cases).
I have tried various packages such as ff and bigmemory, but with no success. The problem is that I have to group the data by the values of some columns and then apply a user-defined function either to a single column, as aggregate does, or to several columns, as split does.
A short example of this:
country  day  month  year  f     person_id  age  ...
1        23   01     2014  4005  5000       20   ...
1        23   01     2014  4005  244        43   ...
....
Grouping by country and month, we want to know the number of passengers, as aggregate does on a data.frame or data.table (neither of which supports datasets this large). Or, grouping by age and sex, we want to apply an analysis over country, day, month and year, as split can do on a data.frame or data.table (again, not on large datasets).
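To make the two operations concrete, here is a rough sketch of what I mean on a tiny in-memory sample. The values, the `sex` column (not shown in the excerpt above) and the `analysis` function are made up for illustration, and only some of the columns are included; the real user-defined function is more involved.

```r
# Toy sample; the values and the `sex` column are invented for illustration.
df <- data.frame(
  country   = c(1, 1, 2, 2, 2),
  day       = c(23, 23, 5, 5, 17),
  month     = c(1, 1, 1, 2, 2),
  year      = c(2014, 2014, 2014, 2014, 2014),
  person_id = c(5000, 244, 310, 311, 5000),
  age       = c(20, 43, 20, 43, 20),
  sex       = c("F", "M", "F", "M", "F")
)

# 1) aggregate-style: number of passenger records per country and month.
passengers <- aggregate(person_id ~ country + month, data = df, FUN = length)

# 2) split-style: per (age, sex) group, apply a function that sees several columns.
#    `analysis` is a hypothetical placeholder: it counts distinct
#    (country, day, month, year) combinations within the group.
analysis <- function(g) {
  nrow(unique(g[, c("country", "day", "month", "year")]))
}
groups   <- split(df, list(df$age, df$sex), drop = TRUE)  # drop empty combinations
by_group <- lapply(groups, analysis)

print(passengers)
print(by_group)
```

This works fine at this size; what I need is something equivalent that works when the data does not fit in memory.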
Can anyone suggest a solution to this? Any hints would be helpful. Thanks a lot for your help!