My problem is I normally do not have that large data. Methods I have used with some success on as small data sets choke on this dataset, maybe my ram is not enough for it. I need to deal with a large dataset in MYSQL. The tables as below, contains data as large as 10gb. I need to do some a specific analysis on it.
I have two tables, table1 is as following, id is primary key, table2_id is foreign key
id date aid table2_id message
1 2014-03-19 16:21:02 121 1 login from xxxx
2 2014-03-20 14:00:32 123 2 decrease budget
3 2014-03-20 18:00:32 121 2 increase budget
4 2014-03-21 16:21:02 121 3 login from xxxx
5 2014-03-21 16:21:02 121 3 login from xxxx
6 2014-03-22 10:21:02 121 2 increase budget
table2 is a type table
id type
1 login
2 change_budget
3 search
I need to analyse the relationship between activity change_budget and other activity. This is waht I have tried so far:
I try to write SQL for each time one aid change_budget, count the all other actions of this aid in the preceding 24 hours. If one aid have a change_budget at 2014-03-14 16:21:02, I need all the actions performed on this aid from 2014-03-13 16:21:02 to 2014-03-14 16:21:02. Then, grouping by the action and count the result I want should look as follows.
aid login search change_budget
121 1 0 1
123 0 0 -1
121 0 2 1
-1 means change_budget is decrease, 1 means increase. Other number is how many time this action of this aid has happen within one day before this aid change_budget happens.
My problem is I normally do not have that large data. If I treat it in the same way as small data, my ram is not enough for it. So I need a big data solution. Any suggestion?
Note: I have edited this question in an attempt to explain the issue better, but it still will need review from the original author to verify it states the problem correctly.