I need to process a pretty huge .csv file (at least 10 million rows, hundreds of columns) with Python. I'd like:
- To filter the content based on several criteria (mostly strings, maybe some regular expressions)
- To consolidate the filtered data. For instance, grouping rows by date and, for each date, counting occurrences that match a specific criterion. Pretty similar to what a pivot table could do.
- To have user-friendly access to that consolidated data
- I'd like to generate charts (mostly basic line charts)
- Processing must be fast AND light, because computers at work cannot handle much and we're always in a hurry
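To make the filtering and consolidation steps concrete, here is a rough sketch of what I have in mind, not something I have tried on the real file. It uses only the standard library and reads the file one row at a time. The column names (`date`, `status`, `message`), the regular expression and the sample rows are all made up for illustration.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical column names; the real file has different headers.
DATE_COL = "date"
STATUS_COL = "status"
TEXT_COL = "message"


def count_matches_per_date(lines, pattern, status):
    """Stream CSV rows and count, per date, the rows that pass both filters."""
    regex = re.compile(pattern)
    counts = Counter()
    # DictReader yields one row at a time, so the whole file is never held in memory.
    for row in csv.DictReader(lines):
        # String filter on one column, regex filter on another.
        if row[STATUS_COL] == status and regex.search(row[TEXT_COL]):
            counts[row[DATE_COL]] += 1
    return counts


# Tiny in-memory sample standing in for the real file.
sample = io.StringIO(
    "date,status,message\n"
    "2024-01-01,ERROR,timeout on node 12\n"
    "2024-01-01,OK,all good\n"
    "2024-01-01,ERROR,disk full\n"
    "2024-01-02,ERROR,timeout on node 7\n"
)
result = count_matches_per_date(sample, r"timeout on node \d+", "ERROR")
for day, n in sorted(result.items()):
    print(day, n)
```

For the real file I would presumably pass `open(path, newline="")` instead of the in-memory sample. I don't know whether this row-by-row approach is fast enough at 10 million rows, which is part of why I am asking.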
Given these requirements, could you please suggest some ideas? I thought about using pandas. I also thought about dumping the CSV into a SQLite database (because it may be easier to query if I code a user interface). But this is really my first foray into this world, so I don't know where to start. I don't have much time, but I would be very glad if you could offer some advice, some good (and recent) things to read, interesting libraries, and so forth. Sorry if Stack Overflow is not the best place to ask for this kind of help. I'll delete the post if needed. Regards.