
I'm building an application which requires analysis of tabular data.

I would like to perform some columnar operations, such as the ability to rename columns, delete columns and calculate a new column based on the values of existing columns.
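To illustrate the kind of operations I mean, here is a minimal pure-Python sketch. It assumes a table is stored as an `OrderedDict` mapping column names to lists of values; the function names `rename_column`, `delete_column` and `add_computed_column` are hypothetical and not taken from any library.

```python
from collections import OrderedDict


def rename_column(table, old, new):
    """Return a new table with column `old` renamed to `new`, keeping column order."""
    if old not in table:
        raise KeyError(old)
    if new != old and new in table:
        raise ValueError("column already exists: %r" % (new,))
    return OrderedDict((new if name == old else name, values)
                       for name, values in table.items())


def delete_column(table, name):
    """Return a new table without column `name`."""
    if name not in table:
        raise KeyError(name)
    return OrderedDict((n, v) for n, v in table.items() if n != name)


def add_computed_column(table, name, func, *sources):
    """Return a new table with column `name` appended.

    `func` is called once per row with the values of the `sources` columns.
    """
    if name in table:
        raise ValueError("column already exists: %r" % (name,))
    columns = [table[s] for s in sources]
    result = OrderedDict(table)
    result[name] = [func(*row) for row in zip(*columns)]
    return result


# Hypothetical sample data, only for demonstration.
table = OrderedDict([
    ("item", ["apple", "pear"]),
    ("price", [3, 5]),
    ("qty", [2, 4]),
])
table = rename_column(table, "qty", "quantity")
table = add_computed_column(table, "total", lambda p, q: p * q,
                            "price", "quantity")
table = delete_column(table, "price")
```

Each function returns a new table rather than mutating its argument, which keeps the sketch simple at the cost of some copying. It runs unchanged on CPython 2.7 and 3.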

My first choice would have been something like Pandas. However, one constraint is that this project must be cross-platform and very easy to deploy in a virtualenv. Pandas (on Win32) appears to rely on binary installers that are not easy to deal with.

My second choice would be to roll my own table class, but I hope that will not become necessary.

So are there any alternatives?

UPDATE1: Anaconda is an excellent package, but I'm not free to choose my own platform. The platform has been chosen for me: vanilla CPython 2.7.3, 32-bit. None of the servers have a C++ compiler. Introducing any new non-Python dependency has a cost, as I'd have to ensure that every developer who uses this has those components, so keeping things pure Python will be valuable.

UPDATE2: What do I mean by tabular data? Informally, it's the kind of data you might represent in a spreadsheet or table in a SQL database.

In this case, the data is structured into rows and columns. Each column will hold values of a consistent type, but any value can be None. Each column will have a string name. The columns have an order.
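As a rough sketch of those constraints in pure Python: the helper names `column_type` and `make_table` are hypothetical, and the check that all columns have the same length is my own assumption about what "rows and columns" implies.

```python
from collections import OrderedDict


def column_type(values):
    """Return the one type shared by all non-None values.

    Returns None if the column contains only None values.
    Raises TypeError if two non-None values differ in type.
    """
    found = None
    for value in values:
        if value is None:
            continue
        if found is None:
            found = type(value)
        elif type(value) is not found:
            raise TypeError("mixed types in column: %s and %s"
                            % (found.__name__, type(value).__name__))
    return found


def make_table(columns):
    """Build an ordered table from (name, values) pairs, validating each column."""
    table = OrderedDict()
    length = None
    for name, values in columns:
        if not isinstance(name, str):
            raise TypeError("column names must be strings")
        if name in table:
            raise ValueError("duplicate column name: %r" % (name,))
        values = list(values)
        if length is None:
            length = len(values)
        elif len(values) != length:
            # Assumption: every column has one value per row.
            raise ValueError("column %r has the wrong length" % (name,))
        column_type(values)  # raises TypeError on mixed types
        table[name] = values
    return table


# Hypothetical sample data, only for demonstration.
people = make_table([
    ("name", ["ann", None, "bob"]),
    ("age", [31, 27, None]),
])
```

Note that this treats `bool` and `int` as different types, since it compares exact types rather than using `isinstance`.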

Salim Fadhley
  • With [Anaconda](https://store.continuum.io/cshop/anaconda/) all cross-platform problems should be gone. As far as I know, there is no library that comes even close to pandas. If you use tables you will feel extremely sorry for not choosing pandas... it's just a matter of time – Retozi Apr 11 '14 at 15:21
  • Good comment: I've added an update to explain why this is not an option. – Salim Fadhley Apr 11 '14 at 15:47
  • What do you mean by tabular data? Did you take a look at https://docs.python.org/2/library/array.html ? – user189 Apr 11 '14 at 15:49
  • I've added update 2 to clarify this matter. – Salim Fadhley Apr 11 '14 at 15:58
  • @Retozi although one issue is that Anaconda may have license implications for use in servers (it's non-free). Saying that, the restriction to vanilla python seems masochistic. – Andy Hayden Apr 11 '14 at 16:12
  • @Andy: afaik, you're wrong: _Completely free - including for commercial use and even redistribution_, is the second bullet on my link. – Retozi Apr 11 '14 at 16:22
  • @Salim: Man, I feel for you... I guess virtual machines and things like docker.io are out of the question as well? Anyway, if there is really no other way, if I were you I'd implement a subset of pandas with the same API in pure Python, based on arrays and dicts (if you need indexing)... it sucks, but I have never seen anything like it in pure Python. – Retozi Apr 11 '14 at 16:29
  • @Retozi http://continuum.io/anaconda-server I'm very pro anaconda, but was unclear on the distinction between commercial use and professional use. If you're right then that's great! – Andy Hayden Apr 12 '14 at 00:45

1 Answer


ToyTable is a pure-Python table class.

It's not as fast as Pandas, but it's much easier to install. It's BSD-licensed, so it's suitable for commercial use.

Salim Fadhley
  • Your project is very interesting, though it seems to lack read_csv, pivot_table and groupby. What I really want for myself is a near feature-complete pandas that you can use on AWS Lambda. I think read_csv with type-sniffing is the killer feature of pandas; it's what makes it so easy to use. – Andy Hayden Oct 08 '16 at 19:58
  • I think I did add a Pivot - it's been years since I last worked on it. It's not a hard feature to add by the way. New version is https://pypi.python.org/pypi/eztable/ – Salim Fadhley Oct 10 '16 at 22:33