I have a huge dataset with 1000+ columns. Most of them contain NaNs or only a few values. Manually sifting through each column is an unreasonable waste of time. How can I estimate column diversity, top frequent values, etc. with a single command?
- `pandas.DataFrame.describe()` is featured very early on in the introductory text of pandas' documentation: http://pandas.pydata.org/pandas-docs/stable/10min.html as is counting unique values: http://pandas.pydata.org/pandas-docs/stable/10min.html#histogramming – Paul H Mar 09 '17 at 17:38
- What do you mean by "few" values? Do you expect discrete repeated values or floats? – FLab Mar 16 '17 at 16:53
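
Following up on Paul H's comment, a minimal sketch of the one-command summary (the toy frame below is made up to stand in for the real 1000+-column dataset):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real dataset
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": ["x", "x", "y", np.nan],
})

# One command: count, unique, top, freq, mean, min, max, ... per column
print(df.describe(include="all"))

# Distinct non-NaN values per column -- a quick diversity estimate
print(df.nunique())

# Top frequent values of one column
print(df["b"].value_counts().head())
```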
1 Answer
First, you need to extract what a single column contains; you can do that with a list comprehension like this:
column = [array[i] for i in range(0, len(array), STEP)]
where `STEP` is the number of columns in your file.
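
As a concrete illustration of that comprehension (the flat `array` below is hypothetical, standing in for a file read row by row into one list):

```python
# Hypothetical flat array: 3 rows x 2 columns, stored row-major
array = [10, "x",
         20, "y",
         30, "x"]
STEP = 2  # number of columns in the file

# First column: indices 0, 2, 4, ...
column = [array[i] for i in range(0, len(array), STEP)]
print(column)   # [10, 20, 30]

# Shift the start index to pull out the other columns
second = [array[i] for i in range(1, len(array), STEP)]
print(second)   # ['x', 'y', 'x']
```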
Then you can do whatever you want with that. To answer your question, you can use e.g. max(column) - min(column), which gives you the range of the column, one rough measure of diversity.
To get the top common values, I suggest you look here:
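One common way to get the most common values (as a sketch; this may not be what the answer linked to) is `collections.Counter` from the standard library:

```python
from collections import Counter

column = ["x", "y", "x", "x", "y", "z"]

# most_common returns (value, count) pairs, most frequent first
print(Counter(column).most_common(3))  # [('x', 3), ('y', 2), ('z', 1)]
```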

Paweł Balawender