Column unexpectedly dropped during Pandas Dataframe Append

Question

Below is my code, which simply groups together and averages sets of rows. For the life of me, I can't understand why a column is dropped in the final result.

import pandas as pd

def group_rows(dataframe1):
    incr = 10
    dataframe3 = pd.DataFrame()
    for i in range(0,len(dataframe1.index),incr):
        tmp = dataframe1[i:i+incr].mean()
        dataframe3 = dataframe3.append(tmp, ignore_index=True)
    print(dataframe3.to_string())

group_rows(pd.read_csv('sample.csv')) # Inputs the CSV file whose snapshot is shown below

The CSV file sample.csv is the input for the group_rows() function above and consists of 12 columns and many rows. The result printed by this function has 11 columns instead of 12.

A snapshot of the output is given below.

What is dataframe2 doing? Could you provide the output of your function? We only see the expected output, not what is really happening. — Quickbeam2k1, Mar 30 '17 at 08:01
I think `dataframe2` may just be a dummy, as it's optional and not used within the function. Please give this a read on how to frame a pandas question: http://stackoverflow.com/questions/20109391 — Gurupad Hegde, Mar 30 '17 at 08:18
Thanks for the feedback, I've edited the question to try and make the input and output clearer. I've also deleted redundant variables & lines like `dataframe2` — Hamman Samuel, Mar 30 '17 at 14:42
Your `K` column disappeared. Do you have anything that is not numerical in this row? `mean` implicitly drops such columns. Also, it'd be easier to debug if you gave your columns names using `pd.read_csv('sample.csv', names=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K'])` — tmrlvi, Apr 13 '17 at 05:32
Thanks. I tried to check for non-numerical values using the code here, but I'm not finding any: http://stackoverflow.com/a/21772078/863923 Is there another approach to finding whether there are non-numerical values? — Hamman Samuel, Apr 13 '17 at 09:52

score 0 · Accepted Answer · edited May 23 '17 at 12:34

@tmrlvi from the comments above fully deserves credit for this answer. The column was indeed dropped because it contained non-numeric data. Below is a code snippet that identifies rows and columns with non-numeric data, adapted from another SO question.

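def is_numeric(val):
    # A value counts as numeric if float() can parse it (NaN parses, so missing values pass)
    try:
        float(val)
        return True
    except (TypeError, ValueError):
        return False

def non_numeric(dataframe, axis=1):
    # axis=1 checks each row, axis=0 checks each column
    all_numeric = dataframe.applymap(is_numeric).all(axis)
    # Positions of the rows (or columns) that contain non-numeric data
    return [i for i, ok in enumerate(all_numeric) if not ok]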

A note on the snippet: with axis set to 1, non_numeric returns the positions of the rows containing non-numeric data; with axis set to 0, it returns the positions of the columns.
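
As a quicker first check, the column dtypes usually reveal the culprit: when read_csv meets a value it cannot parse as a number, the whole column is read as object. Here is a minimal sketch of that check. The CSV text, the column names A and K, and the value "oops" are made up for illustration and are not taken from sample.csv.

```python
import io
import pandas as pd

# Hypothetical stand-in for sample.csv: column K has one non-numeric entry
csv_text = "A,K\n1,2.5\n2,oops\n3,4.0\n"
df = pd.read_csv(io.StringIO(csv_text))

# Names of the columns that pandas did not parse as numbers
suspect_columns = df.select_dtypes(exclude="number").columns.tolist()
print(suspect_columns)
```

Any column listed here is a candidate for being dropped by mean().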

My suggested strategy is to drop the offending rows rather than rely on Pandas' default of dropping columns in aggregation functions like mean(). The following snippet drops the non-numeric rows.

nonnum_rows = non_numeric(dataframe1)
dataframe1 = dataframe1.drop(dataframe1.index[nonnum_rows])
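
One caveat, offered as an assumption rather than something verified against sample.csv: a column that was read as strings may keep dtype object even after the bad rows are dropped, so it may still need converting before mean() treats it as numeric. An alternative sketch uses pd.to_numeric with errors="coerce" to convert and filter in one pass. The CSV text and the column names A and K are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical stand-in for sample.csv: column K has one non-numeric entry
csv_text = "A,K\n1,2.5\n2,oops\n3,4.0\n"
df = pd.read_csv(io.StringIO(csv_text))

# Convert every column to numbers; values that cannot be parsed become NaN
numeric = df.apply(pd.to_numeric, errors="coerce")

# Drop the rows where any value failed to convert
# (this also drops rows that had missing values to begin with)
cleaned = numeric.dropna()

# Both columns are numeric now, so mean() reports both of them
print(cleaned.mean())
```

Whether to drop rows that were already missing values is a design choice; if those rows should stay, compare the NaN masks of df and numeric and drop only the rows where a new NaN appeared.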
