
Below is my code, which simply groups together and averages sets of rows. For the life of me, I can't understand why a column is dropped in the final result.

import pandas as pd

def group_rows(dataframe1):
    incr = 10
    dataframe3 = pd.DataFrame()
    for i in range(0,len(dataframe1.index),incr):
        tmp = dataframe1[i:i+incr].mean()
        dataframe3 = dataframe3.append(tmp, ignore_index=True)
    print(dataframe3.to_string())

group_rows(pd.read_csv('sample.csv')) # Inputs the CSV file whose snapshot is shown below

The CSV file sample.csv, the input to the group_rows() function above, has 12 columns and many rows. The result printed by the function has 11 columns instead of 12.

channels.csv snapshot

A snapshot of the output is given below.

Output from function

Hamman Samuel
  • what is dataframe2 doing? could you provide the output of your function? we only see the expected output not what is really happening. – Quickbeam2k1 Mar 30 '17 at 08:01
  • I think `dataframe2` may be just dummy as its optional and not being used within the function. Please give it a read to frame question in pandas: http://stackoverflow.com/questions/20109391 – Gurupad Hegde Mar 30 '17 at 08:18
  • Thanks for the feedback, I've edited the question to try and make the input and output clearer. I've also deleted redundant variables & lines like `dataframe2` – Hamman Samuel Mar 30 '17 at 14:42
  • Your `K` column disappeared. Do you have anything that is not numerical in this column? `mean` implicitly drops such columns. Also, it'd be easier to debug if you gave your columns names using `pd.read_csv('sample.csv', names=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K'])` – tmrlvi Apr 13 '17 at 05:32
  • Thanks I tried to see if there's any non-numerical values using the code here but I'm not getting any non-numeric values: http://stackoverflow.com/a/21772078/863923 Is there another approach to finding whether there are non-numerical values? – Hamman Samuel Apr 13 '17 at 09:52

1 Answer


@tmrlvi from the comments above fully deserves credit for this answer. The column was being dropped because it contained non-numeric data. Below is a code snippet that identifies rows and columns with non-numeric data, adapted from another SO question.

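def is_numeric(val):
    # A value counts as numeric if float() can parse it; this also accepts
    # zero, negative numbers and exponent notation such as '1e5'
    try:
        float(val)
        return True
    except (TypeError, ValueError):
        return False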

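def non_numeric(dataframe, axis=1):
    # Flag each row (axis=1) or column (axis=0) whose cells are all numeric
    all_numeric = dataframe.apply(lambda col: col.map(is_numeric)).all(axis=axis)
    # Return the positions of rows/columns with at least one non-numeric cell
    return [i for i, ok in enumerate(all_numeric) if not ok]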

A note on the snippet: with axis set to 1 it returns the positions of the rows that contain non-numeric data, and with axis set to 0 it returns the positions of the columns.
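
As a quick illustration of the two axis settings, here is a minimal sketch on a made-up three-row frame. The column names and values are hypothetical, and the two helpers are restated in simplified form so the block runs on its own.

```python
import pandas as pd

def is_numeric(val):
    # Simplified restatement: anything float() can parse counts as numeric
    try:
        float(val)
        return True
    except (TypeError, ValueError):
        return False

def non_numeric(dataframe, axis=1):
    # Check every cell column by column, then reduce along the chosen axis
    flags = dataframe.apply(lambda col: col.map(is_numeric)).all(axis=axis)
    return [i for i, ok in enumerate(flags) if not ok]

# Hypothetical data: column 'K' holds one stray string in its second row
demo = pd.DataFrame({'A': [1.0, 2.0, 3.0], 'K': ['4.5', 'n/a', '6']})

bad_rows = non_numeric(demo, axis=1)  # positions of rows with a non-numeric cell
bad_cols = non_numeric(demo, axis=0)  # positions of columns with a non-numeric cell
print(bad_rows, bad_cols)
```

Both calls return integer positions rather than labels, which is why the row result can be passed through `dataframe.index[...]` before dropping.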

My suggested strategy is to drop the rows instead of Pandas' default of dropping columns on aggregation functions like mean(). To drop non-numeric rows, the following snippet is useful.

nonnum_rows = non_numeric(dataframe1)
dataframe1 = dataframe1.drop(dataframe1.index[nonnum_rows])
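
An alternative sketch, not part of the original approach: pandas' own `pd.to_numeric` with `errors='coerce'` turns unparseable cells into NaN, which can then be used to locate and drop the offending rows. The frame below is made-up sample data standing in for sample.csv, and the variable names are only illustrative.

```python
import pandas as pd

# Hypothetical data standing in for sample.csv
df = pd.DataFrame({'A': [1, 2, 3, 4], 'K': ['10', 'oops', '30', '40']})

# Unparseable cells become NaN instead of raising an error
coerced = df.apply(pd.to_numeric, errors='coerce')

# A cell is "bad" if it was present in the input but could not be parsed
bad_cells = coerced.isna() & df.notna()
bad_rows = bad_cells.any(axis=1)

# Keep only the clean rows; every remaining column is numeric
cleaned = coerced[~bad_rows]
print(cleaned.mean())
```

Comparing against `df.notna()` keeps genuinely missing values from being treated as non-numeric, so only cells like `'oops'` cause a row to be dropped.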
Hamman Samuel