How do you load rows from a pandas DataFrame into a NumPy array for line-by-line processing? There are many questions on similar issues, but this one specifically requires row-by-row processing, which I have tried to do with a for loop. The loop is meant to take each row of the DataFrame as a NumPy array and multiply it by another NumPy array of arbitrary floating-point values. A minimal version of the function is below.

def function():
    #Load Data
    data = pd.read_csv('data.csv')
    #Forward
    for row in data:
        variable_matrix = np.array([[header_0, header_1], [header_2, header_3]])
        weight_matrix = np.array([[0.01, 0.01], [0.01, 0.01]])
        output = np.matmul(variable_matrix, weight_matrix)
        print(output)

The code raises the following error:

    variable_matrix = np.array([[header_0, header_1], [header_2, header_3]])
NameError: name 'header_0' is not defined

Intuitively, I expected the array to take the value under header_0 in the first row on the first iteration. However, Python does not recognize the name, even though header_0 is a column header in the DataFrame loaded from data.csv.

Any thoughts or suggestions would be greatly appreciated. Thank you.

BSH180_44
  • There are quite a few issues with this code. When iterating over a dataframe directly you'll only get column names. `for row in data` is more accurately `for column_name in data`. `header_0` is not defined anywhere nor are any of the other "header" variables you've used. I'd suggest starting with [How to iterate over rows in a DataFrame in Pandas](https://stackoverflow.com/q/16476924/15497888) on how to access row values, but especially the answers about how to avoid iteration. – Henry Ecker Sep 08 '21 at 01:23
  • Thanks @HenryEcker I sincerely appreciate the thoughtful response. I am going to review the answers from the [Question](https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas) you posted and the for loop logic you provided. To me it makes sense that I would iterate over the columns instead of the rows, but then I may need to transpose my dataset for processing. Either way, as you mentioned there are quite a few issues with this code. I will report back when I have made some meaningful progress on this. With great thanks - Brian Haney – BSH180_44 Sep 08 '21 at 04:52

1 Answer

To iterate over the rows of a DataFrame, use the `.iterrows()` method:

import pandas as pd

data = pd.read_csv('data.csv')
# Forward
for index, row in data.iterrows():
    ...  # row is a pandas Series; look up values by column name, e.g. row['header_0']
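
As a minimal sketch of how the loop from the question might look with `.iterrows()`: the CSV contents below are made up for illustration (an in-memory string stands in for data.csv), and only the column names `header_0` to `header_3` come from the question. It also shows that iterating over the DataFrame directly yields column names rather than rows.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for data.csv; only the column names come from the question.
csv_text = "header_0,header_1,header_2,header_3\n1,2,3,4\n5,6,7,8\n"
data = pd.read_csv(io.StringIO(csv_text))

# Iterating over the DataFrame itself yields column names, not rows.
column_names = list(data)

weight_matrix = np.array([[0.01, 0.01], [0.01, 0.01]])

outputs = []
for index, row in data.iterrows():
    # row is a pandas Series, so each value is looked up by its column name.
    variable_matrix = np.array([
        [row["header_0"], row["header_1"]],
        [row["header_2"], row["header_3"]],
    ])
    outputs.append(np.matmul(variable_matrix, weight_matrix))

print(column_names)
print(outputs[0])
```

The `NameError` in the question goes away because `row["header_0"]` is a lookup on the row, whereas a bare `header_0` is a Python variable that was never assigned.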

If you want the DataFrame as a NumPy array, use the `.values` attribute (the pandas documentation now recommends the equivalent `.to_numpy()` method):

import pandas as pd

data = pd.read_csv('data.csv')
# Forward
for row in data.values:
    ...  # row is a 1-D NumPy array holding one row of the DataFrame
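
A rough sketch of the NumPy route, assuming each row has exactly four numeric columns that should be arranged as a 2x2 matrix, as in the question. The CSV contents are invented for illustration and `.to_numpy()` is used in place of `.values`. The last line shows one possible way to avoid the Python loop altogether, relying on `np.matmul` treating a `(n_rows, 2, 2)` array as a stack of matrices.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for data.csv; only the column names come from the question.
csv_text = "header_0,header_1,header_2,header_3\n1,2,3,4\n5,6,7,8\n"
data = pd.read_csv(io.StringIO(csv_text))

weight_matrix = np.array([[0.01, 0.01], [0.01, 0.01]])

# 2-D array of shape (n_rows, 4); each element of the loop below is one row.
rows = data.to_numpy(dtype=float)

# Row-by-row: reshape each 4-element row to 2x2, then multiply.
loop_outputs = [np.matmul(row.reshape(2, 2), weight_matrix) for row in rows]

# Without a Python loop: reshape to (n_rows, 2, 2) and multiply the whole stack.
batched_outputs = np.matmul(rows.reshape(-1, 2, 2), weight_matrix)

print(batched_outputs.shape)
```

Both approaches produce the same matrices; the batched form is usually faster on large files because the loop runs inside NumPy.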

The page that Henry Ecker suggested gives a detailed answer to your question:

[How to iterate over rows in a DataFrame in Pandas](https://stackoverflow.com/q/16476924/15497888)

Babak Fi Foo
  • Thanks for the feedback, I sincerely appreciate your time. – BSH180_44 Sep 08 '21 at 22:01
  • The difference between the question you linked ([link](https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas)) and mine is that my data needs to be loaded from the '.csv' file. I reviewed the answer in depth and tried both scripts you suggested, but I am still getting errors. For example, AttributeError: 'numpy.ndarray' object has no attribute 'iterrows'. I will keep working on this and post an answer when I find it. Thanks! – BSH180_44 Sep 08 '21 at 22:03
  • Your function can take either a `pandas.DataFrame` or the path to the CSV file as input. The important thing in your case is that `.iterrows()` and `.apply()` only exist on pandas DataFrames. If your data is a NumPy array, a plain for loop will do the job, although it might not be the most efficient. If you are not sure what type your variable `data` is, you can check with `print(type(data))`, which tells you whether it is a pandas DataFrame or a NumPy array. – Babak Fi Foo Sep 09 '21 at 05:11
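
The type check described in the last comment could be sketched roughly as follows. `as_row_array` is a hypothetical helper written for this illustration, not a pandas or NumPy function, and the CSV contents are invented. It accepts a DataFrame, a NumPy array, or anything `pd.read_csv` can read, and always returns a 2-D NumPy array.

```python
import io

import numpy as np
import pandas as pd


def as_row_array(data):
    """Hypothetical helper: return the input as a 2-D NumPy array of rows."""
    if isinstance(data, pd.DataFrame):
        return data.to_numpy()
    if isinstance(data, np.ndarray):
        return data
    # Otherwise assume a CSV path or file-like object that read_csv accepts.
    return pd.read_csv(data).to_numpy()


# Hypothetical stand-in for data.csv; only the column names come from the question.
csv_text = "header_0,header_1,header_2,header_3\n1,2,3,4\n5,6,7,8\n"
frame = pd.read_csv(io.StringIO(csv_text))
array = frame.to_numpy()

print(type(frame))
print(type(array))

# Only the DataFrame has .iterrows(); calling it on the array raises AttributeError.
frame_has_iterrows = hasattr(frame, "iterrows")
array_has_iterrows = hasattr(array, "iterrows")
print(frame_has_iterrows, array_has_iterrows)
```

The `AttributeError` quoted above is what you would expect if `.iterrows()` were called on the result of `.values` or `.to_numpy()` instead of on the DataFrame itself.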