How do I create a computed column in Python?

Question

I am new to Python and I am trying to replicate functionality that I am quite used to in SAS. I want to create a new variable (data column) that contains the result of a computation using existing variables (data columns) for that same row (record), and I want this new variable to be part of the existing dataset. After much research, I can't find anything on this specific topic. The dataset originates from a CSV file that contains two columns of numerical data, and the number of rows is not known in advance. I can perform the calculations I need without any issues, but I'm stuck on expanding the dataset with a third column to hold the results.

import numpy as np
import pandas as pd

driver1_1_data = pd.read_csv(...)

for i in range(len(driver1_1_data.values[:,0])):
    MPS = np.sqrt((driver1_1_data.values[i,0]-driver1_1_data.values[i-1,0])**2+(driver1_1_data.values[i,1]-driver1_1_data.values[i-1,1])**2)

Show us a few rows of the CSV file, and the formula to compute the third column — inspectorG4dget, Jan 23 '15 at 18:59
You might take a look at http://stackoverflow.com/questions/12376863/adding-calculated-columns-to-a-dataframe-in-pandas — sfjac, Jan 23 '15 at 19:00
There are many possible different solutions, depending on what you've already done... If you show us your program, the part where you read the data file and prepare for computation, you will get better answers. — gboffi, Jan 23 '15 at 19:28
The "MPS" is just a placeholder since I haven't gotten that definition to work yet. — someguy, Jan 23 '15 at 19:34
aus_lacy: thanks for the post edit! I'm also new to stackoverflow. — someguy, Jan 23 '15 at 20:15

alacy · Answer 1 · 2015-01-26T16:28:19.740

1

You can use pandas.DataFrame.apply() with axis=1 if you want to compute a value from the columns of each row.

For example you could do:

driver1_1_data['New Calculated Col'] = driver1_1_data.apply(lambda row: np.sqrt(row['col1']*row['col2']...), axis=1)

This code creates a new column appropriately named New Calculated Col and populates it with the calculations you specified in the apply(lambda...). Obviously you would adjust what is done within the lambda according to your needs, but I think this will get you headed in the right direction.
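As a minimal, self-contained sketch of the same idea: the small DataFrame and the column names col1 and col2 below are hypothetical stand-ins for the CSV data, and the square root of the product is only an illustrative calculation, not the formula from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV data; 'col1' and 'col2' are assumed names.
df = pd.DataFrame({"col1": [1.0, 4.0, 9.0], "col2": [4.0, 9.0, 16.0]})

# axis=1 hands each row to the lambda, so row["col1"] is a single value.
df["New Calculated Col"] = df.apply(
    lambda row: np.sqrt(row["col1"] * row["col2"]), axis=1
)

# The same result computed on whole columns at once, without apply().
df["Vectorized Col"] = np.sqrt(df["col1"] * df["col2"])

print(df)
```

Assigning to a column name that does not exist yet is what adds the new column to the existing DataFrame. For simple arithmetic like this, the whole-column form is usually shorter and faster than apply().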

edited Jan 26 '15 at 16:28

answered Jan 23 '15 at 20:39

alacy


Thank you, aus_lacy! This is a huge help! – someguy Jan 26 '15 at 15:07
@someguy if my solution helped you solve your initial question then normal Stack Overflow process is to accept the answer with the checkmark under the up-vote so that future users of SO who may have a similar question to yours can quickly locate a working solution. Also, a good way to show appreciation for help from fellow SO users is to up-vote their answers/users. – alacy Jan 26 '15 at 16:26
I tried to up-vote you, but the system won't allow me. I need 15 reputation, and I just started using the site. As for this fixing my problem, it has not. It seemed like it would. But after implementing the recommended fix, I received errors. After researching the errors, it seems I can't take this approach. – someguy Jan 27 '15 at 17:17
I'm trying to take the Euclidean distance between the x,y values on line i and line i-1. So, the most straight-forward approach for doing this seemed to be a loop. I simply can't get the solution you provided to work in the loop yet. Either way, I appreciate your help. – someguy Jan 27 '15 at 17:20
@someguy if you post the errors to your original question I might be able to help you. – alacy Jan 27 '15 at 17:20
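For the Euclidean distance between row i and row i-1 described in the comments, a loop may not be needed. The sketch below uses made-up data and assumed column names x and y. MPS is the placeholder name from the question. Note that in the original loop, index i-1 at i = 0 is -1, which refers to the last row rather than a previous one.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the two-column CSV; 'x' and 'y' are assumed names.
track = pd.DataFrame({"x": [0.0, 3.0, 3.0, 6.0], "y": [0.0, 4.0, 4.0, 8.0]})

# diff() subtracts the previous row from each row; the first row has no
# predecessor, so its difference (and therefore its distance) is NaN.
step = track[["x", "y"]].diff()

# Euclidean distance between each row and the row before it.
track["MPS"] = np.sqrt(step["x"] ** 2 + step["y"] ** 2)

print(track)
```

If a NaN in the first row is not wanted, it could be replaced afterwards, for example with track["MPS"].fillna(0.0).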
