I have two dataframes, df and df_test. For each row of df_test, I am trying to create a new dataframe that holds the differences between that row's x and y coordinates and those of every row in df. I would also like to add a new column that gives the magnitude of this distance between objects. Below is my code.

import pandas as pd
import numpy as np


# Create the DataFrame
index_numbers = np.arange(11)  # integers 0..10 (np.int was removed in NumPy 1.24)
index_ = ['OP_%s' % number for number in index_numbers]
header = ['X', 'Y', 'D']
# print(index_)

# Random values in [0, 10), rounded to whole numbers (np.round_ was removed in NumPy 2.0)
data = np.round(np.random.uniform(low=0, high=10, size=(len(index_), 3)), decimals=0)
# print(data)

df = pd.DataFrame(data=data, index=index_, columns=header)
df_test = df.sample(3)
# print(df)
# print(df_test)

# First attempt: this does not run, because df_(index) is not a valid assignment target
for index, row in df_test.iterrows():
    print(index)
    print(row)
    df_(index) = df
    df_(index)['X'] = df['X'] - df_test['X'][row]
    df_(index)['Y'] = df['Y'] - df_test['Y'][row]
    df_(index)['Dist'] = np.sqrt(df_(index)['X']**2 + df_(index)['Y']**2)
    print(df_(index))

Better For Loop

for index, row in df_test.iterrows():
    # Work on a copy so the original df is left unchanged
    df_temp = df.copy()
    # Offsets of every object relative to the current test object
    df_temp['X'] = df_temp['X'] - row['X']
    df_temp['Y'] = df_temp['Y'] - row['Y']
    # Euclidean distance of every object from the test object
    df_temp['Dist'] = np.sqrt(df_temp['X']**2 + df_temp['Y']**2)
    print(df_temp)

I have written a for loop that runs through each row of the df_test dataframe and tries to create the columns. In the first loop, (index) stands for the name of the new dataframe, based on the test row used. Once each dataframe has been created with the modified and new columns, I need to save it to a dictionary. The new loop produces each of the new dataframes I need, but what is the best way to save each one? Any help in creating these columns would be greatly appreciated.
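One way the dictionary storage described above might look is sketched below. It is only a sketch under assumptions: the name frames, the seeded random generator, and the random_state argument are illustrative choices that are not part of my original code, and the data is a stand-in with the same layout as df.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data with the same layout as df in the question.
rng = np.random.default_rng(0)
index_ = ['OP_%s' % number for number in range(11)]
df = pd.DataFrame(np.round(rng.uniform(0, 10, size=(len(index_), 3))),
                  index=index_, columns=['X', 'Y', 'D'])
df_test = df.sample(3, random_state=0)

# One dictionary entry per test row, keyed by that row's index label.
frames = {}
for index, row in df_test.iterrows():
    df_temp = df.copy()
    df_temp['X'] = df_temp['X'] - row['X']
    df_temp['Y'] = df_temp['Y'] - row['Y']
    df_temp['Dist'] = np.sqrt(df_temp['X']**2 + df_temp['Y']**2)
    frames[index] = df_temp

# Look up a stored dataframe later by its test-row label.
first_label = df_test.index[0]
print(frames[first_label])
```

Keying the dictionary by the index label avoids having to build variable names such as df_OP_3 dynamically; frames['OP_3'] plays the same role, provided that label was one of the sampled rows.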

Please comment with any questions so that I can make it easier to understand, if need be.

Moose Drool
  • Can not find any info about df_T1 – BENY Dec 18 '17 at 20:11
  • Sorry about that, T1 is what I was using before (index). I have switched it. Good catch. – Moose Drool Dec 18 '17 at 20:15
  • You may have some luck reading this: https://stackoverflow.com/questions/13603215/using-a-loop-in-python-to-name-variables Perhaps related, why is `df_test` a sample of `df`? This method creates index errors in pandas, when I try to run your code (and assign new `df`s to a dictionary). – Evan Dec 18 '17 at 22:49
  • df_test is a sample of df because I want to find the distances between all the objects (each row is an object with an x and y distance from some coordinate origin). I need to find the distance from each object in df_test to every object in the df dataframe. I'm open to other ways of doing this, though. I am going about it the only way I can think of right now. – Moose Drool Dec 18 '17 at 23:02
  • So, if you have a DF of 10 coordinates, you want to get back a DF (or dict, or list) with 90 coordinates? e.g., index (dict key) is your original x-y, and your values are the distances to the other 9 coordinates? – Evan Dec 19 '17 at 00:16
  • Well, there are only 3 test objects so in this case there would only be 3 dataframes returned with 10 coordinates each; one of which is the test object at (0,0) now. What needs to happen now is each dataframe needs to be stored in a dictionary to be called upon at a later time. That is where I was trying to dynamically name each new dataframe created. – Moose Drool Dec 19 '17 at 00:22
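The distances discussed in the comments above (from each test object to every object in df) could also be computed in one step with NumPy broadcasting instead of a loop. The following is a hedged sketch, not a method proposed in the thread: the three-point data and the names xy, xy_test, diff, and dist are hypothetical and chosen so the result is easy to check by hand.

```python
import numpy as np
import pandas as pd

# Hypothetical small example: three objects, two of which are test objects.
df = pd.DataFrame({'X': [0.0, 3.0, 6.0], 'Y': [0.0, 4.0, 8.0]},
                  index=['OP_0', 'OP_1', 'OP_2'])
df_test = df.loc[['OP_0', 'OP_2']]

xy = df[['X', 'Y']].to_numpy()            # shape (n, 2)
xy_test = df_test[['X', 'Y']].to_numpy()  # shape (m, 2)

# Broadcasting: (n, 1, 2) - (1, m, 2) -> (n, m, 2) coordinate differences.
diff = xy[:, None, :] - xy_test[None, :, :]

# Rows are all objects, columns are test objects, values are distances.
dist = pd.DataFrame(np.sqrt((diff ** 2).sum(axis=2)),
                    index=df.index, columns=df_test.index)
print(dist)
```

Each column of dist corresponds to the Dist column of one of the per-test-object dataframes, so dist['OP_2'] gives the distances of every object from OP_2.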

0 Answers