Vector Calculations in Pandas

Question

I have CSV file with Vector3 values exported from a C# program. I would like to use vector operations (like calculating the distance etc.) in pandas.
As far as I have seen, there is no Vector3 type in pandas. np.array offers this kind of operations but it is not available in pandas. What is the easiest way to accomplish vector calculations in a dataframe like data structure?
I would appreciate a detailed description starting with how to import the records from the CSV file as a vector type and ending with a calculation example.
The csv file has the following format:

aBin, bBin1, bBin2, bBin3, bBin4, ...
1, "(-1.6831280, 0.0000000, 2.4093440)", "(0.9445564, 0.0000000, 1.9509810)", "(-1.6831280, 0.0000000, 2.4093440)", "(0.9445564, 0.0000000, 1.9509810)" ...
2, "(-5.6848290, 0.0000000, 2.7744440)", "(0.6555564, 0.0000000, 7.2209800)", "(-3.6818280, 0.0000000, 2.5663330)", "(0.6445564, 0.0000000, 2.9509810)" ...
...

Edit
This CSV contains measurements of a program. There is a similar CSV (same shape) of another program and I want to calculate the distances between those two CSV (e.g. the distance between the value of [aBin1][bBin1] of the first CSV with [aBin1][bBin1] of the second CSV). Finally I want to sum this distances to a single value.

There are two closing parentheses in columns `bBin2` and `bBin4` is that a typo? You can start with `pd.read_csv("your_file.csv", skipinitialspace = True, index_col="aBin")` — Tranbi, Oct 17 '22 at 06:44
What kind of calculation are you trying to achieve? Your question needs more focus — Tranbi, Oct 17 '22 at 06:45
Is is right to calculate the distance between 1st row and 2nd row? — Lazyer, Oct 17 '22 at 07:04
Thanks for your comments! Yes the two parentheses were typos. I have updated the description with more details about the needed calculation. — Ling, Oct 17 '22 at 08:45

score 1 · Accepted Answer · answered Oct 17 '22 at 09:37

# vector1.txt
aBin, bBin1, bBin2
1, "(-1.6831280, 0.0000000, 2.4093440)", "(0.9445564, 0.0000000, 1.9509810)"
2, "(-5.6848290, 0.0000000, 2.7744440)", "(0.6555564, 0.0000000, 7.2209800)"

# vector2.txt
aBin, bBin1, bBin2
1, "(-1.6831280, 1.0000000, 2.4093440)", "(0.9445564, 2.0000000, 1.9509810)"
2, "(-5.6848290, 3.0000000, 2.7744440)", "(0.6555564, 4.0000000, 7.2209800)"

First, I loaded two files with file_to_dataframe function.

import numpy as np
import pandas as pd


def file_to_dataframe(fpath):
    # Function to change the format of file -> DataFrame
    # You can skip it if you can load the file as DataFrame
    with open(fpath, "r") as f:
        columns = f.readline().rstrip().split(', ')[1:]
        df = pd.DataFrame(columns=columns)
        for line in f:
            row = [x.replace('"', '') for x in line.rstrip().split(', "')[1:]]
            df = df.append(pd.Series(row, index=columns), ignore_index=True)
    return df.applymap(lambda x: np.array(eval(x)))

# Read file
df1 = file_to_dataframe('data/vector1.txt')
df2 = file_to_dataframe('data/vector2.txt')

>>df1
                        bBin1                       bBin2
0  [-1.683128, 0.0, 2.409344]  [0.9445564, 0.0, 1.950981]
1  [-5.684829, 0.0, 2.774444]   [0.6555564, 0.0, 7.22098]
>>df2
                        bBin1                       bBin2
0  [-1.683128, 1.0, 2.409344]  [0.9445564, 2.0, 1.950981]
1  [-5.684829, 3.0, 2.774444]   [0.6555564, 4.0, 7.22098]

And I got dist with np.linalg.norm function with flatten data from dataframe. and I made DataFrame with the result.

def dist(x, y):
    # https://stackoverflow.com/questions/1401712/how-can-the-euclidean-distance-be-calculated-with-numpy
    return np.linalg.norm(x-y)


new_vals = [dist(x, y) for x, y in zip(df1.values.flat, df2.values.flat)]
df_dist = pd.DataFrame(np.array(new_vals).reshape(df1.shape), columns=df1.columns, )

>>df_dist
   bBin1  bBin2
0    1.0    2.0
1    3.0    4.0

score 1 · Answer 2 · answered Oct 17 '22 at 11:17

You can take advantage of the pandas functions read_csv and applymap:

import pandas as pd
import numpy as np
import ast

df1 = pd.read_csv(your_file_1, skipinitialspace = True, index_col="aBin").applymap(lambda x: np.array(ast.literal_eval(x)))
df2 = pd.read_csv(your_file_2, skipinitialspace = True, index_col="aBin").applymap(lambda x: np.array(ast.literal_eval(x)))

df_out = (df1-df2).applymap(np.linalg.norm)
print(df_out)

Note: ast.literal_eval converts your tuple-strings to actual tuples and is safer as eval

df_out.sum() will give you the sum for each column and df_out.sum().sum() the total sum.

Vector Calculations in Pandas

2 Answers2