
I am trying to import a lower triangular matrix into pandas (or NumPy) in Python in order to convert it into a symmetric square matrix, but I am stuck at the import step.

The matrix has names in the first column, and the remaining columns are numbers, in a tab-delimited format like this:

    A
    B    1
    C    10   20
    D    21   25   45

I get an error when I try to import it with NumPy:

    myfile = np.genfromtxt("trimatrix.txt", delimiter="\t")

and also with pandas:

    myfile = pd.read_table("trimatrix.txt")

In both cases the error occurs because the reader takes the number of columns from the first row, which has only one column, while the later rows have more.

Thank you for your time and help!

Ravi Kumar

2 Answers


The straightforward answer is that you don't simply import the triangle: its shape is, by definition, incompatible with the rectangular format required by the built-in readers of NumPy and pandas. You have to write your own code to read the input file, put the values into the corresponding rows of your chosen data structure, and fill the missing columns with your default value (zeroes, I assume). This needs only a simple loop.
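A minimal sketch of such a loop, assuming each tab-separated line holds a label followed by the values to the left of the diagonal (row i has i values, as in the question's example) and that missing cells should be zero. The function name `read_lower_triangle` is hypothetical, not a NumPy or pandas API. After padding each row it mirrors the lower triangle, giving the symmetric square matrix the question is after.

```python
import io

import numpy as np
import pandas as pd


def read_lower_triangle(lines, sep="\t", fill=0.0):
    """Read a labelled lower-triangular matrix into a symmetric DataFrame.

    Assumes each line is: label, then the values left of the diagonal.
    Cells with no value (including the diagonal) are set to `fill`.
    """
    labels, rows = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split(sep)
        labels.append(fields[0])
        rows.append([float(x) for x in fields[1:]])

    n = len(labels)
    mat = np.full((n, n), fill)
    for i, values in enumerate(rows):
        mat[i, : len(values)] = values  # fill row i from the left

    # Copy the lower triangle into the upper triangle.
    upper = np.triu_indices(n, k=1)
    mat[upper] = mat.T[upper]
    return pd.DataFrame(mat, index=labels, columns=labels)


# Usage with the sample from the question; for the real file use
#   with open("trimatrix.txt") as fh: sym = read_lower_triangle(fh)
sample = "A\nB\t1\nC\t10\t20\nD\t21\t25\t45\n"
sym = read_lower_triangle(io.StringIO(sample))
print(sym)
```

Copying via `np.triu_indices` also works unchanged if a file happens to include the diagonal, because only cells strictly above the diagonal are overwritten.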

Given that what you want is not a built-in method, can you handle the coding from here?

Prune
  • Thank you. I was just curious whether there was an easy solution available through NumPy or pandas. Yes, I will try to figure out how to code it. I can probably add some placeholders (like NA) and then remove them later. – Ravi Kumar Apr 01 '20 at 05:45
  • Look into common methods for imputing incomplete or missing data, often used to handle partial records in machine learning. – Prune Apr 01 '20 at 06:31
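Following up on the placeholder idea in the comments: one possible single-step variant (an alternative not discussed in this thread) is to tell pandas how many columns to expect through the `names` argument of `read_csv`. Rows with fewer fields are then padded with NaN, and `fillna` replaces those placeholders with zeros. The helper name `read_ragged` and the pre-scan for the widest row are assumptions made for this sketch.

```python
import io

import pandas as pd


def read_ragged(text, sep="\t"):
    """Read ragged, label-first rows; short rows are padded with NaN."""
    # Pre-scan for the widest row (label + values).
    width = max(len(line.split(sep)) for line in text.splitlines() if line.strip())
    # With explicit column names, pandas pads rows that have fewer fields
    # with NaN instead of raising a tokenizing error.
    return pd.read_csv(io.StringIO(text), sep=sep, header=None,
                       names=list(range(width)), index_col=0)


sample = "A\nB\t1\nC\t10\t20\nD\t21\t25\t45\n"
tri = read_ragged(sample)   # NaN placeholders in the empty cells
filled = tri.fillna(0.0)    # replace the placeholders with zeros
print(filled)
```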

I found a workaround using awk and pandas:

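  1. First, reverse the order of the lines (rows) of the triangular matrix with awk, as posted here.

  2. Open the reversed file in pandas and reverse the row order of the DataFrame again. Because the top row now has the maximum number of columns, pandas can read the file (the shorter rows are padded with NaN), and the second reversal restores the original order.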

Thank you Prune for guidance.

Example awk code:

    awk '{a[i++]=$0} END {for (j=i-1; j>=0;) print a[j--] }' lowertri.txt > uppertri.txt
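If awk is not available (for example on Windows), the same line reversal can be done in Python. This is a minimal sketch; `reverse_lines` is a hypothetical helper name, and it reads the whole file into memory, which should be fine for a matrix of this size. On systems with GNU coreutils, `tac lowertri.txt > uppertri.txt` also reverses the lines.

```python
import tempfile
from pathlib import Path


def reverse_lines(src, dst):
    """Write the lines of `src` to `dst` in reverse order (same effect as the awk command)."""
    lines = Path(src).read_text().splitlines()
    Path(dst).write_text("\n".join(reversed(lines)) + "\n")


# Demo on the question's sample, inside a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "lowertri.txt"
    dst = Path(tmp) / "uppertri.txt"
    src.write_text("A\nB\t1\nC\t10\t20\nD\t21\t25\t45\n")
    reverse_lines(src, dst)
    reversed_text = dst.read_text()

print(reversed_text)
```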

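And in Python/pandas:

    import pandas as pd

    # The widest row is now first, so pandas sizes the table from it
    # and pads the shorter rows with NaN.
    myfile = pd.read_table("uppertri.txt", header=None, index_col=0)
    # Restore the original (lower-triangular) row order.
    myrevfile = myfile.iloc[::-1]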
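The steps above stop once the file is read; to get the symmetric square matrix the question asks for, the NaN-padded DataFrame still has to be filled and mirrored. A minimal sketch, assuming zeros on the diagonal and in empty cells, and that the hypothetical helper `to_symmetric` receives `myrevfile` with the labels as its index. Here the contents of `uppertri.txt` are inlined so the example runs on its own.

```python
import io

import numpy as np
import pandas as pd


def to_symmetric(tri):
    """Turn a NaN-padded lower-triangular DataFrame into a symmetric one.

    Assumes the row labels are in the index and row i holds the values
    left of the diagonal, with NaN elsewhere.
    """
    labels = list(tri.index)
    n = len(labels)
    vals = np.nan_to_num(tri.to_numpy(dtype=float))  # NaN padding -> 0.0
    mat = np.zeros((n, n))
    mat[:, : vals.shape[1]] = vals
    # Keep the lower triangle and add its mirror image above the diagonal.
    sym = np.tril(mat) + np.tril(mat, -1).T
    return pd.DataFrame(sym, index=labels, columns=labels)


# Reproduce the answer's steps on the reversed sample (uppertri.txt contents).
uppertri = "D\t21\t25\t45\nC\t10\t20\nB\t1\nA\n"
myfile = pd.read_table(io.StringIO(uppertri), header=None, index_col=0)
myrevfile = myfile.iloc[::-1]
sym = to_symmetric(myrevfile)
print(sym)
```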

Ravi Kumar