I know some basics of C++, but I am a beginner in Python.

I have a piece of working code (see below). I'd like to add a constraint on how its output is formatted, but I cannot figure out how to do it.

Let me first explain what the program does:

I have an input file, colors.csv, that contains a list of colors, one color per line. Each color is defined by its name and its colorimetric coordinates X, Y and Z, like this:

Colorname, X1, Y1, Z1
Colorname2, X2, Y2, Z2
...etc.

Given any list of XYZ coordinates, contained in another input file, targets.csv, the program writes a list of solutions to an output file, output.txt.

Each solution is calculated by first triangulating the point cloud with tetgen and then computing the barycentric coordinates of the target point in the tetrahedron that contains it (the details don't matter here).
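As a rough illustration of the barycentric step (this is not the `geometry` module used below, whose internals are not shown), here is a minimal pure-Python 3 sketch. The names `det3` and `barycentric` and the unit tetrahedron are made up for the example; the four weights it returns play the role of the four densities in the output.

```python
def det3(a, b, c):
    """Determinant of the 3x3 matrix whose columns are a, b and c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - b[0] * (a[1] * c[2] - a[2] * c[1])
            + c[0] * (a[1] * b[2] - a[2] * b[1]))


def sub(u, v):
    """Component-wise difference u - v of two 3D points."""
    return tuple(ui - vi for ui, vi in zip(u, v))


def barycentric(p, v0, v1, v2, v3):
    """Return (w0, w1, w2, w3) such that p = w0*v0 + w1*v1 + w2*v2 + w3*v3
    and w0 + w1 + w2 + w3 = 1 (hypothetical helper for illustration)."""
    e1, e2, e3 = sub(v1, v0), sub(v2, v0), sub(v3, v0)
    d = sub(p, v0)
    total = det3(e1, e2, e3)
    if total == 0:
        raise ValueError("degenerate (flat) tetrahedron")
    # Cramer's rule for the linear system [e1 e2 e3] * (w1, w2, w3) = d
    w1 = det3(d, e2, e3) / total
    w2 = det3(e1, d, e3) / total
    w3 = det3(e1, e2, d) / total
    return (1.0 - w1 - w2 - w3, w1, w2, w3)


# Example: the centre-ish point of a unit tetrahedron.
weights = barycentric((0.25, 0.25, 0.25),
                      (0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1))
inside = all(w >= 0 for w in weights)
print(weights, inside)
```

The point lies inside the tetrahedron exactly when none of the four weights is negative, which is why a target can end up with no solution at all.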

The solution has the form:

target, name0, density0, name1, density1, name2, density2, name3, density3

There are always only 4 names and associated densities.

It will look for example like this:

122 ,PINKwA,0.202566115168,GB,0.718785775317,PINK,0.0647284446787,TUwA,0.0139196648363

123 ,PINKwA,0.200786239192,GB,0.723766147717,PINK,0.0673550497794,TUwA,0.00809256331169

124 ,PINKwA,0.19900636349,GB,0.72874651935,PINK,0.0699816544755,TUwA,0.00226546268446

125 ,OR0A,0.00155317194109,PINK,0.0716160265958,PINKwA,0.195962072115,GB,0.730868729348

126 ,OR0A,0.00409427478508,PINK,0.0726192660009,PINKwA,0.192113520109,GB,0.731172939105

127 ,OR0A,0.00663537762906,PINK,0.073622505406,PINKwA,0.188264968103,GB,0.731477148862

What would I like to do now?

For practical reasons, I would like my output to follow a certain order. I would like a "priority list" to determine the order of the (name, density) pairs in the output.

My current program outputs the color names in an order that I don't understand. In any case, I need these color names to be in a specific order: for example, PINK should always come first, PINKwA second, etc.

Instead of:

127 ,OR0A,0.00663537762906,PINK,0.073622505406,PINKwA,0.188264968103,GB,0.731477148862

I want:

127 ,PINK,0.073622505406,PINKwA,0.188264968103,OR0A,0.00663537762906,GB,0.731477148862

Because my priority list says:

0, PINK
1, PINKwA
2, OR0A
3, GB

How can I add this behavior to the code below? Any ideas?

EDITED CODE (works...):

import tetgen, geometry
from pprint import pprint
import random, csv
import numpy as np

all_colors = [(name, float(X), float(Y), float(Z))
              for name, X, Y, Z in csv.reader(open('colors.csv'))]

priority_list = {name: int(i)
                 for i, name in csv.reader(open('priority.csv'))}

# background is marked SUPPORT
support_i = [i for i, color in enumerate(all_colors) if color[0] == 'SUPPORT']
if len(support_i)>0:
    support = np.array(all_colors[support_i[0]][1:])
    del all_colors[support_i[0]]
else:
    support = None

tg, hull_i = geometry.tetgen_of_hull([(X,Y,Z) for name, X, Y, Z in all_colors])
colors = [all_colors[i] for i in hull_i]

print ("thrown out: "
       + ", ".join(set(zip(*all_colors)[0]).difference(zip(*colors)[0])))

targets = [(name, float(X), float(Y), float(Z), float(BG))
           for name, X, Y, Z, BG in csv.reader(open('targets.csv'))]

# open the output file once, before the loop
output = open('output.txt', 'a')

for target in targets:
    name, X, Y, Z, BG = target
    target_point = support + (np.array([X,Y,Z]) - support)/(1-BG)
    tet_i, bcoords = geometry.containing_tet(tg, target_point)

    # every line starts with the target name
    output.write(str(name))

    if tet_i is not None:
        # names of the 4 colors at the corners of the containing tetrahedron
        names = [colors[i][0] for i in tg.tets[tet_i]]
        # sort (index, name) pairs by priority; i still points at the matching density
        for i, color_name in sorted(enumerate(names), key=lambda (i, name): priority_list[name]):
            output.write(',%s,%s' % (color_name, bcoords[i]))

    # when no tetrahedron contains the target, the line holds only the target name
    output.write('\n')

output.close()
adrienlucca.net

1 Answer

First, you'll need to encode your priority list directly in your Python code:

priority_list = {
    'PINK': 0,
    'PINKwA': 1,
    'OR0A': 2,
    'GB': 3,
}

This will let you quickly retrieve the order for a given color name. Then, you can use the key argument to sorted to sort your names by their priority. Critically, though, you need to retrieve not the sorted names but the indices of the sorted names, much like numpy.argsort does (http://docs.scipy.org/doc/numpy/reference/generated/numpy.argsort.html).

sorted_indices = sorted(enumerate(names), key=lambda (i, name): priority_list[name])

The enumerate builtin annotates each name with its index in the original list of names, and then the sorted builtin sorts the resulting (i, name) pairs based on their rank in the priority list. Then we can write the names out to the file, followed by the corresponding element (using the index value) from the bcoords array.

for i, name in sorted_indices:
    output.write(',%s,%s' % (name, bcoords[i]))
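The `lambda (i, name): ...` form relies on tuple-parameter unpacking, which only Python 2 accepts. As a hedged sketch of the same argsort-style idea that also runs on Python 3, the key function can take the whole `(index, name)` pair and index into it. The sample names and densities are copied from target 127 in the question; the `parts` list is just one way to assemble the line for the example.

```python
# Sample values taken from target 127 in the question.
priority_list = {'PINK': 0, 'PINKwA': 1, 'OR0A': 2, 'GB': 3}
names = ['OR0A', 'PINK', 'PINKwA', 'GB']
bcoords = [0.00663537762906, 0.073622505406, 0.188264968103, 0.731477148862]

# pair is (index, name): sort on the name's priority, but keep the index
# so the matching density can still be looked up afterwards.
sorted_indices = sorted(enumerate(names),
                        key=lambda pair: priority_list[pair[1]])

parts = ['127']
for i, name in sorted_indices:
    parts.append('%s,%s' % (name, bcoords[i]))
line = ','.join(parts)
print(line)
```

The printed line lists PINK, PINKwA, OR0A and GB in that order, each followed by its own density.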

So, here's what I'd make the final block in your code look like:

names = [colors[i][0] for i in tg.tets[tet_i]]
output.write(target[0])
for i, name in sorted(enumerate(names), key=lambda (i, name): priority_list[name]):
    output.write(',%s,%s' % (name, bcoords[i]))
output.write('\r\n')
output.close()

Here I changed your file output strategy to be a bit more Pythonic -- in general, adding strings together is discouraged; it's better to create a format string and fill in the variables (you can also use .format() on the string to do this). Also, you can make multiple calls to .write(), and each call simply appends to the file, so there is no need to build one long string before writing. Finally, there is no need to call str on '\r\n', as it's already a string.
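To make the formatting point concrete, here is a small Python 3 sketch comparing the three ways of building one `,name,density` fragment, and showing that consecutive `.write()` calls simply append. The `io.StringIO` object is only a stand-in for the real output file, and the sample values come from the question.

```python
import io

name, density = 'PINK', 0.073622505406

# Three ways to build the same ",name,density" fragment.
by_concat = ',' + name + ',' + str(density)    # string addition
by_percent = ',%s,%s' % (name, density)        # %-style format string
by_format = ',{},{}'.format(name, density)     # str.format

out = io.StringIO()      # stand-in for open('output.txt', 'a')
out.write('127')         # each write continues where the last one stopped
out.write(by_percent)
out.write('\n')
print(repr(out.getvalue()))
```

All three fragments are identical strings; the format-string versions just avoid the explicit `str()` calls and the chain of `+` operators.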

lmjohns3
  • Thanks, that's great! To use a .csv file as my `priority_list` can I use: `priority_list = csv.reader(open('priority.csv'))` ?? Where should I insert this line? – adrienlucca.net Aug 03 '13 at 19:32
  • It depends on the structure of your .csv file. If it's of the same form as you listed above, then I'd convert it to a dictionary using `priority_list = {name: int(i) for i, name in csv.reader(open('priority.csv'))}`. – lmjohns3 Aug 03 '13 at 19:53
  • Hi, something's gone wrong: I updated the code, but I must have made a mistake. I edited the code above and added the error message at the end: `TypeError: <lambda>() takes exactly 2 arguments (1 given)` – adrienlucca.net Aug 03 '13 at 20:12
  • Ah, that's my mistake -- needed parentheses in the sorting key. I'll fix up the code. – lmjohns3 Aug 03 '13 at 20:20
  • Thank you, another bug is showing up: it works, but the output looks like a mess. Each line starts with `,`, the dictionary numbers are sneaking in between the values and names, and the first `colorname, density` pair is repeated at the end of each line... Any ideas? Thanks so much! – adrienlucca.net Aug 03 '13 at 20:44
  • If you could just check one more time my edit it would be great ;) – adrienlucca.net Aug 03 '13 at 23:44
  • Yes, it doesn't look like you've quite integrated the code. Try removing the last 8 lines from your edited code. Then add a line with `else:` indented 4 spaces. Then add the code that I suggested, indented by 8 spaces (i.e. inside the else block). – lmjohns3 Aug 04 '13 at 23:05
  • Thank you so much! I re-edited the code (I had to add a function to control the `\n`). – adrienlucca.net Aug 05 '13 at 12:22
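
One pitfall with the CSV-based priority list discussed in the comments: if priority.csv is written exactly as in the question (`0, PINK`, with a space after the comma), a plain `csv.reader` yields the key `' PINK'` with a leading space, and the lookup by color name fails. A minimal Python 3 sketch of one way around this, assuming that "rank, name" layout; `io.StringIO` stands in for the real file, and `priority_of` is a hypothetical helper that is not part of the thread:

```python
import csv
import io

# Stand-in for open('priority.csv'); same "rank, name" layout as in the question.
priority_file = io.StringIO("0, PINK\n1, PINKwA\n2, OR0A\n3, GB\n")

# skipinitialspace=True drops the blank after each comma, so the key is
# 'PINK' rather than ' PINK'.
priority_list = {name: int(rank)
                 for rank, name in csv.reader(priority_file,
                                              skipinitialspace=True)}


def priority_of(name):
    """Hypothetical helper: names missing from the file sort last."""
    return priority_list.get(name, len(priority_list))


names = ['OR0A', 'PINK', 'UNLISTED', 'GB']
print(sorted(names, key=priority_of))
# prints ['PINK', 'OR0A', 'GB', 'UNLISTED']
```

Falling back to a rank past the end of the list is a design choice for the sketch; raising an error for unknown names may be preferable if every color is expected to appear in the file.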