0

I'm working on analyzing a .h5ad file, which has the file type AnnData.

I've created separate lists based on some clustering program, and named the lists according to their cluster number (i.e. x1, x2, x3, x4 ...)

Now, I would like to take the mean of all the separate rows in each list. Of course this could easily be done by writing many loops, but I thought it would be interesting to try to do it in a single loop.

The code to do this for a single list is as follows:

import numpy as np

means1 = []
for q in range(len(x1.var)):
    means1.append(np.mean(x1.X[:, q]))

Now, I would like to be able to substitute means1 and x1 with variable numbers. For means1 this can be solved by making it a dict and using a second for loop over range(0, number) as follows:

x = {}
for q1 in range(0, 20):
    for q2 in range(0, len(x1.var)):
        # q1 is an int, so it has to be converted before concatenating
        x['mean' + str(q1)] = np.mean(x1.X[:, q2])

But because the variable that I use in x1 already exists, it is impossible to just use string formatting like 'x' + q1, since a str doesn't have the attribute .X.

Is there any way to do this, or should I accept that it's impossible?

Ralf
HarryMuesli

3 Answers

2

I've created separate lists based on some clustering program, and named the lists according to their cluster number (i.e. x1, x2, x3, x4 ..)

Most often when you find yourself using such a naming scheme, you really want a list or dict instead.
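As a minimal sketch of what that could look like: the `make_cluster` helper and its toy data are made up here so the snippet runs without the anndata package. With real data, the dict values would simply be your AnnData objects.

```python
import numpy as np
from types import SimpleNamespace

# Hypothetical stand-in for an AnnData object: it only mimics the two
# attributes used in the question, .X (the data matrix) and .var
# (one entry per column of .X).
def make_cluster(matrix):
    matrix = np.asarray(matrix, dtype=float)
    return SimpleNamespace(X=matrix, var=list(range(matrix.shape[1])))

# One dict keyed by cluster number instead of variables named x1, x2, ...
clusters = {
    1: make_cluster([[1.0, 2.0], [3.0, 4.0]]),
    2: make_cluster([[10.0, 20.0], [30.0, 40.0], [50.0, 60.0]]),
}

# The cluster number is now ordinary data, so no name lookup is needed.
means = {}
for number, cluster in clusters.items():
    means[number] = [np.mean(cluster.X[:, q]) for q in range(len(cluster.var))]
```

Here `means[1]` holds the column means of the first cluster, and adding a cluster only means adding one more entry to `clusters`.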

bruno desthuilliers
1

First idea: you could iterate over all of your lists in an outer loop and apply your second idea to that, creating a sub-dict inside x for every list. With this, you need two nested loops in total instead of a separate loop for every single list:

x = {}
list_number = 1
# the name adata avoids shadowing the built-in list
for adata in (x1, x2, x3, x4):
    # create the sub-dict for this list before filling it
    x['x{}'.format(list_number)] = {}
    for q in range(len(adata.var)):
        x['x{}'.format(list_number)]['mean' + str(q)] = np.mean(adata.X[:, q])
    list_number += 1

We could also replace the inner for loop with a dict comprehension (which does not really take a loop away, but shortens the code):

x['x{}'.format(list_number)] = {'mean' + str(q): np.mean(adata.X[:, q]) for q in range(len(adata.var))}

That being said, while I don't know exactly how your data is structured, having a dict of the form

lists = {'x1': [the_list], 'x2': [other_list], ...}

is always better for this type of task. Since there is no really good way to get the name of a variable, having them stored in a dict as string keys makes it way easier to work with them. This enables you to do something like this:

means = {name: {'mean'+ str(q + 1): np.mean(lists[name].X[:,q]) for q in range(len(lists[name].var))} for name in lists}

which will return a dictionary of the form

means = {'x1': {'mean1': mean_1, 'mean2': mean_2, ...}, 'x2': {'mean1': mean_1,...}...}

Doing all of this with a single loop is impossible, at least with how your data is structured right now, because you have to iterate over at least two iterables:

  1. all the lists;

  2. all variables of each list.
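If the goal is mainly to avoid writing the inner loop yourself, NumPy can compute all column means in one call with axis=0. The iteration over the variables still happens, just inside NumPy. A rough sketch, assuming .X is a dense NumPy array (a sparse .X may behave differently); the toy `lists` dict below is invented for illustration and only mimics the .X attribute:

```python
import numpy as np
from types import SimpleNamespace

# Hypothetical stand-ins for the objects in the question; only .X is mimicked.
lists = {
    'x1': SimpleNamespace(X=np.array([[1.0, 2.0], [3.0, 4.0]])),
    'x2': SimpleNamespace(X=np.array([[10.0, 20.0], [30.0, 60.0]])),
}

# axis=0 averages over the rows, giving one mean per column (variable).
column_means = {name: np.mean(obj.X, axis=0) for name, obj in lists.items()}
```

Each value in `column_means` is then an array with one mean per variable, so the only explicit loop left is the one over the lists.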

Flob
0

A simple solution would be to build a list with all the x variables and then iterate over it.

Maybe something like this:

x_list = [x1, x2, x3, x4]
means = {}

for i, x in enumerate(x_list):
    for j in range(len(x.var)):
        key = (i, j)
        means[key] = np.mean(x.X[:, j])

Does this work for you?

Ralf