I have some code that's a little sloppy/repetitive where I want to generate a new dictionary after determining whether values from a list of lists match a "master list", returning either true or false. I want these new true/false lists appended to a dictionary using labels from yet another list. As such, each of the new lists (a-d below) will have the same number of values (either True/False for each) which will then be used as a dataframe index in generating an upset plot. The example code is as follows:

a = []
b = []
c = []
d = []
for motif in unique_motifs:
    if motif in motif_lists[0]:
        a.append('True')
    else:
        a.append('False')               
for motif in unique_motifs:
    if motif in motif_lists[1]:
        b.append('True')
    else:
        b.append('False')   
for motif in unique_motifs:
    if motif in motif_lists[2]:
        c.append('True')
    else:
        c.append('False')       
for motif in unique_motifs:
    if motif in motif_lists[3]:
        d.append('True')
    else:
        d.append('False')

data_dictionary = {'motif_key': unique_motifs, \
                   args.plot_labels[0]: a, \
                   args.plot_labels[1]: b, \
                   args.plot_labels[2]: c, \
                   args.plot_labels[3]: d}

and here are some example values for each of the lists in motif_lists and the master list unique_motifs:

unique_motifs = ['A', 'B', 'C', 'D', 'E', 'F', 'G']
motif_lists[0] = ['B', 'C']
motif_lists[1] = ['A']
motif_lists[2] = ['B', 'C', 'E', 'G']
motif_lists[3] = ['D', 'F']

and the above code would then create new lists as follows:

a = ['False', 'True', 'True', 'False', 'False', 'False', 'False']
b = ['True', 'False', 'False', 'False', 'False', 'False', 'False']
c = ['False', 'True', 'True', 'False', 'True', 'False', 'True']
d = ['False', 'False', 'False', 'True', 'False', 'True', 'False']

which would then get appended to the dictionary. Each of the plot_labels values is a string and will be used as a unique identifier. I'd really like to condense this code AND, as a BONUS, I'd like the dictionary, the number of lists generated, and the for loops to scale with len(motif_lists) (which is determined by the number of files provided by the user). There is already a check in place to make sure len(motif_lists) == len(args.plot_labels). For example, if len(motif_lists) == 7, I would end up with lists a, b, c, d, e, f, and g such as above. I imagine there's a way to do this with something like:

for n, val in enumerate(motif_lists):
    globals()["list%d"%n] = []

and then I would just set a limit on user input values so it doesn't get out of hand...

smac89
Margaret Gruca
  • I must point out, but don't ever do this: `globals()["list%d"%n] = []`, **don't use dynamic variables**. Use a *container*, like a `list` or a `dict`. Anyway, this might be a better fit for [codereview.se] – juanpa.arrivillaga Nov 27 '19 at 00:32
  • Welcome to StackOverflow. [On topic](https://stackoverflow.com/help/on-topic), [how to ask](https://stackoverflow.com/help/how-to-ask), and ... [the perfect question](https://codeblog.jonskeet.uk/2010/08/29/writing-the-perfect-question/) apply here. StackOverflow is a knowledge base for *specific* programming problems -- not a design, coding, research, or tutorial resource. You're asking for a wholesale refactoring of your code, which is somewhat beyond the scope of Stack Overflow, as it involves tutelage of several programming techniques you can readily learn elsewhere. – Prune Nov 27 '19 at 00:33
  • @juanpa.arrivillaga Was just using an example to address my need -- I don't plan to use dynamic variables in the code. – Margaret Gruca Nov 27 '19 at 00:33
  • @MargaretGruca right, but you are sort of contradicting that saying " I would end up with lists a, b, c, d, e, f, and g such as above." I'm saying you *don't need dynamic variables*. Seriously, it's a *very* common issue. Stepping away from being a novice programmer is to stop thinking in terms of "I need this and this *variable*" to "I need this or that data structure" – juanpa.arrivillaga Nov 27 '19 at 00:36
  • @juanpa.arrivillaga Fair enough -- appreciate the comment and suggestion for Code Review – Margaret Gruca Nov 27 '19 at 00:44

1 Answer

You can replace the repetitive loops with:

for bin, motif_list in zip([a, b, c, d], motif_lists):
    for motif in unique_motifs:
        if motif in motif_list:
            bin.append('True')
        else:
            bin.append('False')

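As mentioned in the comments, the above can be further reduced by replacing the if/else with a call to str:

for bin, motif_list in zip([a, b, c, d], motif_lists):
    for motif in unique_motifs:
        bin.append(str(motif in motif_list))

Drop the str call if you would rather store real booleans than the strings 'True' and 'False'.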

In fact, you can continue in this manner until the above is reduced to a list comprehension, although readability might suffer. It's up to you.
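As a rough sketch of where that reduction ends up, the nested loops can be collapsed into one nested list comprehension. The sample values are copied from the question; `bins` is a name assumed here to stand in for the separate lists a, b, c and d.

```python
unique_motifs = ['A', 'B', 'C', 'D', 'E', 'F', 'G']
motif_lists = [['B', 'C'], ['A'], ['B', 'C', 'E', 'G'], ['D', 'F']]

# One inner list per entry in motif_lists. Each inner list holds the strings
# 'True'/'False', one per motif in unique_motifs, like the original loops.
# `bins` is an assumed name replacing the individual lists a, b, c, d.
bins = [[str(motif in motif_list) for motif in unique_motifs]
        for motif_list in motif_lists]

print(bins[0])  # ['False', 'True', 'True', 'False', 'False', 'False', 'False']
```

Because the outer comprehension iterates over motif_lists directly, the number of result lists follows len(motif_lists) without any extra bookkeeping.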


For the other part you wanted to condense, you can do something like:

data_dictionary = {'motif_key': unique_motifs, **{args.plot_labels[i]: bins[i] for i in range(len(args.plot_labels))}}

Here bins is a list that holds your lists a, b, etc. (pass it to zip in place of [a, b, c, d] in the loops above). Create it with something like:

bins = [[] for _ in range(len(motif_lists))]

The way I merged the dictionary above (** unpacking inside a dict literal) is only available from Python 3.5 onwards. See this answer for more info.
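Putting the pieces together, one possible sketch builds the whole dictionary for any number of motif lists. The function name `build_data_dictionary` and the plain list `plot_labels` (standing in for args.plot_labels) are assumptions made for illustration, and the label values are made up. Converting each list to a set is an optional tweak that is not part of the code above; it assumes the motifs are hashable, as the strings in the question are.

```python
def build_data_dictionary(unique_motifs, motif_lists, plot_labels):
    """Map each label to a 'True'/'False' list aligned with unique_motifs.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if len(motif_lists) != len(plot_labels):
        raise ValueError("need exactly one label per motif list")

    data = {'motif_key': list(unique_motifs)}
    for label, motif_list in zip(plot_labels, motif_lists):
        members = set(motif_list)  # assumes hashable motifs (strings here)
        data[label] = [str(motif in members) for motif in unique_motifs]
    return data


unique_motifs = ['A', 'B', 'C', 'D', 'E', 'F', 'G']
motif_lists = [['B', 'C'], ['A'], ['B', 'C', 'E', 'G'], ['D', 'F']]
plot_labels = ['s1', 's2', 's3', 's4']  # made-up stand-in for args.plot_labels

data_dictionary = build_data_dictionary(unique_motifs, motif_lists, plot_labels)
print(data_dictionary['s3'])
# ['False', 'True', 'True', 'False', 'True', 'False', 'True']
```

Pairing labels and lists with zip avoids indexing by position and works on Python versions older than 3.5, since it does not rely on ** unpacking.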

smac89
  • I think you have your zip parameters backwards. – Sumner Evans Nov 27 '19 at 00:36
  • You can simplify this even further; the entire `if` statement can be replaced by a call to `str`: `bin.append(str(motif in motif_list))`. – chepner Nov 27 '19 at 00:47
  • @chepner, python is just great. I will add it and leave it to OP to decide. I was striving more for readability in this answer – smac89 Nov 27 '19 at 00:48
  • Then you can get rid of the inner `for` loop: `bin.extend(str(motif in motif_list) for motif in unique_motifs)`. – chepner Nov 27 '19 at 00:48