
I want to basically do this:

  f = open(genes_path, 'w')
  for key, genes in key_genes.iteritems():
      f.write(key)
      for gene in genes:
          f.write(",\t"+gene)
      f.write("\n")

  f.close()

And get this:

key1, AT3G32920, AT3G33187, AT3G32940, AT3G32930, AT3G32980, AT3G32960
key2, AT3G32920, AT3G33187, AT3G32940, AT3G32930

Where the key can be any string (without a comma), order doesn't matter anywhere (I'm using the OrderedMultiDict from boltons and lists for printing convenience but it really doesn't matter, could be dict and set for all I care), and each row can have a different number of elements.

I can't seem to find any module that does this pretty simple task. DictWriter requires column/field names, so it doesn't solve my problem. NumPy only works with rectangular arrays, and padding introduces too much unnecessary stuff. I know it's easy to write the loop yourself, but I just feel like this is common enough that it would have its own builtin.

This is for the times when I just need to send people big lists of things (like genes, to somebody who doesn't program) so they can pull them into Excel, add or remove elements, then send the file back, and I don't have to do anything else.

Does anyone know of a module that has functionality for automatically reading and writing these ragged dict-of-lists files? Or is there a good reason for this not to exist?

I'm thinking of something as dead-simple as pandas.read_csv(path, delimiter=",") and pandas.DataFrame.to_csv(path, sep=",").


Rationale

The reason I am being picky about this being a single function in a module, rather than something I could easily do in pure Python, isn't laziness. When you use something from a module with good documentation, it is a lot easier for someone to look at the code and figure out exactly what was intended. Even if the task is kind of trivial, you're still reducing the complexity of your code. I see writing your own function as something for domain-specific tasks, whereas a common read/write routine should be something you import, and it should preferably be used if available. Part of the Zen of Python, right? So the second question is really asking "Is this a domain-specific task?", because it doesn't seem like one to me.

salotz
  • Have you tried anything using the [`csv` module](https://docs.python.org/2/library/csv.html)? – Two-Bit Alchemist Aug 18 '15 at 20:42
  • This might be useful: http://stackoverflow.com/questions/13437727/python-write-to-excel-spreadsheet – sodiumnitrate Aug 18 '15 at 20:42
  • @Two-BitAlchemist that is where DictWriter comes from, so yes. – salotz Aug 19 '15 at 13:57
  • @sodiumnitrate I'm not trying to explicitly write to excel formats, I want csv/text because that is universal. – salotz Aug 19 '15 at 13:59
  • I mean, then what you're doing should work. I would write a function that does the printing, using Jake Griffin's answer below or your code, and call it whenever you need to print something. – sodiumnitrate Aug 19 '15 at 14:02
  • My code does work, I just feel like that I didn't have to write it. I clarified the final question in an edit. – salotz Aug 19 '15 at 14:11
  • Expanded my comment about the csv module into an answer. I'm not sure it's a vast improvement over the simple, working code you've written. (You should try this in C some time just to appreciate the simplicity of this task in Python.) Hopefully this is in line with what you're trying to do; if you let me know whether the output file is satisfactory, I can show you the corresponding read code. – Two-Bit Alchemist Aug 19 '15 at 15:56
  • I understand the difficulty of this in other languages, which is why I love python so much. I guess I got extra spoiled with numpy and pandas read functions! – salotz Aug 19 '15 at 16:20

2 Answers


You could build each line into a string and do a single write:

with open(genes_path, 'w') as f:
    # Build every line, join them with newlines, and write the whole file in one call.
    f.write("\n".join(",\t".join([key] + genes)
                      for key, genes in key_genes.iteritems()) + "\n")

This is still doing it yourself, but it's more succinct than the code you have posted.
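The same join-based idea also covers the reading half of the question. Below is a minimal sketch, assuming Python 3 (so .items() instead of the thread's iteritems) and assuming no key or gene ever contains the two-character sequence ",\t". The names write_ragged and read_ragged are hypothetical helpers, not part of any library.

```python
DELIM = ",\t"  # the two-character separator used in the question


def write_ragged(path, key_genes):
    """Write one line per key: the key, then its genes, separated by DELIM."""
    with open(path, "w") as f:
        # list() lets the values be sets or other iterables, not just lists.
        f.write("".join(DELIM.join([key] + list(genes)) + "\n"
                        for key, genes in key_genes.items()))


def read_ragged(path):
    """Read the file back into a dict mapping each key to a list of genes."""
    key_genes = {}
    with open(path) as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # ignore blank lines
            # Assumes DELIM never appears inside a key or gene name.
            key, *genes = line.split(DELIM)
            key_genes[key] = genes
    return key_genes
```

Because nothing is quoted, this only round-trips safely while the separator never appears inside the data; that limitation is what the csv module's quoting handles.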

Jake Griffin

Well, for one thing, I don't see what's so bad about your original loop (which you could turn into a function and shorten using the with context manager). However, I mentioned the csv module because it seems to do almost exactly what you require, no DictWriter needed.

I'm assuming you're starting with something like this:

In [4]: key_genes
Out[4]: 
{'key1': ['AT3G32920',
  'AT3G33187',
  'AT3G32940',
  'AT3G32930',
  'AT3G32980',
  'AT3G32960'],
 'key2': ['AT3G32920', 'AT3G33187', 'AT3G32940', 'AT3G32930'],
 'key3': ['AT3G32920',
  'AT3G33187',
  'AT3G32940',
  'AT3G32930',
  'AT3G32980',
  'AT3G32960'],
 'key4': ['AT3G32920', 'AT3G33187', 'AT3G32940', 'AT3G32930']}

So this code:

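import csv

# Python 2 ('wb' and iteritems); on Python 3 use open('out.csv', 'w', newline='') and .items()
with open('out.csv', 'wb') as outfile:
    writer = csv.writer(outfile)
    for key, genes in key_genes.iteritems():
        # One row per key: the key first, then however many genes it has.
        writer.writerow([key] + genes)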

Produces this:

key3,AT3G32920,AT3G33187,AT3G32940,AT3G32930,AT3G32980,AT3G32960
key2,AT3G32920,AT3G33187,AT3G32940,AT3G32930
key1,AT3G32920,AT3G33187,AT3G32940,AT3G32930,AT3G32980,AT3G32960
key4,AT3G32920,AT3G33187,AT3G32940,AT3G32930

Obviously, your keys will come out in order, since you're using an ordered structure; mine come out in arbitrary order because I'm using a regular built-in dict (if you want them sorted, sort the keys before writing). Now this is where we get into the "almost" part of your requirement. You're using ,\t as a delimiter. If you try to do this with csv.writer, it will raise a TypeError because the delimiter must be a single character. This makes sense to me, because CSV files are normally comma-delimited or tab-delimited, not both. The delimiter is only there for ease of machine processing, and the machine needs only one character (one that doesn't occur unquoted anywhere else) to do it.
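Both points in the paragraph above can be checked directly. This is a small sketch, assuming Python 3; the third field is made-up data chosen to contain a comma, not a real gene ID.

```python
import csv
import io

buf = io.StringIO()

# A two-character delimiter such as ",\t" is rejected when the writer is created.
try:
    csv.writer(buf, delimiter=",\t")
except TypeError as exc:
    print("rejected:", exc)

# A one-character delimiter works, and csv.writer quotes any field that
# contains it, which a hand-rolled ",".join(...) would not do.
writer = csv.writer(buf)
writer.writerow(["key1", "AT3G32920", "gene, with comma"])
print(repr(buf.getvalue()))
```

The quoting is why a single delimiter character is enough: a comma inside a field is written inside quotes, so a reader can still tell it apart from a separator.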

So my final answer is: if you can live with a one-character delimiter (and for normal CSV processing, this shouldn't be a problem), use the csv module. Otherwise, use your short loop.
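The comments on the question offer corresponding read code. A minimal sketch of what that could look like, assuming Python 3 and a file written by csv.writer as above; read_key_genes_csv is a hypothetical helper name, not part of the csv module.

```python
import csv


def read_key_genes_csv(path):
    """Read rows written by csv.writer back into a dict of lists.

    The first field of each row is the key; every remaining field
    (however many there are) is a gene.
    """
    key_genes = {}
    # Python 3: open CSV files with newline="" as the csv docs recommend.
    with open(path, newline="") as infile:
        for row in csv.reader(infile):
            if not row:
                continue  # csv.reader yields [] for a blank line
            key_genes[row[0]] = row[1:]
    return key_genes
```

Because csv.reader undoes the quoting that csv.writer applies, this also round-trips fields that contain commas, which a plain str.split would not.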

Two-Bit Alchemist
  • This is good as well, but like you said, it's not a vast improvement in terms of verbosity. The delimiter doesn't/shouldn't really matter. I'll put a function signature of my idea in the question. – salotz Aug 19 '15 at 16:18