1

how do I remove the name of sub-field, u letter and brackets in CSV when export data to csv from Mongodb?

the sample of workflow collection:

 {
        "_id": ObjectID("54b98bca7c1f57d85021f308"),
        "requiredLanguages": [
            "en"
        ],
        "isDeleted": false,
        "name": {
            "en": "Test Workflow"
        },        
        "isEditable": false,
        "date": {
            "created": ISODate("2016-07-21T23:06:38.748Z")
        }
    }

if I put 'date.created' in python script, it shows everything which I do not want.

the python 2.7 script:

import csv
cursor = db.workflows.find( {}, {'_id': 1, 'requiredLanguages': 1, 'isDeleted': 1, 'name': 1, 'isEditable':1, 'date.created':1 })
with open ('workflows.csv', 'w') as outfile:
    fields = ['_id', 'requiredLanguages', 'isDeleted', 'name', 'isEditable', 'date']
    write = csv.DictWriter(outfile, fieldnames=fields)
    write.writeheader()
    for x in cursor:        
        write.writerow(x)

the csv sample:

_id,requiredLanguages,isDeleted,name,isEditable,date
5318cbd9a377f52a6a0f671f,[u'en'],False,{u'en': u'Default Workflow'},False,"{u'created': datetime.datetime(2016, 7, 21, 23, 6, 38, 748000)}"
54b98bca7c1f57d85021f308,[u'en'],False,{u'en': u'Test Workflow'},False,"{u'created': datetime.datetime(2016, 7, 21, 23, 6, 38, 748000)}"

if I change 'date' to different name, it did not work. I want to get the value under 'date.created'. also how to remove u letter? someone can help me out about removing u letter and the name of "date.created". Thanks so much

user7070824
  • 133
  • 1
  • 2
  • 11
  • You have a list in the `requiredLanguages` field and a `dict` in `name`. You need to figure out how do you want those to appear in the `.csv` file and then _"flatten"_ those fields as well, so they appear as a string before calling `write.writerow(x)` For instance, for the `list` (the `requiredLanguages` field) you could take a look to this: http://stackoverflow.com/questions/5618878/how-to-convert-list-to-string – Savir Oct 26 '16 at 16:55
  • Python 2.x I assume? Can you add an additional tag? – tdelaney Oct 26 '16 at 16:56
  • I want to get the values appearing the CSV, it is like _id,requiredLanguages,isDeleted,name,isEditable,date 5318cbd9a377f52a6a0f671f,en,False,'Default Workflow,False,2016, 7, 21, 23, 6, 38, 748000 I do not need brackets, quotations, "en", "{u'created': datetime.datetime" that appear in the CSV file. – user7070824 Oct 26 '16 at 17:10

1 Answers1

3

Your first problem is that you need to decide how to handle multi-valued fields. There can be several required languages, so how are they represented in your csv? My example solution makes a comma separated list for lang and selects a primary language for name, but you may want to model this differently. DictWriter is a convenience method and I replaced it with custom code that builds a row tuple itself.

The second problem is that u string is unicode. You need to decide how you want to represent unicode data in the csv. I implemented a solution that encodes the csv in utf-8.

You'll notice that my mock data set left out several details like IOODate that you may still need to juggle.

import csv
import codecs

# Mock for debug
#cursor = db.workflows.find( {}, {'_id': 1, 'requiredLanguages': 1, 'isDeleted': 1, 'name': 1, #'isEditable':1, 'date.created':1 })

cursor = [{
        "_id": "54b98bca7c1f57d85021f308",
        "requiredLanguages": [
            "en"
        ],
        "isDeleted": False,
        "name": {
            "en": "Test Workflow"
        },        
        "isEditable": False,
        "date": {
            "created": "2016-07-21T23:06:38.748Z"
        }
}]


with codecs.open('workflows.csv', 'w', encoding='utf-8') as outfile:
    fields = ['_id', 'requiredLanguages', 'isDeleted', 'name', 'isEditable', 'date']
    write = csv.writer(outfile)
    write.writerow(fields)
    for x in cursor:
        primary_lang = x["requiredLanguages"][0]
        write.writerow((x["_id"], u','.join(x["requiredLanguages"]),
            x["isDeleted"], x["name"][primary_lang], x["isEditable"],
            x["date"]["created"]))
tdelaney
  • 73,364
  • 6
  • 83
  • 116
  • the script worked. how do I change the cursor to my own cursor? I got the result which is good. see the following csv : _id,requiredLanguages,isDeleted,name,isEditable,date 54b98bca7c1f57d85021f308,en,False,Test Workflow,False,2016-07-21T23:06:38.748Z that is what I want. I tried to change the cursor, bu it did not work. thanks – user7070824 Oct 26 '16 at 17:35
  • if I remove codecs from with clause, it did not work. Can I change to with open( 'workflows.csv', 'w', encoding='utf-8') as outfile? – user7070824 Oct 26 '16 at 17:40
  • @user7070824 unicode was bolted onto python 2.x well after it was cooked so its rough around the edges. Python 2's `open` does not support encoding so you need to use the `codec` module. Python 3's `open` does support encoding. – tdelaney Oct 27 '16 at 05:02
  • thanks for your explanation. I will use codecs module, the output of CSV is what i want. I like your solution. I tried to use your script on other collections, it did not give me the correct result. if you have time, could you help me? I am going to import CSV to Postgresql from Mongodb. – user7070824 Oct 27 '16 at 15:20
  • I don't have mongodb or postgressql. I only was able to answer your question because you did a good job posting sample data. How about doing something like what I did with your next problem. Print some sample data from your script (e.g., `print(repr(next(cursor)))` and mock up the part you are having problems with so people without your other tools can run them. Then you can post that to stackoverflow. Half the time, you end figuring out the problem as you go. – tdelaney Oct 27 '16 at 17:25
  • I used your script to export the two of collections to csv so I am able to get a cleaned CSV file. it did not work on the third table. cursor = db.job_templates.find(...) ........ for x in cursor: primary_lang = x["requiredLanguages"][0] write.writerow((x["_id"], u','.join(x["requiredLanguages"]), x["categoryDivisionId"], x["name"][primary_lang], x["summary"][primary_lang], x["description"], x["industryId"], x["categoryId"], x["categoryReportIds"], x["typeId"], x["requiredKeywords"], x["applicationMethod"])) – user7070824 Oct 27 '16 at 19:20
  • I don't know your data. It would be useful to mock the data in `x` (`print(repr(x))` may be sufficient) into a python dict that you can post. Without seeing the data, I can't see what the problem is. You could also add an update to your question and post it there - where its easier to read. – tdelaney Oct 27 '16 at 20:27
  • please see the detail information I posted. http://stackoverflow.com/questions/40293165/mongodb-export-to-csv-issue – user7070824 Oct 27 '16 at 20:38
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/126867/discussion-between-user7070824-and-tdelaney). – user7070824 Oct 27 '16 at 20:53