I am trying to get the field names from a MongoDB collection using pymongo. Is there a way to do that?

Format of a document in the collection:

    {
        "_id" : ObjectId("5e7a773721ee63712e9d25a3"),
        "effective_date" : "2020-03-24",
        "data" : [
            {
                "Year" : 2020,
                "month" : 1,
                "Day" : 28,
                "views" : 4994,
                "clicks" : 3982
            },
            {
                "Year" : 2020,
                "month" : 1,
                "Day" : 17,
                "views" : 1987,
                "clicks" : 3561
            },
            .
            .
            .
        ]
    }

Specifically, I want to get these field names: _id, effective_date, data.Year, data.month, data.Day, data.views, data.clicks

This is what I have:

from datetime import datetime, timedelta, date
import pymongo
from pymongo import MongoClient
from pymongo.read_preferences import ReadPreference
from pprint import pprint
from bson.son import SON
from bson import json_util
from bson.json_util import dumps, loads
import re


client = pymongo.MongoClient(host='mongodb://00.00.00.0:00000')
db = client.collection
pprint(db)

def get_results(filters):

    col=db.results
    res = col.find()

    res = list(res)

    return dumps(res, indent=4)

Is there a way for me to get just the field names using pymongo?

nb_nb_nb
  • Vanilla Mongodb doesn't have a schema on collections - so listing the fields only makes sense when talking in the context of a single document. Other documents might have extra fields, or may be missing some. The best source of the fields is the application that's populating the DB in the first place – rdas Mar 25 '20 at 20:56
  • Does this answer your question? [Get names of all keys in the collection](https://stackoverflow.com/questions/2298870/get-names-of-all-keys-in-the-collection) The most you can do within MongoDB is what most answers there give, but if you need keys from sub-documents you need to do it in code. – whoami - fakeFaceTrueSoul Mar 25 '20 at 21:29
  • @whoami, it does not because that only gives a solution for mapReduce – nb_nb_nb Mar 25 '20 at 21:52
  • There is a way but exactly what do you want? The unique list of all fields that show up without any attribution to the specific doc? – Buzz Moschetti Mar 25 '20 at 23:21
  • @BuzzMoschetti, yes, that's what I want – nb_nb_nb Mar 26 '20 at 14:25

1 Answer

The example does no filtering, aggregation, or projection; it runs one big find() and then wants all the field names. Since all the data is being pulled to the client anyway, let the client side do the work. Here's something that will capture unique field names, including those reached through arrays, and give you a count of each unique field name as well:

r = [
    {"_id":0, "A":"A", "data":[
            {"Y":2020,"day":3,"clicks":12},
            {"Y":2020,"day":4,"clicks":192}
            ]} ,
    {"_id":1, "B":{"foo":"bar"}, "data":[
            {"Y":2020,"day":3,"clicks":888,"corn":"dog"},
            {"Y":2020,"day":4,"clicks":999,"zing":"zap"}
            ]} ,
    {"_id":2, "B":{"foo":"bit"} },
    {"_id":3, "B":{"fin":"bar"} }
]
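# coll is a pymongo Collection, e.g. db.results from the question.
# Collection.insert() was removed in PyMongo 4; insert_many() replaces it.
coll.insert_many(r)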

fieldNames = {}

def addFldName(s):
    if s not in fieldNames:
        fieldNames[s] = 0
    fieldNames[s] += 1

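def process(path, v):
    # Record this path, then descend into nested documents and arrays.
    addFldName(path)
    if isinstance(v, dict):
        walkMap(path, v)
    elif isinstance(v, list):
        walkList(path, v)

def walkMap(path, doc):
    # Compare with ==, not "is": string identity is not guaranteed.
    dot = "" if path == "" else "."
    for k, v in doc.items():  # Python 3; iteritems() exists only in Python 2
        s = path + dot + k
        process(s, v)

def walkList(path, array):
    dot = "" if path == "" else "."
    # The array index becomes part of the path, e.g. data.0.clicks
    for n, item in enumerate(array):
        s = path + dot + str(n)
        process(s, item)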

for doc in coll.find():
    walkMap("", doc)

print(fieldNames)

{u'A': 1, u'data.1.clicks': 2, u'B': 3, u'data.0': 2, u'data.1': 2, u'data.0.Y': 2, u'data.1.zing': 1, u'data.0.day': 2, u'B.fin': 1, u'B.foo': 2, u'data.1.Y': 2, u'_id': 4, u'data': 2, u'data.0.corn': 1, u'data.0.clicks': 2, u'data.1.day': 2}

It looks a little weird, but array indices are part of each path, so data.0.clicks and data.1.clicks are distinct names, and yes, data.0.clicks shows up in 2 docs.
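
The question asks for index-free names such as data.Year rather than data.0.Year. A minimal sketch of one way to get those, assuming it is acceptable to merge all array elements under the array's own path: the helper names field_paths and count_field_paths are hypothetical, and the sample document mirrors the shape in the question, with a plain string standing in for the ObjectId so the snippet runs without a database.

```python
from collections import Counter


def field_paths(value, prefix=""):
    """Yield dotted field paths found in value, skipping array indices."""
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            yield path
            yield from field_paths(child, path)
    elif isinstance(value, list):
        # Array elements share the array's path, so data[0].Year
        # and data[1].Year both become "data.Year".
        for item in value:
            yield from field_paths(item, prefix)


def count_field_paths(docs):
    """Count, for each field path, how many documents contain it."""
    counts = Counter()
    for doc in docs:
        # set() so a path repeated inside one document counts once.
        counts.update(set(field_paths(doc)))
    return counts


# Stand-in data shaped like the question's document (assumption:
# the string _id replaces the real ObjectId).
sample_docs = [
    {
        "_id": "5e7a773721ee63712e9d25a3",
        "effective_date": "2020-03-24",
        "data": [
            {"Year": 2020, "month": 1, "Day": 28, "views": 4994, "clicks": 3982},
            {"Year": 2020, "month": 1, "Day": 17, "views": 1987, "clicks": 3561},
        ],
    }
]

print(sorted(count_field_paths(sample_docs)))
```

Against a live collection, the same function should accept the cursor directly, for example count_field_paths(coll.find()), since a pymongo cursor iterates over dict-like documents. Note that the parent name data is reported alongside its children.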

Buzz Moschetti