So, I have a pandas dataframe like this.

         mac_address         City
0  00:03:7f:05:c0:06      Kolkata
1  00:08:22:1c:50:07  Bhubaneswar
2  00:08:22:1c:50:07       Mumbai
3  00:08:22:1c:50:07       Mumbai
4  00:08:22:1c:50:07      Kolkata
5  00:08:22:24:cc:fb  Bhubaneswar
6  00:08:22:24:f8:02       Mumbai
7  00:08:22:24:f8:02      Kolkata
8  00:08:22:24:f8:02       Mumbai
9  00:08:22:24:f8:02  Bhubaneswar

Now, the unique key here is mac_address, so I want to start with an empty JSON document. For that, I will start with a Python dictionary, which I can later dump to JSON. I don't know how to start with an empty dict (you can help with that too), so I have started with one value. Now, for each new row of the data frame: if its mac_address (which is also the key of the dict) is already there, update the corresponding city and city count; if it is not there, add a new entry keyed by the new mac_address and store the values accordingly. This is the dictionary to start with:

data = {"00:08:22:24:f8:02": {
            "mac_address": "00:08:22:24:f8:02",
            "cities": [
                {"name": "Bhubaneswar", "count": 12},
                {"name": "Kolkata", "count": 4},
                {"name": "Mumbai", "count": 6}
            ]
        }
}

The city count is the number of times a mac_address visited a city. For example, on reading a particular row, I would like to update the city named Bhubaneswar by a count of 1.
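A minimal sketch of that single-row update, assuming "update by a count of 1" means incrementing the city's existing count (and adding the city with count 1 if it isn't listed yet). The helper name `update_city_count` is hypothetical; `dict.setdefault` covers the "mac_address not there yet" case.

```python
def update_city_count(data, mac_address, city):
    """Hypothetical helper: record one visit of `mac_address` to `city` (in place)."""
    # Create the entry the first time this MAC address is seen.
    entry = data.setdefault(mac_address, {"mac_address": mac_address, "cities": []})
    # If the city is already listed for this MAC address, increment its count...
    for c in entry["cities"]:
        if c["name"] == city:
            c["count"] += 1
            return data
    # ...otherwise add the city with its first visit.
    entry["cities"].append({"name": city, "count": 1})
    return data


# The starting dictionary from the question.
data = {"00:08:22:24:f8:02": {
    "mac_address": "00:08:22:24:f8:02",
    "cities": [
        {"name": "Bhubaneswar", "count": 12},
        {"name": "Kolkata", "count": 4},
        {"name": "Mumbai", "count": 6},
    ],
}}

update_city_count(data, "00:08:22:24:f8:02", "Bhubaneswar")
print(data["00:08:22:24:f8:02"]["cities"][0])  # {'name': 'Bhubaneswar', 'count': 13}
```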

Update: the question is how to update a dictionary directly from a data frame, row by row, which I failed to explain clearly at first. I hope this update makes it easier to understand.
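A minimal sketch of the row-by-row approach, starting from an empty dict (just `{}`) and using `json.dumps` at the end to get the JSON document. It rebuilds the sample frame from the question and iterates with `DataFrame.itertuples`; for large frames, a vectorized groupby (as in the answer) is likely much faster than a Python-level loop. Keeping cities as a list of `{"name", "count"}` objects follows the layout in the question; a plain `{city: count}` dict would make lookups simpler.

```python
import json

import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "mac_address": ["00:03:7f:05:c0:06"] + ["00:08:22:1c:50:07"] * 4
                   + ["00:08:22:24:cc:fb"] + ["00:08:22:24:f8:02"] * 4,
    "City": ["Kolkata", "Bhubaneswar", "Mumbai", "Mumbai", "Kolkata",
             "Bhubaneswar", "Mumbai", "Kolkata", "Mumbai", "Bhubaneswar"],
})

data = {}  # start from an empty dict
for row in df.itertuples(index=False):
    # Add the MAC address the first time it is seen.
    entry = data.setdefault(row.mac_address,
                            {"mac_address": row.mac_address, "cities": []})
    # Find this row's city in the entry's list (None if not listed yet).
    city = next((c for c in entry["cities"] if c["name"] == row.City), None)
    if city is None:
        entry["cities"].append({"name": row.City, "count": 1})
    else:
        city["count"] += 1

print(json.dumps(data, indent=2))
```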

  • What's the question after all? You should [minimize your example](http://stackoverflow.com/help/mcve) and indicate what you've done already – DomTomCat Jun 13 '16 at 08:26
  • @DomTomCat I have updated the question; it should be more understandable now. –  Jun 13 '16 at 08:40
  • @DomTomCat any update or help? –  Jun 13 '16 at 09:27

1 Answer


You can construct your dictionary, which can then be saved as a JSON file, like this:

In [129]: %paste
(df.groupby(['mac_address','City'])
   .size()
   .reset_index()
   .rename(columns={'City':'name',0:'count'})
   .groupby('mac_address')
   .apply(lambda x: {'mac_address':x.name, 'cities': x[['name','count']].to_dict('records')})
   .to_dict()
)
## -- End pasted text --
Out[129]:
{'00:03:7f:05:c0:06': {'cities': [{'count': 1, 'name': 'Kolkata'}],
  'mac_address': '00:03:7f:05:c0:06'},
 '00:08:22:1c:50:07': {'cities': [{'count': 1, 'name': 'Bhubaneswar'},
   {'count': 1, 'name': 'Kolkata'},
   {'count': 2, 'name': 'Mumbai'}],
  'mac_address': '00:08:22:1c:50:07'},
 '00:08:22:24:cc:fb': {'cities': [{'count': 1, 'name': 'Bhubaneswar'}],
  'mac_address': '00:08:22:24:cc:fb'},
 '00:08:22:24:f8:02': {'cities': [{'count': 1, 'name': 'Bhubaneswar'},
   {'count': 1, 'name': 'Kolkata'},
   {'count': 2, 'name': 'Mumbai'}],
  'mac_address': '00:08:22:24:f8:02'}}

Regarding updating nested fields in MongoDB, see this question and its answers: MongoDB - Update objects in a document's array (nested updating)
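If you already have a dictionary (such as the starting `data` from the question) and produce a new batch with the groupby pipeline above, a sketch of merging the batch into it, adding up counts per city and adding unseen MAC addresses. The helper name `merge_counts` is hypothetical, and it assumes both dicts use the `{"mac_address", "cities": [{"name", "count"}]}` layout.

```python
def merge_counts(existing, new):
    """Hypothetical helper: merge `new` into `existing` in place, summing city counts."""
    for mac, new_entry in new.items():
        if mac not in existing:
            # Unknown MAC address: add it, copying the city dicts so `new` stays untouched.
            existing[mac] = {"mac_address": mac,
                             "cities": [dict(c) for c in new_entry["cities"]]}
            continue
        # Known MAC address: index its cities by name, then increment or append.
        by_name = {c["name"]: c for c in existing[mac]["cities"]}
        for c in new_entry["cities"]:
            if c["name"] in by_name:
                by_name[c["name"]]["count"] += c["count"]
            else:
                added = dict(c)
                existing[mac]["cities"].append(added)
                by_name[c["name"]] = added
    return existing


# The starting dictionary from the question.
existing = {"00:08:22:24:f8:02": {"mac_address": "00:08:22:24:f8:02",
                                  "cities": [{"name": "Bhubaneswar", "count": 12},
                                             {"name": "Kolkata", "count": 4},
                                             {"name": "Mumbai", "count": 6}]}}
# Two entries in the shape produced by the groupby pipeline.
new = {"00:08:22:24:f8:02": {"mac_address": "00:08:22:24:f8:02",
                             "cities": [{"count": 1, "name": "Bhubaneswar"},
                                        {"count": 1, "name": "Kolkata"},
                                        {"count": 2, "name": "Mumbai"}]},
       "00:08:22:24:cc:fb": {"mac_address": "00:08:22:24:cc:fb",
                             "cities": [{"count": 1, "name": "Bhubaneswar"}]}}

merge_counts(existing, new)
print(existing["00:08:22:24:f8:02"]["cities"])
```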

MaxU - stand with Ukraine
  • The major issue is checking whether the mac_address is already there, adding a new mac_address if it isn't, and updating accordingly, which this answer probably doesn't solve. –  Jun 13 '16 at 08:42
  • @Danran, IMO it's not very convenient to store your data in JSON format if you want to be able to update it. Can you store your data in an HDFStore? Why do you need JSON in this case? Are you using Mongo DB as a backend? – MaxU - stand with Ukraine Jun 13 '16 at 08:54
  • Yes, MongoDB will be used. And yes, a lot of fields will be added; this is just one example. Others include locations visited and their counts, and whether the visit was on a holiday, a weekend or a regular weekday. What I need to figure out now is how to look up the index (the mac_address) and update accordingly. –  Jun 13 '16 at 08:59
  • @Danran, I would suggest opening a new question with the `Mongo DB` tag, asking there how to update nested fields with the data from a dictionary, and pasting two sample dictionaries: the one you want to update and the one produced by pandas. This question won't attract any Mongo DB specialists, because the `Mongo DB` tag is missing – MaxU - stand with Ukraine Jun 13 '16 at 09:05
  • @Danran, I've already added a `mongodb` tag and updated your question; please check whether it's OK and update or rephrase it if it's not – MaxU - stand with Ukraine Jun 13 '16 at 09:12
  • It's not simply about merging two dictionaries; it needs to be done row by row. Anyway, thanks for helping out this much. –  Jun 13 '16 at 09:15
  • One more option is to keep updating the data frame and convert it into a dict at the end, but I guess that will be complicated. –  Jun 13 '16 at 09:18
  • @Danran, I would recommend using an HDFStore or an RDBMS with plain data structures (for example MySQL or PostgreSQL) as a backend; it's much more natural for pandas and will help you avoid a lot of small problems like this one – MaxU - stand with Ukraine Jun 13 '16 at 09:19
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/114510/discussion-between-dan-ran-and-maxu). –  Jun 13 '16 at 09:23
  • @Merlin actually I got things working, but my question wasn't about Mongo. I wanted to update the dictionary directly from the data frame, which I currently do by first converting the data frame to a dictionary and then updating the first dictionary with it. –  Jun 13 '16 at 16:23
  • @Danran, if MaxU's answer helped, you should upvote it and/or mark it as correct – Merlin Jun 13 '16 at 16:26
  • @Merlin yes, it definitely helped me, so I upvoted it, but it is not exactly the output I was looking for, so I haven't marked it as correct. –  Jun 13 '16 at 16:27
  • @Merlin I was looking to update the dictionary directly from the data frame's rows. The common part between the dictionary and the data frame is the dictionary's key and the data frame's column['mac_address'], so check for that and update the rest of the dictionary's values. This is just one example: each row of the actual data frame has many more values (like location, state and date of visit, which I also discussed with MaxU in the chat), and I want to use them to update more sub-keys of the dictionary. I hope you understand. –  Jun 13 '16 at 16:40
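To extend the row-by-row idea to more sub-keys, a sketch that counts several columns per mac_address in one pass. The helper `update_from_row`, the `DayType` column (holiday / weekend / weekday) and the `day_types` sub-key are all hypothetical stand-ins for the extra fields mentioned in the comments.

```python
import pandas as pd


def update_from_row(data, row, key_col, count_cols):
    """Hypothetical helper: count each value of `count_cols` under the row's key.

    `count_cols` maps a data frame column to the sub-key it is counted under,
    e.g. {"City": "cities"}.
    """
    key = row[key_col]
    entry = data.setdefault(key, {key_col: key})
    for col, sub_key in count_cols.items():
        items = entry.setdefault(sub_key, [])  # list of {"name", "count"} dicts
        for item in items:
            if item["name"] == row[col]:
                item["count"] += 1
                break
        else:  # loop finished without break: first time this value is seen
            items.append({"name": row[col], "count": 1})
    return data


# "DayType" is a hypothetical extra column.
df = pd.DataFrame({
    "mac_address": ["00:08:22:24:f8:02", "00:08:22:24:f8:02", "00:08:22:24:cc:fb"],
    "City": ["Mumbai", "Mumbai", "Bhubaneswar"],
    "DayType": ["weekday", "weekend", "weekday"],
})

data = {}
for _, row in df.iterrows():
    update_from_row(data, row, "mac_address", {"City": "cities", "DayType": "day_types"})
print(data)
```

Adding another counted field is then a matter of adding one more entry to the `count_cols` mapping.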