
I have a large list of condition strings like:

res = ["FAV_VENUE_CITY_NAME == 'Mumbai' & EVENT_GENRE == 'KIDS' & count_EVENT_GENRE >= 1",
"FAV_VENUE_CITY_NAME == 'Mumbai' & EVENT_GENRE == 'FANTASY' & count_EVENT_GENRE >= 1",
"FAV_VENUE_CITY_NAME =='Mumbai' & EVENT_GENRE == 'FESTIVAL' & count_EVENT_GENRE >= 1",
"FAV_VENUE_CITY_NAME == 'New Delhi' & EVENT_GENRE == 'WORKSHOP' & count_EVENT_GENRE >= 1",
"FAV_VENUE_CITY_NAME == 'Mumbai' & EVENT_GENRE == 'EXHIBITION' & count_EVENT_GENRE >= 1",
"FAV_VENUE_CITY_NAME == 'Bangalore' & FAV_GENRE == '|DRAMA|'",
"FAV_VENUE_CITY_NAME = 'Mumbai' &  & FAV_GENRE == '|ACTION|ADVENTURE|SCI-FI|'",
"FAV_VENUE_CITY_NAME == 'Bangalore' & FAV_GENRE == '|COMEDY|'",
"FAV_VENUE_CITY_NAME == 'Bangalore' & FAV_GENRE == 'DRAMA' & FAV_LANGUAGE == 'English'",
"FAV_VENUE_CITY_NAME == 'New Delhi' & FAV_LANGUAGE == 'Hindi' & count_EVENT_LANGUAGE >= 1"]

Now I am extracting the field names with:

 res = [re.split(r'[(==)(>=)]', x)[0].strip() for x in re.split('[&($#$)]', whereFields)]
 res = [x for x in list(set(res)) if x]

Output: ['FAV_GENRE', 'FAV_LANGUAGE', 'FAV_VENUE_CITY_NAME', 'count_EVENT_GENRE', 'EVENT_GENRE','count_EVENT_LANGUAGE']

Then, following the approach in "filter out some items from a list and store in different arrays in python", I get these values:

 FAV_VENUE_CITY_NAME =  ['New Delhi', 'Mumbai', 'Bangalore']
 FAV_GENRE = ['|DRAMA|', '|COMEDY|', '|ACTION|ADVENTURE|SCI-FI|', 'DRAMA']
 EVENT_GENRE = ['FESTIVAL', 'WORKSHOP', 'FANTASY', 'KIDS', 'EXHIBITION']
 FAV_LANGUAGE = ['English', 'Hindi']
 count_on_field = ['EVENT_GENRE', 'EVENT_LANGUAGE']

Now I want to make a dictionary whose keys are the field names in res and whose values are the results obtained from the link above.

Or is there a way to turn each item of the list res into a separate list of its own?

Something like:

res = ['FAV_GENRE', 'FAV_LANGUAGE', 'FAV_VENUE_CITY_NAME', 'count_EVENT_GENRE', 'EVENT_GENRE','count_EVENT_LANGUAGE']
for i in range(len(res)):
    res[i] = list(res[i])   # intent (pseudocode): turn each item into an empty list named after the item

so that they become like:

  FAV_VENUE_CITY_NAME = []
  EVENT_GENRE = []
  FAV_GENRE = []
  FAV_LANGUAGE = []

Then fill each individual list with its values by following the method in the link above.

Then make a dictionary, similar to the lines below that build a dict with the index as key:

 a = [51,27,13,56]
 b = dict(enumerate(a))
 # wanted: d = {key: each list name from res, value: the values in that individual list}

Or, if possible, suggest how to build a dict directly from the original res list at the top, with the field names as keys and the values from each line as values:

 Output: d = {'FAV_VENUE_CITY_NAME':['Mumbai','New Delhi','Bangalore'], 'EVENT_GENRE':['KIDS','FANTASY','FESTIVAL','WORKSHOP','EXHIBITION'], 'FAV_GENRE':['|DRAMA|','|ACTION|ADVENTURE|SCI-FI|','|COMEDY|','DRAMA'], 'FAV_LANGUAGE':['English','Hindi']}

count_EVENT_GENRE >= 1 and count_EVENT_LANGUAGE >= 1 should not be in that dictionary; rather, they should go into a list:

count_on_fields = ['EVENT_GENRE','EVENT_LANGUAGE']

Please, if anybody has a better idea or suggestion, do help.

Satya
  • Can you specify what values you get from the Link? You need to help answer this question by making the question clear. Have you tried d=dict(zip(res, values)) ? This creates a dict from 2 arrays (res and values) with res being the keys. As seen in this: http://stackoverflow.com/a/209854/1106659 – ant0nisk Dec 07 '15 at 11:39
  • @ant0nisk I have edited the question. – Satya Dec 07 '15 at 11:59

3 Answers


Here you go:

Create a list with all the values:

 values=[
    FAV_GENRE,
    FAV_LANGUAGE,
    FAV_VENUE_CITY_NAME,
    EVENT_GENRE,
    count_on_field
]

Then create your dict as proposed in this answer:

 d=dict(zip(res, values))

Note that the array order does matter, of course...
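
A minimal sketch of the pairing, assuming the value lists shown in the question. The `keys` list is a name introduced here for illustration and is written by hand in the same order as `values`; the count fields are left out because the question wants them in a separate list:

```python
# Value lists as given in the question.
FAV_VENUE_CITY_NAME = ['New Delhi', 'Mumbai', 'Bangalore']
FAV_GENRE = ['|DRAMA|', '|COMEDY|', '|ACTION|ADVENTURE|SCI-FI|', 'DRAMA']
EVENT_GENRE = ['FESTIVAL', 'WORKSHOP', 'FANTASY', 'KIDS', 'EXHIBITION']
FAV_LANGUAGE = ['English', 'Hindi']

# keys[i] must name the list stored at values[i]; zip pairs them by position only.
keys = ['FAV_GENRE', 'FAV_LANGUAGE', 'FAV_VENUE_CITY_NAME', 'EVENT_GENRE']
values = [FAV_GENRE, FAV_LANGUAGE, FAV_VENUE_CITY_NAME, EVENT_GENRE]

d = dict(zip(keys, values))
print(d['FAV_LANGUAGE'])
```

Keep in mind that zip stops at the shorter of the two sequences, so if one list has fewer entries than the other, the extra items are dropped without any error.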

I haven't tested it, because I am running out of battery now. I hope it results in what you need.

ant0nisk
  • @ant0nisk With the res and values lists I am getting (using your code), the assignment is mismatched: for example, EVENT_GENRE gets the FAV_VENUE_CITY_NAME values, and likewise for the others. – Satya Dec 07 '15 at 12:06
  • And what about iterating over the list res and making each item an individual list of its own? – Satya Dec 07 '15 at 12:07
  • The mismatch is due to the wrong order on the lists. Can you try adjusting the order of the values list, and see if it results in what you want? Sorry, I am on my iPhone, since my computer's battery is drained out. – ant0nisk Dec 07 '15 at 12:09
  • Yeah, that is working, but my point is that I am creating the res list and the values list dynamically, so I have no control over their order. Is there any way to match the names and then assign? (In this case I can change the order manually, but that does not make the code generic.) – Satya Dec 07 '15 at 12:13

I think it's going to be difficult for you to use the lists you get from the regex, as there's no way to tie them back to their 'keys'. I think it might be easiest to start from your original list, and work your way down.

from itertools import chain

res1 = [s.split(' & ') for s in res]
res2 = list(chain(*res1))
res3 = [item.replace('==', ' == ').replace('>=', ' >= ') for item in res2]
res4 = [item.split(None, 2) for item in res3 if item]  # maxsplit=2 keeps values containing spaces (e.g. 'New Delhi') in one piece
res5 = [(item[0], item[-1]) for item in res4]

temp_dict = dict()
temp_set = set()
for key, value in res5:
    if key.startswith('count'):
        temp_set.add(key.replace('count_',''))
    else:
        clean_value = value.replace("'","")
        temp_dict.setdefault(key, set()).add(clean_value)

output_dict = {key:list(value) for key, value in temp_dict.items()}
output_list = list(temp_set)

print(output_dict)
print(output_list)

You can try printing the intermediate results (res1 ~ res5) to see what's going on.

For production use, especially if you're dealing with a much larger res, you should probably change each of the list comprehensions to generator expressions, and change res2 = list(chain(*res1)) to res2 = chain.from_iterable(res1).
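
As a rough sketch of that lazy variant: the two sample strings are shortened from the question's data, `count_set` and `count_on_fields` are names chosen here for illustration, and the split uses maxsplit=2 on the assumption that values may contain spaces (such as 'New Delhi').

```python
from itertools import chain

# Two sample conditions in the question's format (shortened for illustration).
res = [
    "FAV_VENUE_CITY_NAME == 'Mumbai' & EVENT_GENRE == 'KIDS' & count_EVENT_GENRE >= 1",
    "FAV_VENUE_CITY_NAME == 'New Delhi' & FAV_LANGUAGE == 'Hindi' & count_EVENT_LANGUAGE >= 1",
]

# Each stage is a generator expression: nothing runs until the loop below pulls items.
res1 = (s.split(' & ') for s in res)
res2 = chain.from_iterable(res1)  # lazily flattens the lists of conditions
res3 = (item.replace('==', ' == ').replace('>=', ' >= ') for item in res2)
res4 = (item.split(None, 2) for item in res3 if item)  # [field, operator, value]
res5 = ((item[0], item[-1]) for item in res4)

temp_dict = {}
count_set = set()
for key, value in res5:
    if key.startswith('count_'):
        count_set.add(key[len('count_'):])  # keep only the field name
    else:
        temp_dict.setdefault(key, set()).add(value.replace("'", ""))

# Sets are unordered, so sort to get a stable result.
output_dict = {key: sorted(vals) for key, vals in temp_dict.items()}
count_on_fields = sorted(count_set)

print(output_dict)
print(count_on_fields)
```

One consequence of this design is that a generator can be consumed only once, so printing an intermediate stage such as res3 would exhaust it before the loop runs. For debugging, the list-based version is easier to inspect.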

zehnpaard
  • @zehnpaard Great help, but I actually need to capture count_EVENT_GENRE and count_EVENT_LANGUAGE in a user-defined list, count_on_fields = ['EVENT_GENRE','EVENT_LANGUAGE']. – Satya Dec 07 '15 at 12:38
  • And please give me an example (link) for "you should probably change each of the list comprehensions to generator expressions, and change res2 = list(chain(*res1)) to res2 = chain.from_iterable(res1)". – Satya Dec 07 '15 at 12:39
  • Can you just rename `output_list` to `count_on_fields`? – zehnpaard Dec 07 '15 at 13:29
  • Generator expressions are just list comprehensions with the [] replaced with (), e.g. `res1 = [s.split(' & ') for s in res]` becoming `res1 = (s.split(' & ') for s in res)`. [Here](http://stackoverflow.com/questions/47789/generator-expressions-vs-list-comprehension) is a question about the difference between the two, and the merits of one approach over the other. – zehnpaard Dec 07 '15 at 13:32

Here follows an IPython session that shows you how you can build a dictionary from your data:

In [1]: from re import split

In [2]: from itertools import chain

In [3]: data = ["FAV_VENUE_CITY_NAME == 'Mumbai' & EVENT_GENRE == 'KIDS' & count_EVENT_GENRE >= 1",
"FAV_VENUE_CITY_NAME == 'Mumbai' & EVENT_GENRE == 'FANTASY' & count_EVENT_GENRE >= 1",
"FAV_VENUE_CITY_NAME == 'Mumbai' & EVENT_GENRE == 'FESTIVAL' & count_EVENT_GENRE >= 1",
"FAV_VENUE_CITY_NAME == 'New Delhi' & EVENT_GENRE == 'WORKSHOP' & count_EVENT_GENRE >= 1",
"FAV_VENUE_CITY_NAME == 'Mumbai' && EVENT_GENRE == 'EXHIBITION' & count_EVENT_GENRE >= 1",
"FAV_VENUE_CITY_NAME == 'Bangalore' & FAV_GENRE == '|DRAMA|'",
"FAV_VENUE_CITY_NAME == 'Mumbai' &  & FAV_GENRE == '|ACTION|ADVENTURE|SCI-FI|'",
"FAV_VENUE_CITY_NAME == 'Bangalore' & FAV_GENRE == '|COMEDY|'",
"FAV_VENUE_CITY_NAME == 'Bangalore' & FAV_GENRE == 'DRAMA' & FAV_LANGUAGE == 'English'",
"FAV_VENUE_CITY_NAME == 'New Delhi' & FAV_LANGUAGE == 'Hindi' & count_EVENT_LANGUAGE >= 1"]

In [4]: d = {}

In [5]: for elt in chain(*(split(' *& *', rec) for rec in data)):
   ...:     if not elt: continue
   ...:     k, v = split(' *[=>]= *', elt)
   ...:     v = v.strip("'")
   ...:     if k not in d: d[k] = []
   ...:     if v not in d[k]: d[k].append(v)
   ...:

In [6]: d
Out[6]: 
{'EVENT_GENRE': ['KIDS', 'FANTASY', 'FESTIVAL', 'WORKSHOP', 'EXHIBITION'],
 'FAV_GENRE': ['|DRAMA|', '|ACTION|ADVENTURE|SCI-FI|', '|COMEDY|', 'DRAMA'],
 'FAV_LANGUAGE': ['English', 'Hindi'],
 'FAV_VENUE_CITY_NAME': ['Mumbai', 'New Delhi', 'Bangalore'],
 'count_EVENT_GENRE': ['1'],
 'count_EVENT_LANGUAGE': ['1']}

In [7]: 

Addendum

In [7]: count_fields = []

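In [8]: for k in list(d):  # iterate over a copy of the keys, because entries are deleted in the loop
   ...:     if k[:6] == 'count_':
   ...:         # no need to test for duplicates because dict keys are unique
   ...:         count_fields.append(k[6:])
   ...:         del d[k]
   ...: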

In [9]: 
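
The two steps above could also be folded into a single pass. The following is only a sketch: `parse_conditions` is a hypothetical helper name, the sample `data` is shortened from the question, and it assumes every non-empty piece has the form `NAME == value` or `NAME >= value`.

```python
import re

def parse_conditions(records):
    """Return (fields, count_fields) parsed from '&'-joined condition strings."""
    fields = {}        # field name -> list of distinct values, in first-seen order
    count_fields = []  # names taken from 'count_<NAME>' conditions
    for rec in records:
        for elt in re.split(r' *& *', rec):
            if not elt:
                continue  # skip empty pieces produced by '&  &' or '&&'
            k, v = re.split(r' *[=>]= *', elt, maxsplit=1)
            if k.startswith('count_'):
                name = k[len('count_'):]
                if name not in count_fields:
                    count_fields.append(name)
                continue
            v = v.strip("'")
            values = fields.setdefault(k, [])
            if v not in values:
                values.append(v)
    return fields, count_fields

data = [
    "FAV_VENUE_CITY_NAME == 'Mumbai' && EVENT_GENRE == 'KIDS' & count_EVENT_GENRE >= 1",
    "FAV_VENUE_CITY_NAME == 'New Delhi' & FAV_LANGUAGE == 'Hindi' & count_EVENT_LANGUAGE >= 1",
    "FAV_VENUE_CITY_NAME == 'Mumbai' &  & FAV_GENRE == '|ACTION|ADVENTURE|SCI-FI|'",
]

d, count_fields = parse_conditions(data)
print(d)
print(count_fields)
```

Filtering the `count_` keys while parsing means the dictionary never contains them, so no post-processing or deletion during iteration is needed.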
gboffi
  • @gboffi This is the easiest way I have seen so far. The only difficulty is dealing with the count_ fields: I need them filtered out at the beginning and stored in a separate user-defined list, count_fields = ['EVENT_GENRE','EVENT_LANGUAGE'], instead of 'count_EVENT_GENRE': ['1'], 'count_EVENT_LANGUAGE': ['1']. Thanks. – Satya Dec 08 '15 at 04:31
  • @Satya You can EASILY post process the dictionary. I have added to my answer some _untested_ code that could do what you want... please have a look at it. – gboffi Dec 08 '15 at 07:11