0

I have the following list of dicts:

 authorvals= [
        {
            "author": "author1",
            "year": [
                "2016"
            ],
            "value1": 4.0
        },
        {
            "author": "author2",
            "year": [
                "2016"
            ],
            "value1": 2.0
        },
        {
            "author": "author1",
            "year": [
                "2016"
            ],
            "value3": 1.0
        },
        {
            "author": "author1",
            "year": [
                "2016"
            ],
            "value2": 4.0
        },
        {
            "author": "author2",
            "year": [
                "2016"
            ],
            "value2": 2.0
        }]

Now I want lists from the dict as follows:

val_list=["value1","value2","value3"]
num_list=[[4,2],[4,2],[1,0]]
auth_list=["author1","author2"]

I want the dict as three separate lists.

  1. First list is the keys "value"+x in the dict
  2. Second list is the value of that particular key for author1 and author2
  3. Third list is just the list of authors

I have tried the following code:

num_list=[]
auth_list=[]
val_list=[]
for item in authorvals:
        if item['author'] not in auth_list:
            auth_list.append(item['author'])
            for k in item.keys():
                if k.startswith("value") and k not in val_list:
                    val_list.append(k)
                    val_list.sort()
                    for v in val_list:
                        temp_val_list = []
                        for i in authorvals:
                            try: 
                                val = i[v] 
                                temp_val_list.append(val) 
                            except: 
                                pass
                        if len(temp_val_list) > 0: 
                            num_list.append(temp_val_list) 
                            print(val_list) 
                            print(num_list) 
                            print(auth_list)

But this does not produce what I want. The 0 in the last list of num_list is there because author2 has no value for value3. If an author has no value for a key, 0 should be used instead.

Sam

3 Answers

1
  1. Collect authors in a set
  2. Collect keys and values in a defaultdict
  3. Postprocess the values by padding each list up to the maximum length.

from collections import defaultdict

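# Position of the data key within each record (after 'author' and 'year').
# This relies on every record listing its keys in the same order.
DATA_INDEX = 2

def collect(records):
    vals = defaultdict(list)  # data key -> list of values in record order
    authors = set()           # unique authors (order is not preserved)
    for record in records:
        for i, (k, v) in enumerate(record.items()):
            if k == 'author':
                authors.add(v)
            elif i == DATA_INDEX:
                vals[k].append(int(v))

    return (list(authors),
            list(vals.keys()),
            list(pad_by_max_len(vals.values())))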



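def pad_by_max_len(lol):
    # Materialize the input so it can be traversed twice
    lol = list(lol)
    # Longest sublist; max(*lengths) would fail when there is only one sublist
    padlength = max(map(len, lol), default=0)
    return [pad(l, padlength) for l in lol]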

def pad(l, padlength):
    return (l + [0] * padlength)[:padlength]

print(collect(authorvals))

Giving:

(
    ['author2', 'author1'],
    ['value1', 'value3', 'value2'],
    [[4, 2], [1, 0], [4, 2]]
)
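One caveat: padding by length always puts the 0 at the end of a list, which matches the expected output here only because author1 happens to be the author with value3. Below is a minimal sketch that aligns the values by author instead. It assumes 'author' and 'year' are the only non-data keys; collect_by_author and its skip parameter are hypothetical names, not part of the code above.

```python
from collections import defaultdict

def collect_by_author(records, skip=("author", "year")):
    """Return (authors, keys, values) with each value list aligned to authors."""
    authors = []               # authors in first-seen order
    table = defaultdict(dict)  # data key -> {author: value}
    for record in records:
        author = record["author"]
        if author not in authors:
            authors.append(author)
        for k, v in record.items():
            if k not in skip:  # assumption: every other key holds data
                table[k][author] = v
    keys = sorted(table)
    # An author without an entry for a key gets 0 in that author's position
    values = [[table[k].get(a, 0) for a in authors] for k in keys]
    return authors, keys, values

sample = [
    {"author": "author1", "year": ["2016"], "value1": 4.0},
    {"author": "author2", "year": ["2016"], "value3": 1.0},
]
print(collect_by_author(sample))
# (['author1', 'author2'], ['value1', 'value3'], [[4.0, 0], [0, 1.0]])
```

Here the 0 for value3 lands in author1's slot, which padding at the end of the list would get wrong.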
kluvin
  • There is no guarantee that an item starts with 'value'. It is just an example. It could be data or process or anything else. @kluvin – Sam Dec 03 '20 at 14:47
  • @Sam, please see the revised answer. I am afraid there could be a problem with ordering, since dictionaries don't normally guarantee this, however. Edit: https://stackoverflow.com/questions/39980323/are-dictionaries-ordered-in-python-3-6, looks like it isn't a problem :) – kluvin Dec 03 '20 at 14:50
0

Wasn't super clear on two things so I made assumptions:

  1. Ordering of values doesn't matter
  2. All values should appear as many times as the maximum occurring value. If not, add zeros to the num_list for that value.

The following code should work to that end:

val_list=[]
num_list=[]
auth_list=[]
max_values = 0

for d in authorvals:
    if d["author"] not in auth_list:
        auth_list.append(d["author"])
    for key in d:
        if key.startswith("value"):
            if key not in val_list:
                val_list.append(key)
                num_list.append([d[key]])
                max_values = max(max_values, 1)
            else:
                idx = val_list.index(key)
                num_list[idx].append(d[key])
                max_values = max(max_values, len(num_list[idx]))

for sublist in num_list:
    if len(sublist) != max_values:
        padding = [0] * (max_values - len(sublist))
        sublist.extend(padding)

print(val_list)  # ['value1', 'value3', 'value2']
print(num_list)  # [[4.0, 2.0], [1.0, 0], [4.0, 2.0]]
print(auth_list) # ['author1', 'author2']
  • There is no guarantee that an item starts with 'value'. It is just an example. It could be data or process or anything else. @Saad Hussain – Sam Dec 03 '20 at 14:48
0
# Unique authors; wrap in list(...) if you need to access them by index
auth_list = {x['author'] for x in authorvals}
indexed = {}  # maps each data key to the list of its values

for record in authorvals:
    # Drop 'author' and 'year' from the keys and take the one that remains
    key = next(iter(record.keys() - {'author', 'year'}))
    # Create the list the first time a key is seen, then append the value
    indexed.setdefault(key, []).append(record[key])

val_list = list(indexed.keys())
num_list = [indexed[key] for key in val_list]

Note that num_list may differ from the example you provided: the sublists are not padded to a fixed length, so a key that is missing for some author produces a shorter list. You can always pad them afterwards.
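That post-processing step could be sketched roughly as follows. pad_lists is a hypothetical helper name, and the sketch assumes the missing entries always belong at the end of each list:

```python
def pad_lists(lists, fill=0):
    """Right-pad every sublist with `fill` up to the length of the longest one."""
    width = max((len(sub) for sub in lists), default=0)
    return [sub + [fill] * (width - len(sub)) for sub in lists]

num_list = [[4.0, 2.0], [1.0], [4.0, 2.0]]
print(pad_lists(num_list))  # [[4.0, 2.0], [1.0, 0], [4.0, 2.0]]
```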

darkash
  • @darkash I am getting an error on the line filtered = keys.__sub__(['author', 'year'])[0] (the one removing the 'author' and 'year' keys from the key list): TypeError: 'set' object is not subscriptable – Sam Dec 03 '20 at 14:43
  • @Sam I've revised the answer, should be able to run now – darkash Dec 08 '20 at 08:47