0

I've got three python lists which I want to make into a dictionary, then join those three dictionaries to one based on the key values.

My python lists are made like this:

with open ('full_product_shipments.xml','r') as file2:
    full_product_shipments = list([line.strip().replace('{"','').replace('}','').replace('"','').replace(':',',').split(',') for line in file2])

And they look like this :

list1

[['transaction_id', '224847627', 'product_amount', '2.73', 'user_invoice_date', '2018-12-21'],
['transaction_id', '67919397', 'product_amount', '2.73', 'user_invoice_date', '2017-10-26']]

list2

[['tracking_code', '29285908', 'from_country', 'FR', 'to_country', 'FR', 'package_type_id', '10', 'transaction_id', '172238850', 'shipping_label_created', '2018-09-25 18', '40', '52'],
['tracking_code', '22105784', 'from_country', 'FR', 'to_country', 'FR', 'package_type_id', '10', 'transaction_id', '111423825', 'shipping_label_created', '2018-04-13 11', '22', '44']]

list3

[['tracking_code', '21703238', 'from_country', 'FR', 'to_country', 'FR', 'amount', '3.23'],
['tracking_code', '41545695', 'from_country', 'FR', 'to_country', 'FR', 'amount', '2.9']]

list1 and list2 both have transaction_id on which I would need to join them once I convert them to a dict.

The newly joined list (list1 and list2) and list3 both have tracking_id by which I want to join them once list3 is converted to a dict.

I've tried using this :

result=[x.update(amount=y['amount']) for x in full_product_shipments for y in full_provider_invoices if x['transaction_id'] == y['transaction_id']]

But that throws me an TypeError:

TypeError: list indices must be integers or slices, not str

Maybe there is no need to convert everything to dict. I'm kind of new to python so if there is a better way to merge information based on key, I would be very appreciated to learn it.

codrelphi
  • 1,075
  • 1
  • 7
  • 13
Jonas Palačionis
  • 4,591
  • 4
  • 22
  • 55

3 Answers3

1

The example in your sample data does not seem to have matching entries, assuming your full data set will have matches for everything you could do something like this.

l1 = [['transaction_id', '224847627', 'product_amount', '2.73', 'user_invoice_date', '2018-12-21'], ['transaction_id', '67919397', 'product_amount', '2.73', 'user_invoice_date', '2017-10-26']]
l2 = [['tracking_code', '29285908', 'from_country', 'FR', 'to_country', 'FR', 'package_type_id', '10', 'transaction_id', '172238850', 'shipping_label_created', '2018-09-25 18', '40', '52'], ['tracking_code', '22105784', 'from_country', 'FR', 'to_country', 'FR', 'package_type_id', '10', 'transaction_id', '111423825', 'shipping_label_created', '2018-04-13 11', '22', '44']]
l3 = [['tracking_code', '21703238', 'from_country', 'FR', 'to_country', 'FR', 'amount', '3.23'], ['tracking_code', '41545695', 'from_country', 'FR', 'to_country', 'FR', 'amount', '2.9']]

# Convert everything to dict
result = {y['transaction_id']:y for y in [dict(zip(x[::2], x[1::2])) for x in l1]}
d2 = {y['transaction_id']:y for y in [dict(zip(x[::2], x[1::2])) for x in l2]}
d3 = {y['tracking_code']:y for y in [dict(zip(x[::2], x[1::2])) for x in l3]}

# Update result dict with data from the other lists.
for entry in result.values():
    entry.update(d2[entry['transaction_id']])
    entry.update(d3[entry['tracking_code']])
CMMCD
  • 360
  • 1
  • 8
  • Thank you, that worked. I see that the `transaction_id` value is the key and the value is the rest of the dictionary part `{...'amount': '2.9'}, '131734284': {'transaction_id': '131734284', 'product_amount': '2.73', 'user_invoice_date': '2018-06-14', 'tracking_code': '24503936', 'from_country': 'FR', 'to_country': 'FR', 'package_type_id': '11', 'shipping_label_created': '2018-06-07 09', '51': '45', 'amount': '2.9'}`. How would I iterate through the value keys? For example if I want to grab all product amounts or from_country values? When the whole information is already placed as value? – Jonas Palačionis Oct 17 '19 at 20:40
  • You can iterate through the values like this `for v in result.values():` then `v` will contain each transaction dict which you can then index into `v['product_amount']` to get specific values. – CMMCD Oct 22 '19 at 13:44
1

This would be easier to join if the original raw data was in 'json' format instead of 'xml'. If you are downloading the data using a REST API, try passing in a keyword at the end '&$format=json' and see if the file result comes back as a json string. As an example, this will work in SAP REST APIs, but I think it's a standard parameter amongst many API providers.

To share an experience I had at work, I was given an SAP API where the default response was XML … I tried to make sense of it using Python XML parsing libraries (wracked my brain endlessly) until I realized I could just pass a parameter to the raw URL string, and it would come back as a JSON instead. Based on my experience, this is my recommendation for your problem.

Here is an example of a public API with syntax … try experimenting with similar combinations for your API.

https://vpic.nhtsa.dot.gov/api/

https://vpic.nhtsa.dot.gov/api/Home/Index/LanguageExamples

Now, if you can download a JSON string, it's pretty easy to convert that to a Python dictionary … lots of resources online how to do that. Then converting from Python dictionary to pandas dataframe is straightforward, lots of resources online how to do that. Then joining multiple dataframe together is straightforward, lots of resources online how to do that.

If you cannot get a JSON string, there are a few (more complicated) resources online on how to convert from XML to JSON. Here are a few links:

How to convert an xml string to a dictionary?

https://ericscrivner.me/2015/07/python-tip-convert-xml-tree-to-a-dictionary/

http://code.activestate.com/recipes/573463-converting-xml-to-dictionary-and-back/

You will find it's much easier to work with dictionary, rather than a list. A list is meant to store ordered items, but your list is storing a bunch of key-value pairs (which is exactly what a dictionary is good for).

Hope that helps!

0

It looks like, despite being an xml file name, your source is a JSON, as mentioned in another response, generating dictionaries from JSON might be easier.

Assuming it's not possible, the following program will go through your different lists, try to fetch a transaction id, which will be used as key for our main defaultdict, which will be populated with an empty dict, if the id didn't exist or append a new entry in its dictionary otherwise.

Here is the full code. Note that I have amended the second list to have an id matching the first list in order to show how fields from separate lists could be merged in the same dict. This assumes no overlap between fields.

from collections import defaultdict

list1 = [['transaction_id', '224847627', 'product_amount', '2.73', 'user_invoice_date', '2018-12-21'],
['transaction_id', '67919397', 'product_amount', '2.73', 'user_invoice_date', '2017-10-26']]

# list2 = [['tracking_code', '29285908', 'from_country', 'FR', 'to_country', 'FR', 'package_type_id', '10', 'transaction_id', '172238850', 'shipping_label_created', '2018-09-25 18', '40', '52'],
list2 = [['tracking_code', '29285908', 'from_country', 'FR', 'to_country', 'FR', 'package_type_id', '10', 'transaction_id', '224847627', 'shipping_label_created', '2018-09-25 18', '40', '52'],
['tracking_code', '22105784', 'from_country', 'FR', 'to_country', 'FR', 'package_type_id', '10', 'transaction_id', '111423825', 'shipping_label_created', '2018-04-13 11', '22', '44']]

list3 = [['tracking_code', '21703238', 'from_country', 'FR', 'to_country', 'FR', 'amount', '3.23'],
['tracking_code', '41545695', 'from_country', 'FR', 'to_country', 'FR', 'amount', '2.9']]




def aggregate_lists(*lists):
    transactions = defaultdict(dict)

    for list in lists:
        for row in list:
            try:
                id_col = row.index('transaction_id')
                transaction_id = row[id_col + 1]
            except ValueError:
                continue # Better error handling to be added.

            for col in range(0, len(row), 2):
                if col != id_col:
                    transactions[transaction_id][row[col]] = row[col + 1]

    return transactions

def main():
    transactions = aggregate_lists(list1, list2, list3)
    for k, props in transactions.items():
        print(f'Transaction: {k}')
        for k, v in props.items():
            print(f'\t{k}: {v}')

if __name__ == '__main__':
    main()

Here is the given output:

Transaction: 224847627
    product_amount: 2.73
    user_invoice_date: 2018-12-21
    tracking_code: 29285908
    from_country: FR
    to_country: FR
    package_type_id: 10
    shipping_label_created: 2018-09-25 18
    40: 52
Transaction: 67919397
    product_amount: 2.73
    user_invoice_date: 2017-10-26
Transaction: 111423825
    tracking_code: 22105784
    from_country: FR
    to_country: FR
    package_type_id: 10
    shipping_label_created: 2018-04-13 11
    22: 44

I just realised that list3 doesn't have a transaction id, which is therefore ignored. Anyway this should give the idea.

alfajet
  • 389
  • 1
  • 14