
I have some data as a list of tuples, and I'm trying to find the quickest way to transform it into a pandas DataFrame.

For our purposes the list will be small, around 20 items. Each tuple has exactly two elements: the first is a Decimal, the second is a dictionary. The dictionary has at least one, but possibly several, key-value pairs (str keys, Decimal values).

Example below:

from decimal import Decimal

bids = [(Decimal('10421.53'), {'d59133e6-891b-4d95-9744-046effe12096': Decimal('3')}), (Decimal('10422.87'), {'b6a51d83-f8a7-4401-b596-6abe55c5e554': Decimal('2')}), (Decimal('10424.03'), {'fbb471ad-44dc-49e8-845a-85035542115d': Decimal('2'), 'fe7e784e-2c5a-414f-9094-ca3c94aae19f': Decimal('2.5')}), (Decimal('10424.3'), {'82478287-6bde-4a0c-8283-34d0b0537479': Decimal('0.45')}), (Decimal('10424.31'), {'1d63cd02-e446-457c-89ce-7e88e1a0345a': Decimal('0.49834487')}), (Decimal('10424.32'), {'bf0b2eda-75da-4aff-a1ac-ccf36f198bd4': Decimal('0.24776675')}), (Decimal('10426.01'), {'4d12abd1-9330-4688-964e-07e5f5aa2b77': Decimal('1.21363179')}), (Decimal('10426.02'), {'613b7639-23bd-4953-9efd-0d96ef100789': Decimal('2')}), (Decimal('10426.07'), {'bd05a81a-1725-483b-80eb-2ce4b7aa843c': Decimal('0.0031')}), (Decimal('10426.12'), {'4210f639-5d54-49cb-a658-4f866e52b906': Decimal('2'), 'a31807bc-4109-4ba3-ae9a-4b04b3e650b1': Decimal('0.27650338'), '4d3bc7fd-3955-4e4a-ad46-ac688460f5be': Decimal('0.01'), 'ce85aa0b-abcf-4072-a7e7-5ec9cc9fa95f': Decimal('0.58784957'), '2d3a1f90-52b4-4d5b-8d29-4e2ba50c6447': Decimal('0.20332366')})]


I have a basic nested loop that does this:

import pandas as pd

results = []
for bid in bids:
    # bid[0] is the price; bid[1] maps order id -> amount
    for oid, amt in bid[1].items():
        results.append({'price': bid[0], 'amount': amt, 'order': oid})
result = pd.DataFrame(results)
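To check whether any alternative is actually quicker, it may help to time the two stages of the loop above separately: building the row list and calling pd.DataFrame. The sketch below assumes the timeit module is an acceptable way to measure; build_rows is a hypothetical helper name, and the sample holds only two entries from bids. With only ~20 rows, the fixed cost of pd.DataFrame may well dominate, but the numbers on your machine are what count.

```python
import timeit
from decimal import Decimal

import pandas as pd

# Two entries copied from `bids` above; paste the full list for a realistic timing.
bids = [
    (Decimal('10421.53'), {'d59133e6-891b-4d95-9744-046effe12096': Decimal('3')}),
    (Decimal('10424.03'), {'fbb471ad-44dc-49e8-845a-85035542115d': Decimal('2'),
                           'fe7e784e-2c5a-414f-9094-ca3c94aae19f': Decimal('2.5')}),
]


def build_rows(bids):
    """Hypothetical helper: the nested loop from the question, minus the DataFrame step."""
    rows = []
    for price, orders in bids:
        for oid, amt in orders.items():
            rows.append({'price': price, 'amount': amt, 'order': oid})
    return rows


rows = build_rows(bids)

# Time each stage over 1000 runs; absolute numbers depend on the machine.
t_rows = timeit.timeit(lambda: build_rows(bids), number=1000)
t_frame = timeit.timeit(lambda: pd.DataFrame(rows), number=1000)
print(f"build_rows: {t_rows:.4f} s, pd.DataFrame(rows): {t_frame:.4f} s")
```

If the DataFrame stage takes most of the time, rewriting the loop alone will not change much.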


Is there a quicker way to do this without the nested loops? Thanks in advance!

Justin
Comment (Yuca, Sep 04 '19 at 16:35): This might help: https://stackoverflow.com/questions/38231591/splitting-dictionary-list-inside-a-pandas-column-into-separate-columns
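In the spirit of that comment, the dictionaries can also be expanded with pandas itself instead of a Python loop. This is a swapped-in technique using Series.explode, not the code from the linked question; it assumes pandas 1.1 or later (for explode's ignore_index argument), and the helper column name orders is an arbitrary choice. It is not guaranteed to be faster for ~20 rows.

```python
from decimal import Decimal

import pandas as pd

bids = [
    (Decimal('10421.53'), {'d59133e6-891b-4d95-9744-046effe12096': Decimal('3')}),
    (Decimal('10424.03'), {'fbb471ad-44dc-49e8-845a-85035542115d': Decimal('2'),
                           'fe7e784e-2c5a-414f-9094-ca3c94aae19f': Decimal('2.5')}),
]

# One row per price level, with the whole {order_id: amount} dict in one column.
df = pd.DataFrame(bids, columns=['price', 'orders'])

# Turn each dict into a list of (order_id, amount) pairs, then give each pair its own row.
df['orders'] = df['orders'].map(lambda d: list(d.items()))
df = df.explode('orders', ignore_index=True)

# Split the (order_id, amount) tuples into two named columns and reattach the price.
pairs = pd.DataFrame(df['orders'].tolist(), index=df.index, columns=['order', 'amount'])
df = pd.concat([df['price'], pairs], axis=1)[['price', 'amount', 'order']]
print(df)
```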

1 Answer


Your method is fine, although a list comprehension is usually a little faster than calling append inside a loop:

final = pd.DataFrame([{'price': bid[0], 'amount': amt, 'order': oid}
                      for bid in bids for oid, amt in bid[1].items()])
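A variation on the same comprehension, sketched here as an option rather than a measured improvement: build plain tuples and name the columns once through the columns argument, which avoids creating a dictionary (and repeating the three keys) for every row. Whether this is noticeably faster at ~20 rows would need to be timed.

```python
from decimal import Decimal

import pandas as pd

bids = [
    (Decimal('10421.53'), {'d59133e6-891b-4d95-9744-046effe12096': Decimal('3')}),
    (Decimal('10424.03'), {'fbb471ad-44dc-49e8-845a-85035542115d': Decimal('2'),
                           'fe7e784e-2c5a-414f-9094-ca3c94aae19f': Decimal('2.5')}),
]

# Unpack each (price, orders) tuple and emit one (price, amount, order_id) tuple per order.
final = pd.DataFrame(
    [(price, amt, oid) for price, orders in bids for oid, amt in orders.items()],
    columns=['price', 'amount', 'order'],
)
print(final)
```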

       price      amount                                 order
0   10421.53           3  d59133e6-891b-4d95-9744-046effe12096
1   10422.87           2  b6a51d83-f8a7-4401-b596-6abe55c5e554
2   10424.03           2  fbb471ad-44dc-49e8-845a-85035542115d
3   10424.03         2.5  fe7e784e-2c5a-414f-9094-ca3c94aae19f
4    10424.3        0.45  82478287-6bde-4a0c-8283-34d0b0537479
5   10424.31  0.49834487  1d63cd02-e446-457c-89ce-7e88e1a0345a
6   10424.32  0.24776675  bf0b2eda-75da-4aff-a1ac-ccf36f198bd4
7   10426.01  1.21363179  4d12abd1-9330-4688-964e-07e5f5aa2b77
8   10426.02           2  613b7639-23bd-4953-9efd-0d96ef100789
9   10426.07      0.0031  bd05a81a-1725-483b-80eb-2ce4b7aa843c
10  10426.12           2  4210f639-5d54-49cb-a658-4f866e52b906
11  10426.12  0.27650338  a31807bc-4109-4ba3-ae9a-4b04b3e650b1
12  10426.12        0.01  4d3bc7fd-3955-4e4a-ad46-ac688460f5be
13  10426.12  0.58784957  ce85aa0b-abcf-4072-a7e7-5ec9cc9fa95f
14  10426.12  0.20332366  2d3a1f90-52b4-4d5b-8d29-4e2ba50c6447
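The values shown are Decimal objects, so pandas keeps price and amount in object-dtype columns rather than a numeric dtype. If vectorised arithmetic is needed later, one option (an assumption about your use case, not part of the answer above) is to convert those columns to float64, accepting that this gives up Decimal's exact representation.

```python
from decimal import Decimal

import pandas as pd

final = pd.DataFrame(
    [(Decimal('10421.53'), Decimal('3'), 'd59133e6-891b-4d95-9744-046effe12096'),
     (Decimal('10424.03'), Decimal('2.5'), 'fe7e784e-2c5a-414f-9094-ca3c94aae19f')],
    columns=['price', 'amount', 'order'],
)
print(final.dtypes)  # price and amount are object columns holding Decimal instances

# Convert only the numeric columns; float(Decimal) is applied element by element.
numeric = final.astype({'price': float, 'amount': float})
print(numeric.dtypes)
```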
anky