df.at[0, 'A'] = [{'score': 12, 'player': [{'name': 'Jacob', 'score': 2},
                                          {'name': 'Shane', 'score': 5}, ...]},
                 {'score': 33, 'player': [{'name': 'Cindy', 'score': 4}, ...]}, ...]

Say I have a list of n dictionaries in column 'A' of a data frame, like above. I want to add a new key named 'game' to each dictionary, whose value is that dictionary's index in the list. So it would look like this:

df.at[0, 'A'] = [{'score': 12, 'player': [...], 'game': 0},
                 {'score': 33, 'player': [...], 'game': 1}, ...]

Since I have to do the same thing with 'player', I don't want to use for loops.
Is there a way to achieve this?


df.at[0, 'A'][0]['player'] = [{'name': 'Jacob', 'score': 2, 'number': 0},
                              {'name': 'Shane', 'score': 5, 'number': 1}, ...]

For example, 'player' will have key 'number' whose value is the index of the inner list.


Basically, I don't want to use any nested for loops to do this, because the actual data I received is a much larger NL dataset that really did come in that ridiculous form.

user8397275
    I don't understand your reason for not using a `for` loop. What do you mean about doing the same thing with `player`? – Barmar Jan 21 '19 at 08:31

2 Answers


Given your data structure, Barmar is probably right that you're stuck with a for loop (and there's nothing wrong with that, by the way). Here are a couple of potential workarounds.

A "solution"

The information you're trying to record is redundant, so you probably don't need to bother with it in the first place.

Basically, what you're saying is that the values of game and number are already encoded by the position of each element in its list. Chances are there's a way to get whatever final result you're trying to compute without writing out all of this redundant information.
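As a minimal sketch of that idea, the positions can be read off with enumerate at the moment they are needed instead of being stored as keys. The sample data below reuses the values from the question with the elided entries dropped, and the score threshold is a made-up filter used only for illustration.

```python
# Sample data shaped like the question's df.at[0, 'A'] (elided entries dropped).
games = [
    {'score': 12, 'player': [{'name': 'Jacob', 'score': 2},
                             {'name': 'Shane', 'score': 5}]},
    {'score': 33, 'player': [{'name': 'Cindy', 'score': 4}]},
]

# Hypothetical filter: keep only games scoring above 20, remembering each
# game's original position without adding a 'game' key to the dict.
kept = [(i, game) for i, game in enumerate(games) if game['score'] > 20]

# The same idea one level down: (game index, player index, player name).
positions = [(i, j, player['name'])
             for i, game in enumerate(games)
             for j, player in enumerate(game['player'])]
```

Because the index is captured before anything is dropped, it still refers to the original position after filtering.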

A larger point

You're trying to wrangle a large set of data with a complicated structure. You're probably about at the limit of what you can reasonably deal with using the kind of ad-hoc structure you posted. Here are some better ways:

  • If you can figure out a way to flatten your data (or at least to make it "rectangular", in a sense), then you may be able to wrangle it into a NumPy array. NumPy hits a nice sweet spot between extremely fast and easy to use.

  • You could convert the inner dictionaries into more levels in your dataframe, to make a sort of hierarchical dataframe with an associated MultiIndex. There's a good SO thread with a lot more info here.

  • While not necessarily the most performant option, one really good way to make it easier to understand data with a complex structure is to represent that structure as a hierarchy of user-defined objects. In the past I've found this to be a very fruitful way to uncover hidden relationships in data (though like I said, it can be slow).
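To make the flattening suggestion concrete, here is a rough sketch that turns the nested structure into one flat row per player, carrying both positional indices along. The function name flatten and the column names game_score and player_score are invented for this example; the sample values come from the question with the elided entries dropped.

```python
def flatten(games):
    """Return one flat dict per player, tagged with its game and player index."""
    return [
        {'game': g, 'number': n, 'game_score': game['score'],
         'name': player['name'], 'player_score': player['score']}
        for g, game in enumerate(games)
        for n, player in enumerate(game['player'])
    ]

# Sample data shaped like the question's df.at[0, 'A'].
games = [
    {'score': 12, 'player': [{'name': 'Jacob', 'score': 2},
                             {'name': 'Shane', 'score': 5}]},
    {'score': 33, 'player': [{'name': 'Cindy', 'score': 4}]},
]

rows = flatten(games)
```

Rows like these are rectangular, so they can be handed to pandas.DataFrame(rows), and calling set_index(['game', 'number']) on the result would give the two-level index mentioned in the second bullet. The comprehension still iterates over both levels, so this changes the shape of the data, not the amount of work.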

tel
  • So true. I gave up keeping the original structure and have been re-structuring the data. I needed the index so I can match it with the original text after dropping irrelevant sentences. (That is, the 'score'.) The index is the position of a sentence. The index of the inner list (the 'player') is the position of a word in that sentence. BTW, thanks for the links, I guess I can try them later. – user8397275 Jan 21 '19 at 09:15

I don't understand your reason for not wanting to use a for loop. If you can get over that, it would be:

for i, d in enumerate(list_of_dicts):
    d['game'] = i
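
The same pattern could be extended to the inner 'player' lists. This is only a sketch, assuming list_of_dicts stands for df.at[0, 'A'] from the question; the helper name add_index is made up here.

```python
def add_index(items, key):
    """Store each dict's position in its list under the given key (in place)."""
    for i, item in enumerate(items):
        item[key] = i

# Sample data shaped like the question's df.at[0, 'A'] (elided entries dropped).
list_of_dicts = [
    {'score': 12, 'player': [{'name': 'Jacob', 'score': 2},
                             {'name': 'Shane', 'score': 5}]},
    {'score': 33, 'player': [{'name': 'Cindy', 'score': 4}]},
]

add_index(list_of_dicts, 'game')          # outer list: 'game' = position
for d in list_of_dicts:
    add_index(d['player'], 'number')      # inner lists: 'number' = position
```

Wrapping the inner loop in a helper keeps the call sites flat, though every element is still visited once either way.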
Barmar