I'm trying to work out the best practice here. I have a dataframe of interviewers at various locations, and I would like to build a dictionary (or some similar data structure) that maps each interviewer's name to every coordinate point we have for their interviews. An example of the dataframe I am working with:

    interview       longitude        latitude
1   A1                  34.2             90.2
2   A1                  54.2             23.5
6   A1                  NaN              NaN
7   A2                  NaN              NaN
8   A2                  NaN              NaN
9   A2                  23.1             38.2
10  A2                  -23.7            -98.4

Essentially, I would like a dictionary in which 'A1' holds (34.2, 90.2), (54.2, 23.5) and 'A2' holds (23.1, 38.2), (-23.7, -98.4).

    location_dict = {}
    for name, group in df.groupby('Interviewer'):
        minidf = group[['Interviewer','Longitude','Latitude']].dropna()
        for index, row in minidf.iterrows():
            location_dict[name]=(row['Longitude'], row['Latitude'])

My logic here is a bit off: I don't see a way to 'append' to a dictionary, so each assignment overwrites the previous one and the dictionary ends up holding only the data from the last iteration of iterrows. How would I go about fixing this?

jpp
sgerbhctim
  • You are now storing a single tuple as the value. Try to extrapolate from there a bit; a tuple is a container just like a dictionary is a container. Know of any other containers that you could use to hold multiple tuples? – Martijn Pieters Jan 15 '19 at 14:55
  • Put differently: you need to pick a container type for the dictionary values, to store those multiple tuples in. A dictionary just maps keys to values, but those values can be *any type of Python object*, including tuples, other dictionaries, and more. – Martijn Pieters Jan 15 '19 at 14:56
  • Should I put all the tuples into a list, and then add them to Interviewer key? – sgerbhctim Jan 15 '19 at 14:57
  • For an interviewer key, add a list to the dictionary *if the key isn't in the dictionary yet*. If the key is in the dictionary already, just append to the list that is there. – Martijn Pieters Jan 15 '19 at 14:58
  • You can test, add a list if it is not there yet, and append to the list, in one step with `location_dict.setdefault(name, []).append((row['Longitude'], row['Latitude']))`. – Martijn Pieters Jan 15 '19 at 14:59
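
The `setdefault` suggestion from the comments can be sketched roughly as below. To keep it runnable without pandas, the `rows` list of plain dicts is a hypothetical stand-in for the dataframe rows (lowercase column names as in the sample table), and the `math.isnan` check plays the role of `dropna()`. Treat it as an illustration of the pattern rather than a drop-in replacement for the loop in the question.

```python
import math

nan = float("nan")

# Hypothetical stand-in for the sample dataframe: one dict per row.
rows = [
    {"interview": "A1", "longitude": 34.2, "latitude": 90.2},
    {"interview": "A1", "longitude": 54.2, "latitude": 23.5},
    {"interview": "A1", "longitude": nan, "latitude": nan},
    {"interview": "A2", "longitude": nan, "latitude": nan},
    {"interview": "A2", "longitude": nan, "latitude": nan},
    {"interview": "A2", "longitude": 23.1, "latitude": 38.2},
    {"interview": "A2", "longitude": -23.7, "latitude": -98.4},
]

location_dict = {}
for row in rows:
    # Skip rows with a missing coordinate (what dropna() does in pandas).
    if math.isnan(row["longitude"]) or math.isnan(row["latitude"]):
        continue
    # setdefault returns the existing list for this key, or stores and
    # returns a new empty list if the key is not present yet.
    location_dict.setdefault(row["interview"], []).append(
        (row["longitude"], row["latitude"])
    )

print(location_dict)
```

The key point is that each value is a list, so every new tuple is appended instead of overwriting the previous one.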

1 Answer

One solution using groupby:

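    def zipper(group):
        # Pair each longitude in the group with its latitude
        return list(zip(group['longitude'], group['latitude']))

    # Drop rows with missing coordinates, then build one list of tuples per interviewer
    res = df.dropna(subset=['longitude', 'latitude'])\
            .groupby('interview').apply(zipper).to_dict()

    # {'A1': [(34.2, 90.2), (54.2, 23.5)],
    #  'A2': [(23.1, 38.2), (-23.7, -98.4)]}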
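
For anyone who wants to try the groupby approach end to end, here is a self-contained sketch. The dataframe construction is my reconstruction of the sample table in the question, and the dict comprehension over `groupby` is a swapped-in variant of the `apply` call above, not the original code. It assumes pandas and NumPy are installed.

```python
import numpy as np
import pandas as pd

# Reconstruction of the sample data from the question (an assumption about
# how the real dataframe is laid out).
df = pd.DataFrame(
    {
        "interview": ["A1", "A1", "A1", "A2", "A2", "A2", "A2"],
        "longitude": [34.2, 54.2, np.nan, np.nan, np.nan, 23.1, -23.7],
        "latitude": [90.2, 23.5, np.nan, np.nan, np.nan, 38.2, -98.4],
    },
    index=[1, 2, 6, 7, 8, 9, 10],
)

# Drop rows where either coordinate is missing.
clean = df.dropna(subset=["longitude", "latitude"])

# Iterating over a groupby yields (key, sub-dataframe) pairs, so each
# interviewer maps to a list of (longitude, latitude) tuples.
res = {
    name: list(zip(group["longitude"], group["latitude"]))
    for name, group in clean.groupby("interview")
}

print(res)
```

The comprehension makes the key-to-list mapping explicit; whether it is preferable to `apply` is mostly a matter of taste.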

Another using collections.defaultdict:

    from collections import defaultdict

    res = defaultdict(list)
    for row in df.dropna(subset=['longitude', 'latitude']).itertuples(index=False):
        res[row.interview].append((row.longitude, row.latitude))

Since defaultdict is a subclass of dict, in general no further manipulation is required.
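
One case where a little manipulation may still be worthwhile: looking up a missing key on a `defaultdict` quietly creates it. The sketch below uses a few hard-coded tuples as a stand-in for the dataframe rows (the `"A3"` key is made up for illustration) and shows how `dict(res)` gives a plain dictionary that raises `KeyError` instead.

```python
from collections import defaultdict

res = defaultdict(list)
# Hard-coded stand-in for the cleaned dataframe rows.
for name, lon, lat in [("A1", 34.2, 90.2), ("A1", 54.2, 23.5), ("A2", 23.1, 38.2)]:
    res[name].append((lon, lat))

# A defaultdict passes isinstance checks for dict.
is_dict = isinstance(res, dict)

# Take a plain-dict snapshot (shallow copy: the lists themselves are shared).
plain = dict(res)

# Reading a missing key on the defaultdict inserts it with an empty list...
missing = res["A3"]

# ...whereas the plain dict raises KeyError for the same lookup.
try:
    plain["A3"]
    raised = False
except KeyError:
    raised = True

print(is_dict, "A3" in res, "A3" in plain, raised)
```

If the result is handed to code that relies on `KeyError` or on `in` checks after lookups, converting with `dict(res)` is a cheap safeguard.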

jpp