
Imagine the following DF:

import pandas as pd

data = {'Person': ['A', 'A', 'B', 'B', 'C', 'C', 'C', 'C', 'C'], 'Field': ['Age', 'Weight', 'Age', 'Height', 'Height', 'year', 'month', 'day', 'city']}
df = pd.DataFrame(data)

  Field Person
    Age      A
 Weight      A
    Age      B
 Height      B
 Height      C
   year      C
  month      C
    day      C
   city      C

Imagine I wanted to reduce the number of queries I need to make to get each person's fields. I would first get A and B in a room and ask them their age, then I would ask A his weight, then I would get B and C together and ask them their height, and finally I would ask C for all the remaining fields.
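The plan described above amounts to grouping fields by the exact set of people who need them: all fields that share the same set of people can go into a single query. Below is a minimal pandas sketch of that idea, assuming a query means asking one group of people one or more fields and that nobody should be asked a field that is not listed for them. `plan_queries` is a hypothetical helper name, not a pandas API.

```python
import pandas as pd

data = {'Person': ['A', 'A', 'B', 'B', 'C', 'C', 'C', 'C', 'C'],
        'Field': ['Age', 'Weight', 'Age', 'Height', 'Height', 'year', 'month', 'day', 'city']}
df = pd.DataFrame(data)


def plan_queries(df):
    """Hypothetical helper: return a list of (persons, fields) queries.

    Fields needed by exactly the same set of persons share one query.
    """
    # Ignore repeated (Person, Field) rows; each pair only needs asking once
    pairs = df[['Person', 'Field']].drop_duplicates()
    # For each field (in order of first appearance), the persons who need it
    persons_per_field = pairs.groupby('Field', sort=False)['Person'].unique()
    plan = {}
    for field, persons in persons_per_field.items():
        # A sorted tuple is hashable, so identical person sets land on one key
        plan.setdefault(tuple(sorted(persons)), []).append(field)
    return list(plan.items())


for persons, fields in plan_queries(df):
    print(persons, fields)
# ('A', 'B') ['Age']
# ('A',) ['Weight']
# ('B', 'C') ['Height']
# ('C',) ['year', 'month', 'day', 'city']
```

For this first table the grouping gives four queries, one more than asking A, B and C separately (`df['Person'].nunique()` is 3), so it only pays off when many people share fields. It is a heuristic that reproduces the plan in the question, not a guaranteed minimum number of queries.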

This may sound more complicated than simply asking A, B and C separately, but imagine I had:

  Field Person
    Age      A
    Age      B
 Height      B
 Height      B
   year      B
  month      B
    Age      C
 Height      C
 Height      C
   year      C
  month      C

Here it is clear that asking each person separately is less efficient than asking A, B and C for their Age and then asking B and C for their Height, Weight, year and month.

I can think of many ways of doing this programmatically, but I was wondering which is the most efficient.
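For the second table, one vectorized way to find fields that share the same people is to build a Field-by-Person indicator matrix with `pd.crosstab` and group the fields whose rows are identical. The sketch below also compares the result with the baseline of one query per person and keeps whichever needs fewer queries. It assumes a query means asking one group of people one or more fields, and that nobody is asked a field not listed for them; `grouped_plan`, `per_person_plan` and `best` are illustrative names only, and this is a heuristic rather than a proven minimum.

```python
import pandas as pd

# The second table from the question (B and C each list Height twice)
df = pd.DataFrame({
    'Person': ['A', 'B', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'C'],
    'Field': ['Age', 'Age', 'Height', 'Height', 'year', 'month',
              'Age', 'Height', 'Height', 'year', 'month'],
})

# Boolean Field x Person matrix: True where that person must be asked that field.
# crosstab counts repeats, so .gt(0) collapses duplicates such as B's two Heights.
asked = pd.crosstab(df['Field'], df['Person']).gt(0)

# Fields whose rows are identical (same persons) can share one query
grouped_plan = []
for key, block in asked.groupby(list(asked.columns), sort=False):
    persons = [p for p, flag in zip(asked.columns, key) if flag]
    grouped_plan.append((persons, list(block.index)))

# Baseline: one query per person, asking each person all of their fields
per_person_plan = [([p], sorted(set(fields)))
                   for p, fields in df.groupby('Person')['Field']]

# Keep whichever plan needs fewer queries (ties go to the grouped plan)
best = min(grouped_plan, per_person_plan, key=len)
for persons, fields in best:
    print(persons, fields)
# ['A', 'B', 'C'] ['Age']
# ['B', 'C'] ['Height', 'month', 'year']
```

Here the grouped plan needs two queries against three for asking each person separately. Note that `crosstab` sorts the fields, so they come out in alphabetical order rather than in the order of the original rows.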

Yona
  • Really this question is a dupe of this: http://stackoverflow.com/questions/34233455/using-panda-for-comparing-column-values-and-creating-column-based-on-the-values and this http://stackoverflow.com/questions/41481208/python-string-to-integer-as-a-key and countless others, but your wording is a little different – EdChum Mar 14 '17 at 11:51
  • Are you wanting something different from my answer here or the linked posts? – EdChum Mar 14 '17 at 11:53
  • @EdChum Thanks for your reply. I was not aware of factorize; it does something similar, but not exactly what I need, so I have reworded the question – Yona Mar 14 '17 at 13:45
  • I think you need to explain the logic more clearly here; I will remove my answer – EdChum Mar 14 '17 at 13:49

0 Answers