In the following dataframe, each state has a list of (year, month) tuples:

state
alabama           [(2017.0, 10.0), (2017.0, 11.0), (2017.0, 12.0), (2018.0, 1.0)]
arkansas          [(2017.0, 10.0), (2017.0, 11.0), (2017.0, 12.0)]
colorado          [(2017.0, 9.0), (2017.0, 10.0), (2017.0, 11.0)]

How can I extract a superset list of all the year and month combinations? In this case, the solution would be:

[(2017.0, 9.0), (2017.0, 10.0), (2017.0, 11.0), (2017.0, 12.0), (2018.0, 1.0)]

I could potentially do it using a for loop but that would be slow, anything more pythonic?

Here is what I tried:

for row in df:
    if all(y in row for x, y in df):
        tmp = row

but I get this error:

ValueError: too many values to unpack (expected 2)
user308827
  • 1
    With all due respect, you seem to be outsourcing each step of your data-cleaning operation to Stack Overflow. It doesn't reflect well on the amount of effort you put into each question. – miradulo Dec 04 '17 at 20:49
  • @miradulo, this is a tricky bit, so asked. but you do have a point – user308827 Dec 04 '17 at 20:51
  • @miradulo, updated query with what I tried – user308827 Dec 04 '17 at 20:55
  • 1
    That doesn't look like much of an effort :P When you've cornered yourself into having lists of tuples inside your DataFrame like this, your solution likely won't improve much over a basic Python level approach. Something like `set(itertools.chain.from_iterable(df.state.values))` maybe. – miradulo Dec 04 '17 at 21:01
  • 1
    I guess it would be easier to do `df[['Year','Month']].sort_values(['Year','Month']).drop_duplicates().values.tolist()` applying it to the DF from your previous question... – MaxU - stand with Ukraine Dec 04 '17 at 21:04
  • @MaxU, that does work! happy to accept as soln – user308827 Dec 04 '17 at 21:06
  • @user308827 Well then you'll have to add your DataFrame from the previous question, lol. In which case the answer is "here's what you should have done two questions ago", which kinda drives home my point about how you're asking questions. – miradulo Dec 04 '17 at 21:07
  • "I could potentially do it using a for loop but that would be slow, anything more pythonic?" You're forced to be slow: you have a data-frame containing lists of tuples. The solution miradulo gave is probably going to be as fast as anything with `pandas`. – juanpa.arrivillaga Dec 04 '17 at 21:07
  • @miradulo, I do agree with your point, I should have planned this better – user308827 Dec 04 '17 at 21:18
  • @user308827 No worries, just keep it in mind for next time :) – miradulo Dec 04 '17 at 21:18

1 Answer

Using a sample DF from your previous question:

In [109]: df[['Year','Month']].sort_values(['Year','Month']).drop_duplicates().values.tolist()
Out[109]:
[[2017.0, 10.0],
 [2017.0, 11.0],
 [2017.0, 12.0],
 [2018.0, 1.0],
 [2018.0, 2.0],
 [2018.0, 3.0],
 [2018.0, 4.0],
 [2018.0, 5.0],
 [2018.0, 6.0],
 [2018.0, 7.0]]
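
If you only have the list-of-tuples data shown in this question rather than the earlier frame, the `itertools.chain` idea miradulo mentioned in the comments can be sketched roughly as below. This is a minimal sketch, not a definitive implementation: the helper name `superset_of_pairs` and the plain dict standing in for your data are assumptions made for illustration. Iterating a pandas Series of lists yields the lists themselves, so such a Series could likely be passed in place of `data.values()`.

```python
from itertools import chain


def superset_of_pairs(rows):
    """Return the sorted, de-duplicated (year, month) tuples found in rows.

    `rows` is any iterable whose items are lists of (year, month) tuples.
    The function name is hypothetical, chosen only for this sketch.
    """
    # Flatten every list into one stream of tuples, drop duplicates with a
    # set, then sort; tuples compare by year first and month second.
    return sorted(set(chain.from_iterable(rows)))


# Stand-in for the data in the question (assumed layout: state -> list of tuples).
data = {
    "alabama": [(2017.0, 10.0), (2017.0, 11.0), (2017.0, 12.0), (2018.0, 1.0)],
    "arkansas": [(2017.0, 10.0), (2017.0, 11.0), (2017.0, 12.0)],
    "colorado": [(2017.0, 9.0), (2017.0, 10.0), (2017.0, 11.0)],
}

result = superset_of_pairs(data.values())
print(result)
# [(2017.0, 9.0), (2017.0, 10.0), (2017.0, 11.0), (2017.0, 12.0), (2018.0, 1.0)]
```

This is still a Python-level loop under the hood, so it will not be vectorized, but it avoids the unpacking error in the question because it never tries to split a whole list into `x, y`.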
MaxU - stand with Ukraine