2

trying to select subset from a list, however the order is reversed after selection

tried using pandas isin

df.mon =[1,2,3,4,5,6,7,8,9,10,11,12,1,2,3,4,5,6,7,8,9,10,11,12,...] 
 # selecting 
 results = df[df.month.isin([10,11,12,1,2,3])]
 print(results.mon]
 mon = [1,2,3,10,11,12, 1,2,3,10,11,12,...]
 desired results
 mon= [10,11,12,1,2,3,10,11,12,1,2,3,...]

 # sorting results in this
 mon = [1,1,2,2,3,3,10,10,11,11,12,12] and I dont want that either

 thanks for the help
user11036847
  • 455
  • 1
  • 4
  • 12

4 Answers4

1

I work most with basic python lists, so I have converted the df to a list.

Data

The data is displayed in an xlsx file like this. The input is a xlsx document which goes 1, 2, .. 12, 1, 2, .. 12 only twice, the "Values" start at 90 and count by 10 all the way out to the second 12.

A portion of the xlsx file

Process

import pandas as pd

df = pd.read_excel('Book1.xlsx')
arr = df['Column'].tolist()
arr2 = df['Values'].tolist()

monthsofint = [10, 11, 12, 1, 2, 3]

locs = []

dictor = {}
for i in range(len(monthsofint)):
    dictor[monthsofint[i]] = []

for i in range(len(monthsofint)):  # !! Assumption !!
    for j in range(len(arr)):
        if monthsofint[i] == arr[j]:
            dictor[monthsofint[i]].append(j)

newlist = []
newlist2 = []
for i in range(len(dictor[monthsofint[0]])):
    for j in range(len(monthsofint)):
        newlist.append(arr[dictor[monthsofint[j]][i]])
        newlist2.append(arr2[dictor[monthsofint[j]][i]])

print(newlist)
print(newlist2)

Output: [10, 11, 12, 1, 2, 3, 10, 11, 12, 1, 2, 3] and [180, 190, 200, 90, 100, 110, 300, 310, 320, 210, 220, 230]

Note on Assumption: The assumption made is that there will always be 12 months for every year in the file.

GeneralCode
  • 794
  • 1
  • 11
  • 26
  • pretty much close except that my new list should be [10,11,12,1,2,3, 10,11,12,1,2,3 ...] and not [10, 10, 11, 11, 12, 12, 1, 1, 2, 2, 3, 3] – user11036847 Jul 14 '19 at 03:06
  • Yeah I relized that give me a minute – GeneralCode Jul 14 '19 at 03:10
  • thanks that works, so how can use this new list to subset the actual values of fro the dataframe correctly? – user11036847 Jul 14 '19 at 04:03
  • Can you give me more context? I am not sure I understand your question well enough to give an accurate answer. – GeneralCode Jul 14 '19 at 04:19
  • say this my df month value 1 12.214394 2 -30.114391 3 1.343937 4 -29.290909 5 22.703789 6 14.211441 7 5.654198 8 -28.977440 9 8.361656 10 12.296993 11 18.193127 12 -20.491604 1 45.214394 2 10.885609 3 -14.656063 4 3.709091 5 8.703789 6 24.211441 7 -46.345802 I want to use my list to get the values of each month say df[newlist] but doest work currently – user11036847 Jul 14 '19 at 04:31
  • I have add `arr2` and `newlist2` to try to supply the data you were requesting. Let me know if you have any questions. – GeneralCode Jul 14 '19 at 05:01
  • Almost perfect except one last thing so this is my actual in month value format = 1:10, 2:-20,3:8, 10:2,11:4 , 12:-10, 1:10, 2:20, 3:4, 10:15, 11: 2, 12:20,...... new list copies values this way [2, 4, -10, 10,20,8] here 10,20,8 means its starting the 1,2,3 values from the begining. But I want it to ignore tthe first 1,2,3 and so after copying the 10,11,12 values the next 1,2,3 should continue and not start from the begining of the time series. I hope you understand me. Thanks – user11036847 Jul 14 '19 at 05:35
1

In your case , we using Categorical + cumcount

#results = df[df.mon.isin([10, 11, 12, 1, 2, 3])].copy()
results.mon=pd.Categorical(results.mon,[10,11,12,1,2,3])
s=results.sort_values('mon')
s=s.iloc[s.groupby('mon').cumcount().argsort()]
s
Out[172]: 
   mon
9   10
10  11
11  12
0    1
1    2
2    3
21  10
22  11
23  12
12   1
13   2
14   3
BENY
  • 317,841
  • 20
  • 164
  • 234
0

I think you can take what we can have for each category, then use izip_longest to zip those lists.

0

So I found a relatively easy and simple way to do it from another source

For those who might be interested:

  df[(df.index > 4) & (df.month.isin([10, 11, 12, 1, 2, 3]))]
user11036847
  • 455
  • 1
  • 4
  • 12