
Here is my DataFrame, for example:

                 requesttime  checkinperiod
0   2016-10-16T14:53:58.000Z              8
1   2016-10-16T22:53:22.000Z              8
2   2016-10-18T14:52:22.000Z              8
3   2016-10-18T06:53:08.000Z              8
4   2016-10-16T06:53:37.000Z              8
5   2016-10-15T22:53:14.000Z              8
6   2016-10-19T22:51:51.000Z              8
7   2016-10-22T10:16:57.000Z             12
8   2016-10-20T10:54:37.000Z             12
9   2016-10-20T06:51:42.000Z             12
10  2016-10-10T22:44:17.000Z             24
11  2016-10-13T22:47:26.000Z              8
12  2016-10-14T14:53:27.000Z              8
13  2016-10-14T22:53:58.000Z              8
14  2016-10-15T06:53:28.000Z              8
15  2016-10-14T06:53:58.000Z              8
16  2016-10-10T16:38:28.000Z             24
17  2016-10-17T06:53:50.000Z              8
18  2016-10-17T14:53:12.000Z              8
19  2016-10-19T14:51:53.000Z              8
20  2016-10-17T22:53:44.000Z              8
21  2016-10-15T14:53:50.000Z              8
22  2016-10-18T22:52:39.000Z              8
23  2016-10-12T22:27:51.000Z             24
24  2016-10-11T23:05:57.000Z             24
25  2016-10-19T06:52:53.000Z              8
26  2016-10-21T10:09:09.000Z             12
27  2016-10-21T22:17:15.000Z             12
28  2016-10-22T22:16:53.000Z             12
29  2016-10-20T23:02:13.000Z             12

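Desired output (each run of consecutive rows with the same checkinperiod becomes its own list under that checkinperiod key):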

{

8 : [
        [2016-10-16T14:53:58.000Z, 2016-10-16T22:53:22.000Z, 2016-10-18T14:52:22.000Z, 2016-10-18T06:53:08.000Z, 2016-10-16T06:53:37.000Z, 2016-10-15T22:53:14.000Z, 2016-10-19T22:51:51.000Z],
        [2016-10-13T22:47:26.000Z, 2016-10-14T14:53:27.000Z, 2016-10-14T22:53:58.000Z, 2016-10-15T06:53:28.000Z, 2016-10-14T06:53:58.000Z],
        [2016-10-17T06:53:50.000Z, 2016-10-17T14:53:12.000Z, 2016-10-19T14:51:53.000Z, 2016-10-17T22:53:44.000Z, 2016-10-15T14:53:50.000Z, 2016-10-18T22:52:39.000Z],
        [2016-10-19T06:52:53.000Z]
],
12: [
        [2016-10-22T10:16:57.000Z, 2016-10-20T10:54:37.000Z, 2016-10-20T06:51:42.000Z],
        [2016-10-21T10:09:09.000Z, 2016-10-21T22:17:15.000Z, 2016-10-22T22:16:53.000Z, 2016-10-20T23:02:13.000Z]
],
24: [
        [2016-10-10T22:44:17.000Z],
        [2016-10-10T16:38:28.000Z],
        [2016-10-12T22:27:51.000Z, 2016-10-11T23:05:57.000Z]
]
} 

Thanks, Sumit

Dennis Golomazov
Sumit Gupta

2 Answers


Use a regex to filter the data and set the dict keys. Try text 2 regex.

jackotonye
import pandas as pd

# make sample data
col = 'checkinperiod'
df = pd.DataFrame([['a', 8], ['b', 8], ['c', 8], ['c', 12], ['d', 8], ['e', 12], ['f', 12]],
                  columns=['requesttime', col])
print(df)

  requesttime  checkinperiod
0           a              8
1           b              8
2           c              8
3           c             12
4           d              8
5           e             12
6           f             12 

# compare each row's checkinperiod with the previous row's (shift(1));
# every change starts a new run, and cumsum numbers the runs
df['group'] = (df[col].shift(1) != df[col]).astype(int).cumsum()
print(df)

  requesttime  checkinperiod  group
0           a              8      1
1           b              8      1
2           c              8      1
3           c             12      2
4           d              8      3
5           e             12      4
6           f             12      4

# group by (checkinperiod, run) and collect each run's requesttime values into a list
df_grouped = pd.DataFrame(df.groupby([col, 'group']).apply(
    lambda df: list(df['requesttime'])))
df_grouped = df_grouped.reset_index().drop('group', axis=1)
print(df_grouped)

   checkinperiod          0
0              8  [a, b, c]
1              8        [d]
2             12        [c]
3             12     [e, f]

# gather the runs for each checkinperiod into a dict of lists
result = df_grouped.groupby(col).apply(lambda df: list(df[0])).to_dict()
print(result)

{8: [['a', 'b', 'c'], ['d']], 12: [['c'], ['e', 'f']]}

Inspired by [1]
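The same consecutive-run grouping can also be sketched without pandas, using the standard library's itertools.groupby, which (unlike DataFrame.groupby) only merges adjacent items with equal keys. This is a minimal illustration on the toy data above, not part of the original answer; runs_by_key is a hypothetical helper name.

```python
from itertools import groupby


def runs_by_key(rows):
    """Group (value, key) pairs into consecutive runs of equal key.

    Returns {key: [run1_values, run2_values, ...]} with runs in input order.
    """
    result = {}
    # itertools.groupby only merges *adjacent* items with the same key,
    # which is exactly the "consecutive run" behaviour wanted here.
    for key, run in groupby(rows, key=lambda pair: pair[1]):
        result.setdefault(key, []).append([value for value, _ in run])
    return result


rows = [('a', 8), ('b', 8), ('c', 8), ('c', 12), ('d', 8), ('e', 12), ('f', 12)]
print(runs_by_key(rows))
# {8: [['a', 'b', 'c'], ['d']], 12: [['c'], ['e', 'f']]}
```

To use it on a DataFrame, you could pass something like df[['requesttime', col]].itertuples(index=False), since each yielded tuple unpacks as (value, key).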

Dennis Golomazov
  • I tried the same code with my data and it is not giving the proper result: – Sumit Gupta Nov 01 '16 at 05:33
  • I tried this: df = pd.DataFrame([['2016-10-16T14:53:58.000Z',8], ['2016-10-16T22:53:22.000Z',8], ['2016-10-18T14:52:22.000Z',8], ['2016-10-18T06:53:08.000Z',8], ['2016-10-16T06:53:37.000Z',8], ['2016-10-15T22:53:14.000Z',8], ['2016-10-19T22:51:51.000Z',8], ['2016-10-22T10:16:57.000Z',12],['2016-10-20T10:54:37.000Z',12], ['2016-10-20T06:51:42.000Z ',12], ['2016-10-10T22:44:17.000Z',24], ['2016-10-13T22:47:26.000Z',8],['2016-10-14T14:53:27.000Z',8], ['2016-10-14T22:53:58.000Z',8], ['2016-10-15T06:53:28.000Z',8], ['2016-10-14T06:53:58.000Z',8]],columns=['requesttime', col]) – Sumit Gupta Nov 01 '16 at 05:45
  • It seems that when there are more than two values, it creates one extra list. Try this: df = pd.DataFrame([['a', 8], ['b', 8], ['c', 8],['c', 12], ['d', 8], ['e', 12], ['f', 12]], columns=['requesttime', col]) – Sumit Gupta Nov 01 '16 at 09:53
  • Thanks for finding the bug! I've updated the answer, making it simpler and now, hopefully, correct. – Dennis Golomazov Nov 01 '16 at 17:32
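A possible self-contained sketch of the shift/cumsum approach, run against the question's full sample data, assuming Python 3 and a recent pandas (1.x or 2.x). group_consecutive_runs is a hypothetical helper name, not from the answer; it groups with sort=False so that runs stay in the order they first appear, which is how the desired output lists them.

```python
import pandas as pd


def group_consecutive_runs(df, key="checkinperiod", value="requesttime"):
    """Map each key to a list of runs; a run is the list of `value`s from
    consecutive rows that share the same `key`."""
    # True wherever the key differs from the previous row's key (the first row
    # compares against NaN, so it is always True); cumsum numbers the runs.
    run = (df[key] != df[key].shift()).cumsum()
    # One list of values per (key, run); sort=False keeps runs in order of appearance.
    runs = df.assign(run=run).groupby([key, "run"], sort=False)[value].agg(list)
    result = {}
    # tolist() yields plain Python scalars/lists rather than NumPy ones.
    keys = runs.index.get_level_values(key).tolist()
    for k, values in zip(keys, runs.tolist()):
        result.setdefault(k, []).append(values)
    return result


# The question's sample data, in its original row order.
times = [
    "2016-10-16T14:53:58.000Z", "2016-10-16T22:53:22.000Z", "2016-10-18T14:52:22.000Z",
    "2016-10-18T06:53:08.000Z", "2016-10-16T06:53:37.000Z", "2016-10-15T22:53:14.000Z",
    "2016-10-19T22:51:51.000Z", "2016-10-22T10:16:57.000Z", "2016-10-20T10:54:37.000Z",
    "2016-10-20T06:51:42.000Z", "2016-10-10T22:44:17.000Z", "2016-10-13T22:47:26.000Z",
    "2016-10-14T14:53:27.000Z", "2016-10-14T22:53:58.000Z", "2016-10-15T06:53:28.000Z",
    "2016-10-14T06:53:58.000Z", "2016-10-10T16:38:28.000Z", "2016-10-17T06:53:50.000Z",
    "2016-10-17T14:53:12.000Z", "2016-10-19T14:51:53.000Z", "2016-10-17T22:53:44.000Z",
    "2016-10-15T14:53:50.000Z", "2016-10-18T22:52:39.000Z", "2016-10-12T22:27:51.000Z",
    "2016-10-11T23:05:57.000Z", "2016-10-19T06:52:53.000Z", "2016-10-21T10:09:09.000Z",
    "2016-10-21T22:17:15.000Z", "2016-10-22T22:16:53.000Z", "2016-10-20T23:02:13.000Z",
]
periods = [8] * 7 + [12] * 3 + [24] + [8] * 5 + [24] + [8] * 6 + [24] * 2 + [8] + [12] * 4

df = pd.DataFrame({"requesttime": times, "checkinperiod": periods})
result = group_consecutive_runs(df)
for period, period_runs in result.items():
    print(period, period_runs)
```

With this data, result[8] should hold four runs, result[12] two and result[24] three, matching the runs of consecutive rows in the table.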