          day         city  temperature  windspeed   event

        2017-01-01  new york           32          6    Rain
        2017-01-02  new york           36          7   Sunny
        2017-01-03  new york           28         12    Snow
        2017-01-04  new york           33          7   Sunny
        2017-01-05  new york           31          7    Rain
        2017-01-06  new york           33          5   Sunny
        2017-01-07  new york           27         12    Rain
        2017-01-08  new york           23          7  Rain
        2017-01-01    mumbai           90          5   Sunny
        2017-01-02    mumbai           85         12     Fog
        2017-01-03    mumbai           87         15     Fog
        2017-01-04    mumbai           92          5    Rain
        2017-01-05    mumbai           89          7   Sunny
        2017-01-06    mumbai           80         10     Fog
        2017-01-07    mumbai           85         9     Sunny
        2017-01-08    mumbai           89          8    Rain
        2017-01-01     paris           45         20   Sunny
        2017-01-02     paris           50         13  Cloudy
        2017-01-03     paris           54          8  Cloudy
        2017-01-04     paris           42         10  Cloudy
        2017-01-05     paris           43         20   Sunny
        2017-01-06     paris           48         4  Cloudy
        2017-01-07     paris           40          14  Rain
        2017-01-08     paris           42         15  Cloudy
        2017-01-09     paris           53         8  Sunny

The above shows the .txt file.

My goal is to create 4 groups as evenly distributed as possible, containing all the cities, meaning that each group has 'new york','mumbai','paris'.

Since there are 25 rows, three groups will have 6 lines and one group will have 7.

My current idea: since the data are already sorted by city, I can read the text file line by line and append each line to one of 4 groups (G1-G4) in an alternating pattern. That is, the 1st line goes to G1, the 2nd to G2, the 3rd to G3, the 4th to G4, the 5th back to G1, the 6th to G2, and so on. This should ensure that every group contains all 3 cities.

Is it possible to code it this way?

Expected result:

G1: Row 1, Row 5, Row 9, ...

G2: Row 2, Row 6, Row 10, ...

G3: Row 3, Row 7, Row 11, ...

G4: Row 4, Row 8, Row 12, and so on.

Lily
  • Have you tried anything? If your file is in a list, you could do something like `g1 = my_list[0::4]`, etc. – mike.k Aug 08 '18 at 02:55
  • @mike.k I've tried to handle this using pandas, but I was only able to split the rows into 4 groups without ensuring that every city appears in every group. I will open the file and append every line to a list first. What does `[0::4]` mean? – Lily Aug 08 '18 at 03:01

3 Answers


Since your input is already sorted, you can split the string into a list of lines and then slice that list with a step of 4:

data = '''        2017-01-01  new york           32          6    Rain
        2017-01-02  new york           36          7   Sunny
        2017-01-03  new york           28         12    Snow
        2017-01-04  new york           33          7   Sunny
        2017-01-05  new york           31          7    Rain
        2017-01-06  new york           33          5   Sunny
        2017-01-07  new york           27         12    Rain
        2017-01-08  new york           23          7  Rain
        2017-01-01    mumbai           90          5   Sunny
        2017-01-02    mumbai           85         12     Fog
        2017-01-03    mumbai           87         15     Fog
        2017-01-04    mumbai           92          5    Rain
        2017-01-05    mumbai           89          7   Sunny
        2017-01-06    mumbai           80         10     Fog
        2017-01-07    mumbai           85         9     Sunny
        2017-01-08    mumbai           89          8    Rain
        2017-01-01     paris           45         20   Sunny
        2017-01-02     paris           50         13  Cloudy
        2017-01-03     paris           54          8  Cloudy
        2017-01-04     paris           42         10  Cloudy
        2017-01-05     paris           43         20   Sunny
        2017-01-06     paris           48         4  Cloudy
        2017-01-07     paris           40          14  Rain
        2017-01-08     paris           42         15  Cloudy
        2017-01-09     paris           53         8  Sunny'''
lines = data.splitlines()
groups = [lines[i::4] for i in range(4)]
for g in groups:
    print(g)

This outputs:

['        2017-01-01  new york           32          6    Rain', '        2017-01-05  new york           31          7    Rain', '        2017-01-01    mumbai           90          5   Sunny', '        2017-01-05    mumbai           89          7   Sunny', '        2017-01-01     paris           45         20   Sunny', '        2017-01-05     paris           43         20   Sunny', '        2017-01-09     paris           53         8  Sunny']
['        2017-01-02  new york           36          7   Sunny', '        2017-01-06  new york           33          5   Sunny', '        2017-01-02    mumbai           85         12     Fog', '        2017-01-06    mumbai           80         10     Fog', '        2017-01-02     paris           50         13  Cloudy', '        2017-01-06     paris           48         4  Cloudy']
['        2017-01-03  new york           28         12    Snow', '        2017-01-07  new york           27         12    Rain', '        2017-01-03    mumbai           87         15     Fog', '        2017-01-07    mumbai           85         9     Sunny', '        2017-01-03     paris           54          8  Cloudy', '        2017-01-07     paris           40          14  Rain']
['        2017-01-04  new york           33          7   Sunny', '        2017-01-08  new york           23          7  Rain', '        2017-01-04    mumbai           92          5    Rain', '        2017-01-08    mumbai           89          8    Rain', '        2017-01-04     paris           42         10  Cloudy', '        2017-01-08     paris           42         15  Cloudy']
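The same alternating assignment can also be done while reading from a file, as described in the question. The following is a minimal sketch, not a definitive implementation: the function name `split_round_robin`, its parameters, and the small sample file are all made up for illustration, and it assumes the file has one header line followed by data lines.

```python
import os
import tempfile


def split_round_robin(path, n_groups=4, has_header=True):
    """Deal the data lines of a text file into n_groups lists, one line at a time."""
    with open(path) as f:
        # Drop blank lines (the sample file has one after the header).
        lines = [line.rstrip("\n") for line in f if line.strip()]
    if has_header:
        lines = lines[1:]
    groups = [[] for _ in range(n_groups)]
    for i, line in enumerate(lines):
        groups[i % n_groups].append(line)  # line 0 -> G1, line 1 -> G2, ...
    return groups


# Small made-up file, used only to demonstrate the function.
sample = (
    "day  city  temperature  windspeed  event\n"
    "\n"
    "2017-01-01  new york  32  6  Rain\n"
    "2017-01-02  new york  36  7  Sunny\n"
    "2017-01-03  new york  28  12  Snow\n"
    "2017-01-04  new york  33  7  Sunny\n"
    "2017-01-05  new york  31  7  Rain\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)

groups = split_round_robin(tmp.name)
os.remove(tmp.name)
for number, group in enumerate(groups, start=1):
    print(f"G{number}: {len(group)} line(s)")
```

With the five sample rows, the fifth row wraps around to G1, so G1 holds two lines and the other groups hold one each.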
blhsing

For ease of explanation, I keep only the row indices:

rows = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]

Then you can use slicing

G1, G2, G3, G4 = [rows[i::4] for i in range(4)]

The results will be

G1 == [1, 5, 9, 13, 17, 21, 25]
G2 == [2, 6, 10, 14, 18, 22]
G3 == [3, 7, 11, 15, 19, 23]
G4 == [4, 8, 12, 16, 20, 24]
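To spell out what the slice does: `rows[i::4]` starts at index `i` and takes every 4th element from there, which is the same as sending the element at index `k` to group `k % 4`. The sketch below checks that equivalence and also verifies that each group ends up with all three cities. The helper `city_of` is hypothetical and simply encodes the layout of the question's file (rows 1-8 new york, 9-16 mumbai, 17-25 paris).

```python
rows = list(range(1, 26))  # row numbers 1..25, as above

# rows[i::4] starts at index i and takes every 4th element after it.
by_slice = [rows[i::4] for i in range(4)]

# Equivalent formulation: the row at index k goes to group k % 4.
by_modulo = [[r for k, r in enumerate(rows) if k % 4 == i] for i in range(4)]
assert by_slice == by_modulo


def city_of(row):
    """Hypothetical helper: map a row number to its city in the question's file."""
    if row <= 8:
        return "new york"
    if row <= 16:
        return "mumbai"
    return "paris"


# Collect the set of cities that appears in each group.
cities_per_group = [{city_of(r) for r in g} for g in by_slice]
all_cities = {"new york", "mumbai", "paris"}
print(all(c == all_cities for c in cities_per_group))
```

This works here because every city occupies at least 4 consecutive rows, so each city touches all 4 groups. A city with fewer than 4 rows would be missing from at least one group.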
James Liu

You can use pandas and some math operations to replicate your groups.

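# n = number of full [1,2,3,4] cycles, r = leftover rows after those cycles
n, r = df.shape[0] // 4, df.shape[0] % 4
# repeat the labels n times, then label the r remaining rows 1..r
df['group'] = [1,2,3,4]*n + [1,2,3,4][:r]

The resulting DataFrame:
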
    day         city        temperature windspeed   event   group
0   2017-01-01  new york    32          6           Rain    1
1   2017-01-02  new york    36          7           Sunny   2
2   2017-01-03  new york    28          12          Snow    3
3   2017-01-04  new york    33          7           Sunny   4
4   2017-01-05  new york    31          7           Rain    1
5   2017-01-06  new york    33          5           Sunny   2
6   2017-01-07  new york    27          12          Rain    3
7   2017-01-08  new york    23          7           Rain    4
8   2017-01-01  mumbai      90          5           Sunny   1
9   2017-01-02  mumbai      85          12          Fog     2
10  2017-01-03  mumbai      87          15          Fog     3
11  2017-01-04  mumbai      92          5           Rain    4
12  2017-01-05  mumbai      89          7           Sunny   1
13  2017-01-06  mumbai      80          10          Fog     2
14  2017-01-07  mumbai      85          9           Sunny   3
15  2017-01-08  mumbai      89          8           Rain    4
16  2017-01-01  paris       45          20          Sunny   1
17  2017-01-02  paris       50          13          Cloudy  2
18  2017-01-03  paris       54          8           Cloudy  3
19  2017-01-04  paris       42          10          Cloudy  4
20  2017-01-05  paris       43          20          Sunny   1
21  2017-01-06  paris       48          4           Cloudy  2
22  2017-01-07  paris       40          14          Rain    3
23  2017-01-08  paris       42          15          Cloudy  4
24  2017-01-09  paris       53          8           Sunny   1        
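The label list itself needs no pandas, so it can be checked in isolation. This is a small sketch of the same arithmetic; the function name `group_labels` and its parameters are made up for illustration.

```python
def group_labels(n_rows, n_groups=4):
    """Return labels 1..n_groups repeated cyclically until n_rows labels exist."""
    base = list(range(1, n_groups + 1))
    n, r = divmod(n_rows, n_groups)  # full cycles, leftover rows
    return base * n + base[:r]


labels = group_labels(25)
print(labels[:8])  # [1, 2, 3, 4, 1, 2, 3, 4]
print({g: labels.count(g) for g in (1, 2, 3, 4)})  # {1: 7, 2: 6, 3: 6, 4: 6}
```

For 25 rows this yields six full cycles plus one leftover label, which is why group 1 gets 7 rows and the others get 6.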
rafaelc
  • Initially I tried to handle this with pandas but got stuck. Can you please explain your code to me? I've tried it, and after adding the 'group' column I can use groupby() to group the data by assigned group. After that, how can I export each group to a csv file? – Lily Aug 08 '18 at 03:20
  • @Lily So basically the code repeats `[1,2,3,4]` `n` times and then fills the rest. So, for example, if you have 10 rows, you want `[1,2,3,4,1,2,3,4,1,2]`, right? That's `[1,2,3,4]*2` + `[1,2]`. The `*2` here is the `*n` in the code I posted, and the `[1,2]` is the `[1,2,3,4][:r]` with `r = 2`. – rafaelc Aug 08 '18 at 03:24
  • There are many ways to save this to `csv`. You can just do `df.sort_values(by='group')` and have a sorted DF. You don't need groupby here. You can also split the `df` and save 4 different .csv files, one per group. – rafaelc Aug 08 '18 at 03:25
  • Oh ok! I get it. Thank you for the simple explanation. I want to save each group into its own csv, 4 files in total. That's why I thought of using groupby() – Lily Aug 08 '18 at 03:44
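One possible way to write each group to its own csv, sketched under assumptions: the miniature DataFrame, the temporary output directory, and the `group_<n>.csv` file names are all made up for illustration. It iterates over `df.groupby('group')`, which yields one `(label, sub-DataFrame)` pair per group, and calls `to_csv` on each part.

```python
import os
import tempfile

import pandas as pd

# Made-up miniature frame; the real one would be loaded from the asker's file.
df = pd.DataFrame({
    "day": ["2017-01-01", "2017-01-02", "2017-01-03",
            "2017-01-04", "2017-01-05", "2017-01-06"],
    "city": ["new york"] * 6,
    "temperature": [32, 36, 28, 33, 31, 33],
})
# Same cyclic labels as above: 1, 2, 3, 4, 1, 2, ...
df["group"] = [i % 4 + 1 for i in range(len(df))]

out_dir = tempfile.mkdtemp()  # hypothetical output location
paths = []
for label, part in df.groupby("group"):
    path = os.path.join(out_dir, f"group_{label}.csv")
    # Drop the helper column and the index so only the original data is saved.
    part.drop(columns="group").to_csv(path, index=False)
    paths.append(path)

print(sorted(os.path.basename(p) for p in paths))
```

Dropping the `group` column before saving is a design choice; keep it if the label should travel with the data.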