I have a very large data file which can be modelled with the code below:
import pandas as pd
import numpy as np
import random
x = pd.DataFrame({'file_name': np.repeat(['csv_1', 'csv_2'], 32),
'side': np.tile(np.repeat(['n', 'f'], 16),2),
'ten_13':np.tile(np.repeat(['10', '13'], 8),4),
'rand_data': np.random.uniform(-1, 1, size=64)})
x = x.set_index(['file_name', 'side', 'ten_13'])
x.head()
I need your help to add a column called web_col
using pandas' groupby( ) method to the above DataFrame x
.
The web_col
column depends on the combination of values in columns side
, ten_13
and has one of four possible sequence values:
- 1:8 if the value of 'side' = 'n' and 'ten_13' = '10'
- 9:16 if the value of 'side' = 'n' and 'ten_13' = '13'
- 17:24 if the value of 'side' = 'f' and 'ten_13' = '10'
- 25:32 if the value of 'side' = 'f' and 'ten_13' = '13'
file_name
contains various csv files; in the code below it has only two files. Each file has the same exact values of side
and ten_13
.
I created this web_col
column using the numpy function np.where()
, BUT I would like to use pandas' groupby( ) instead. The np.where()
code is below:
ten_13 = x.index.get_level_values('ten_13')
side = x.index.get_level_values('side')
n_10 = ((ten_13 == '10') & (side == 'n')).reshape(-1,8)
n_13 = ((ten_13 == '13') & (side == 'n')).reshape(-1,8)
f_10 = ((ten_13 == '10') & (side == 'f')).reshape(-1,8)
f_13 = ((ten_13 == '13') & (side == 'f')).reshape(-1,8)
web_col = np.where(n_10, np.arange(1, 9, 1),
np.where(f_10, np.arange( 17, 25, 1),
np.where(n_13, np.arange(9, 17, 1),
np.where(f_13, np.arange(25,33, 1), np.arange(1, 9, 1)))))
x['web_col'] = web_col.reshape(64,1)
x.head()