Consider the following example code (pre_process.py):
import pandas as pd
from sklearn.preprocessing import LabelBinarizer

class PreProcess(object):
    def __init__(self):
        ... .... ....
        ... .... ....

    def fit_clms(self, lb_style, dataset, style_clms = ['A', 'B']):  # (C:)
        # (B:) Expected "dataset['X', 'Y']", but it becomes "dataset[['X', 'Y']]"; please note the nested list.
        lb_results = lb_style.fit_transform(dataset[style_clms])
        # (**Worked - by this line**) lb_results = lb_style.fit_transform(dataset['A', 'B', 'C'])
        print(lb_results)
        if lb_style.classes_.shape[0] > 0:
            ... .... ....
            ... .... ....

    def process_chunks(self, chunks):
        lb_style = LabelBinarizer()
        print('------------------------------------------------\n')
        count = 0
        for dataset in chunks:
            count += 1
            print ('Processing the Chunk %d ...' % count)
            # Group By
            dataset['Grouping_MS'] = dataset[['_time', 'source']].apply(self.group_by_clm, axis=1)
            dataset = self.fit_clms(lb_style, dataset, ['X', 'Y'])  # (A:)
            ... .... ....
            ... .... ....

    def init(self):
        Times.start()
        # Read the Source File
        chunks = self.read_csv_file(SOURCE_FILE, CHUNK_SIZE)
        self.process_chunks(chunks)
        ... .... ....
        ... .... ....
How can I pass a list such as ['X', 'Y'] at (A:) and access it in "dataset[style_clms]" at (B:)? At the moment it becomes [['X', 'Y']] (a nested list), but I want ['X', 'Y'].
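To make the "nested list" observation concrete, here is a minimal sketch of what the square brackets actually receive. The ShowKey class is a hypothetical helper written only for this illustration (it is not part of pandas); it returns the key handed to the indexing operator. The outer brackets are indexing syntax, so a list variable inside them arrives as a plain list, while comma-separated names without a list arrive as a single tuple. pandas treats a list key as a selection of several columns and a tuple key as one label.

```python
class ShowKey:
    """Hypothetical helper: returns whatever key the brackets receive."""

    def __getitem__(self, key):
        return key


probe = ShowKey()
style_clms = ['X', 'Y']

# A list variable and the equivalent list literal produce the same key.
key_from_variable = probe[style_clms]
key_from_literal = probe[['X', 'Y']]

# Without the inner list, Python packs the names into one tuple key.
key_without_list = probe['X', 'Y']

print(key_from_variable)   # the list itself, not a list inside a list
print(key_without_list)    # a tuple
```

Under this reading, "dataset[style_clms]" and "dataset[['X', 'Y']]" are the same expression, and the list is passed through unchanged rather than being wrapped a second time.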
Also, is it a good idea to use a list as a default parameter value (C:) in a function definition? If not, are there alternative ways to achieve this? I ask because Pylint gives a warning like "Dangerous default value [] as argument".
Any ideas? Thanks.
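For context on the Pylint warning, the following minimal sketch shows the behavior it guards against and one common alternative, a None default replaced inside the function. The function names append_shared and append_fresh are hypothetical and exist only for this illustration; they are not taken from pre_process.py.

```python
def append_shared(item, bucket=[]):
    # The default list is created once, when the function is defined,
    # so every call that omits `bucket` mutates the same object.
    bucket.append(item)
    return bucket


def append_fresh(item, bucket=None):
    # A None default is replaced by a new list on each call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


shared_first = append_shared('A')
shared_second = append_shared('B')  # same list object as shared_first

fresh_first = append_fresh('A')
fresh_second = append_fresh('B')    # independent list

print(shared_first is shared_second)
print(fresh_first, fresh_second)
```

Applied to fit_clms, this would mean a signature with style_clms=None and the default ['A', 'B'] assigned inside the body. Another option is an immutable tuple default such as ('A', 'B'), converted with list(style_clms) before it is used to index the DataFrame, since pandas treats a tuple key as a single label.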