I need a function that, given a data frame and a number num, constructs a data frame with num rows such that every row has the following values:
- for columns with string values, we sample a value from that column in the original table
- for columns with floats or ints, we use the column's mean value
Here is my code:
def rows_aggr(df, num):
    dataframe = None
    for i in range(0, num):
        row = None
        for cname in df.columns.values:
            column = df[cname]
            dfcol = Series.to_frame(column)
            if column.dtype != np.number:
                item = dfcol.sample(n=1)
            else:
                item = dfcol.mean(axis=1)
            if row is None:
                row = item
            else:
                row = pd.concat([row, item], axis=1)
        if dataframe is None:
            dataframe = row
        else:
            dataframe = pd.concat([dataframe, row], axis=0)
    return dataframe
For some reason the resulting rows contain NaN values and there are more than num of them, so this code does not seem to work right. If you know a better way of accomplishing what I need, I would be happy to know.
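The NaN values and extra rows are consistent with how pd.concat aligns on index labels, and with mean(axis=1) working per row rather than per column. A small sketch of both effects, assuming a toy frame and two hand-picked rows (positions 2 and 4 are arbitrary stand-ins for two independent sample(n=1) calls):

```python
import pandas as pd

df = pd.DataFrame({'col1': list('abcdef'), 'col2': range(6)})

# Two one-row frames that keep their original index labels,
# as two independent .sample(n=1) calls usually would.
a = df[['col1']].iloc[[2]]   # index label 2
b = df[['col2']].iloc[[4]]   # index label 4

# concat along axis=1 aligns on the index, so mismatched labels
# produce two rows padded with NaN instead of one row.
misaligned = pd.concat([a, b], axis=1)

# Dropping the index labels first puts both values on the same row.
aligned = pd.concat(
    [a.reset_index(drop=True), b.reset_index(drop=True)], axis=1
)

# mean(axis=1) averages across the columns of each row, so it returns
# one value per row; the column mean is a single number.
row_means = df[['col2']].mean(axis=1)
col_mean = df['col2'].mean()

print(misaligned)
print(aligned)
print(len(row_means), col_mean)
```

This only illustrates the pandas behavior; it is not a confirmed diagnosis of every problem in the function above.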
For example, for
df = pd.DataFrame({'col1':list('abcdef'),'col2':range(6)}) and num=3
we would get something like
c, 2.5
f, 2.5
b, 2.5
assuming c, f, and b were randomly picked.
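For reference, a minimal sketch of one way the described behavior could be written. It builds each output column in one step instead of concatenating row by row. Several things here are assumptions and not part of the original code: pandas.api.types.is_numeric_dtype is swapped in for the dtype comparison, sampling is done with replacement so that num may exceed the number of rows, and random_state is a hypothetical extra parameter added only to make runs reproducible.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def rows_aggr(df, num, random_state=None):
    """Return a frame with num rows: the column mean for numeric
    columns, randomly sampled values for all other columns."""
    out = {}
    for cname in df.columns:
        column = df[cname]
        if is_numeric_dtype(column):
            # Numeric column: the same mean value on every row.
            out[cname] = [column.mean()] * num
        else:
            # Non-numeric column: draw num values with replacement
            # (assumption), then drop the sampled index labels.
            sampled = column.sample(
                n=num, replace=True, random_state=random_state
            )
            out[cname] = sampled.tolist()
    # Keep the original column order.
    return pd.DataFrame(out, columns=df.columns)


df = pd.DataFrame({'col1': list('abcdef'), 'col2': range(6)})
result = rows_aggr(df, 3, random_state=0)
print(result)
```

Converting the sampled values to a plain list discards their index labels, so the result always has exactly num rows with a fresh 0..num-1 index.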
Thank you!