Construct one row for each Index value from multiple by appending

Question

I have some data (42 features) collected from people over several months (at most 6; the number varies between entries). Each month's values are stored in their own row:

There are 9267 unique ID values (set as the index) and as many as 50 000 rows in the df. I want to convert it to one 42 * 6 feature vector for each ID (even though some will contain a lot of NaNs), so that I can train on them. Here is how it should look:

Here is my solution:

import pandas as pd

def flatten_features(f_matrix, ID):
    '''Constructs a 1x(6*n) vector from a 6xn matrix.'''
    # Check whether it is a Series rather than a DataFrame (the ID has a single row)
    if f_matrix.ndim == 1:
        f_matrix['ID'] = ID
        return f_matrix

    flattened_vector = f_matrix.iloc[0]

    for i in range(1, f_matrix.shape[0]):
        vector_append = f_matrix.iloc[i]
        # Suffix every feature name with the month number, e.g. 'feature_3'
        vector_append.index = vector_append.index.map(lambda name: name + '_' + str(i))
        flattened_vector = flattened_vector.append(vector_append)

    flattened_vector['ID'] = ID
    return flattened_vector


#construct dataframe of flattened vectors for numerical features
new_indices = flatten_features(numerical_f.iloc[:6], 1).index
new_indices

flattened_num_f = pd.DataFrame(columns=new_indices)
flattened_num_f

for label in numerical_f.index.unique():

    matr = numerical_f.loc[label]
    flattened_num_f = flattened_num_f.append(flatten_features(matr, label))

It yields the needed results, but it runs very slowly. Is there a faster, more elegant solution?
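For reference, here is a minimal vectorized sketch of the same reshaping. It runs on a hypothetical toy frame standing in for numerical_f (two features instead of 42), and the `month` helper column is an assumption, not something from the question. It also assumes that a row's position within its ID is its month number, since the data shown has no explicit month identifier. It avoids the row-by-row `append`, which was removed in pandas 2.0.

```python
import pandas as pd

# Hypothetical toy stand-in for numerical_f: ID as index, one row per month.
numerical_f = pd.DataFrame(
    {"f1": [1.0, 2.0, 3.0, 4.0, 5.0], "f2": [10.0, 20.0, 30.0, 40.0, 50.0]},
    index=pd.Index([7, 7, 7, 9, 9], name="ID"),
)

df = numerical_f.copy()

# Number the rows of each ID 0, 1, 2, ... in their existing order
# (assumption: row order within an ID corresponds to month order).
df["month"] = df.groupby(level=0).cumcount().to_numpy()

# Move the month counter into the index, then pivot it out into columns.
# Missing months become NaN.
wide = df.set_index("month", append=True).unstack("month")

# Flatten the (feature, month) column MultiIndex into names such as 'f1_2';
# month 0 keeps the bare feature name, as in the loop-based version.
wide.columns = [feat if m == 0 else f"{feat}_{m}" for feat, m in wide.columns]

print(wide)
```

The columns come out grouped by feature (f1, f1_1, f1_2, f2, ...) rather than by month, so reorder them if the month-major layout matters for training.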

It is totally unclear to me what your desired output is. Can you give an example of your input that **is not an image** and a desired output? – juanpa.arrivillaga, Oct 10 '17 at 17:43
@juanpa.arrivillaga how should I show the huge df that I have as input, if not by means of the Jupyter notebook representation? – TheSmokingGnu, Oct 10 '17 at 18:18
You provide a [mcve], as is required. An image is useless. See [this question](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) regarding how to create a good, reproducible `pandas` example. – juanpa.arrivillaga, Oct 10 '17 at 18:21
@TheSmokingGnu what is the identifier for months in the first table? – akilat90, Oct 11 '17 at 05:27

score 0 · Answer 1 · answered Oct 10 '17 at 18:09

If you want to transpose the df, you could use the `.T` attribute. I assume you have the IDs stored in a `unique_id` variable.

new_f = numerical_f.T
new_f.columns = unique_id

galaxyan


But the transposed matrix would have ~50 000 columns, and your second line tries to replace them with ~9000 labels, which results in an error – TheSmokingGnu Oct 10 '17 at 18:20
@TheSmokingGnu do you want to aggregate all rows with the same id together? – galaxyan Oct 10 '17 at 18:23
yes, one 42*6-dimensional row for each unique ID, containing the values from up to six rows for that ID (all that exist) – TheSmokingGnu Oct 10 '17 at 18:30
@TheSmokingGnu are there 6 rows for each id? – galaxyan Oct 10 '17 at 18:47
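
The length mismatch discussed in these comments can be reproduced on a small scale. The sketch below uses a hypothetical three-row toy frame (not the asker's data) with one repeated ID. Transposing gives one column per original row, so assigning the shorter list of unique IDs as column labels raises a ValueError.

```python
import pandas as pd

# Hypothetical toy frame: ID 7 appears twice, ID 9 once.
toy = pd.DataFrame(
    {"f1": [1.0, 2.0, 3.0], "f2": [10.0, 20.0, 30.0]},
    index=pd.Index([7, 7, 9], name="ID"),
)

transposed = toy.T
n_cols = transposed.shape[1]      # one column per original row
unique_id = toy.index.unique()    # one entry per distinct ID

try:
    # Fails: 3 columns cannot take 2 labels.
    transposed.columns = unique_id
    mismatch = False
except ValueError as err:
    mismatch = True
    print("Assignment failed:", err)
```

So a plain transpose only works when every ID has exactly one row. With several rows per ID, the rows have to be numbered within each ID before they can be spread into columns.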
