I have some data (42 features) collected from people over several months (at most 6; the number of months varies between entries). Each month's values are stored in their own row:

enter image description here

There are 9267 unique ID values (set as the index) and as many as 50,000 rows in the df. I want to convert it into one 42 * 6 feature vector per ID (even though some will contain a lot of NaNs) so that I can train on them. Here is how it should look:

enter image description here

Here is my solution:

def flatten_features(f_matrix, ID):
    '''constructs a 1x(6*n) vector from  6xn matrix'''
    # check whether it is a Series (an ID with a single row), not a DataFrame
    if(len(f_matrix.shape) == 1): 
        f_matrix['ID'] = ID
        return f_matrix

    flattened_vector = f_matrix.iloc[0]

    for i in range(1, f_matrix.shape[0]):
        vector_append = f_matrix.iloc[i].copy()
        # suffix every feature name with the month number: 'name' -> 'name_1'
        vector_append.index = vector_append.index.map(lambda name: name + '_' + str(i))
        # Series.append was removed in pandas 2.0; pd.concat does the same job
        flattened_vector = pd.concat([flattened_vector, vector_append])

    flattened_vector['ID'] = ID
    return flattened_vector


#construct dataframe of flattened vectors for numerical features
new_indices = flatten_features(numerical_f.iloc[:6], 1).index
new_indices

flattened_num_f = pd.DataFrame(columns=new_indices)
flattened_num_f

for label in numerical_f.index.unique():

    matr = numerical_f.loc[label]
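    # DataFrame.append was removed in pandas 2.0; add the Series as a one-row frame instead
    flattened_num_f = pd.concat([flattened_num_f, flatten_features(matr, label).to_frame().T])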

It yields the results I need, but it runs very slowly. Is there a faster, more elegant solution?

TheSmokingGnu
  • It is totally unclear to me what your desired output is. Can you give an example of your input that **is not an image** and a desired output? – juanpa.arrivillaga Oct 10 '17 at 17:43
  • @juanpa.arrivillaga how should I show the huge df that I have as input, if not by means of the Jupyter notebook representation? – TheSmokingGnu Oct 10 '17 at 18:18
  • You provide a [mcve], as is required. An image is useless. See [this question](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) regarding how to create a good, reproducible `pandas` example. – juanpa.arrivillaga Oct 10 '17 at 18:21
  • @TheSmokingGnu what is the identifier for months in the first table? – akilat90 Oct 11 '17 at 05:27

1 Answer

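If you want to transpose the df, you can use the `.T` attribute. I assume you have the IDs stored in a `unique_id` variable. Note that the column assignment only lines up when `numerical_f` has exactly one row per ID; otherwise pandas raises a length-mismatch error.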

new_f = numerical_f.T
new_f.columns = unique_id
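
If the goal is the one-row-per-ID layout from the question rather than a plain transpose, a vectorized reshape may be closer to what is needed. The following is only a sketch under some assumptions: the rows of each ID are already in month order, the ID is the index, and the tiny `numerical_f` built here (features `f1` and `f2`, IDs 101 and 202) is made-up stand-in data, not the asker's frame. The helper name `flatten_by_id` is also hypothetical. It numbers each ID's rows with `groupby(...).cumcount()`, pivots that counter into the columns with `unstack`, and then renames the columns the same way the loop in the question does.

```python
import pandas as pd

# Made-up stand-in for numerical_f: ID as index, one row per month
numerical_f = pd.DataFrame(
    {"f1": [1.0, 2.0, 3.0, 4.0, 5.0], "f2": [10.0, 20.0, 30.0, 40.0, 50.0]},
    index=pd.Index([101, 101, 101, 202, 202], name="ID"),
)


def flatten_by_id(df, max_months=6):
    """Return one row per ID with max_months * n_features columns."""
    # Number each ID's rows 0, 1, 2, ... in their existing order
    month = pd.Index(df.groupby(level=0).cumcount().to_numpy(), name="month")
    # Add the counter as a second index level, then pivot it into the columns
    wide = df.set_index(month, append=True).unstack("month")
    # Make sure every (feature, month) slot exists, ordered month by month;
    # rows beyond max_months for an ID would be dropped here
    slots = [(name, m) for m in range(max_months) for name in df.columns]
    wide = wide.reindex(columns=pd.MultiIndex.from_tuples(slots))
    # Same naming as the question: month 0 keeps the bare name, later months get '_<month>'
    wide.columns = [name if m == 0 else f"{name}_{m}" for name, m in wide.columns]
    return wide


flattened_num_f = flatten_by_id(numerical_f)
print(flattened_num_f)
```

Unlike the loop in the question, this keeps the ID as the index instead of an `ID` column; `flattened_num_f.reset_index()` would turn it into a column if that is preferred. Months that an ID does not have are left as NaN.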
galaxyan