1

I have a dataframe containing marker positions in xyz, for 24 markers, in every frame of a 10 second (100Hz) timeseries. Like this:

    LHEE_x   LHEE_y  LHEE_z   LTOE_x   LTOE_y  LTOE_z   RHEE_x   RHEE_y 
0 -347.013  526.914  27.256 -138.267  578.015  39.859 -352.402  297.541   
1 -347.035  526.934  27.261 -138.271  578.074  39.874 -352.409  297.548   
2 -347.041  526.933  27.265 -138.260  578.073  39.898 -352.402  297.533   
3 -347.030  526.928  27.251 -138.242  578.079  39.905 -352.427  297.535   
4 -347.029  526.939  27.222 -138.244  578.072  39.915 -352.445  297.525 
.....

I want to extract columns that contain positions from one marker at a time (i.e. LHEE_x, LHEE_y and LHEE_z) to new dataframes, and preferrably give the new dataframes the name of the marker (i.e. LHEE).

I know I can do this "manually" by

LHEE=pos_xyz.iloc[:,0:3]
or    
LHEE=pos_xyz['LHEE_x','LHEE_y','LHEE_z']

but since I have a ton of markers this would be a hassle..

Also I would want to make a function that does this so I can use it on dataframes containing marker positions from different markers.

What would be the most efficient way to do this? This question How to select DataFrame columns based on partial matching? seemed promising, but I need the result to be a dataframe. I've tried using loops and grouping, but I can't seem to find a way that works properly.

E. V.
  • 43
  • 1
  • 7
  • 1
    Providing readers with a fully reproducible example would maximize your chance to get an answer. E.g. `import pandas` and provide a code which allows us to directly play with the dataframe (df), instead of obliging us to start with a copied-pasted string version of your df. – keepAlive Jul 27 '17 at 11:04
  • I'm sorry, I don't know how to do what you suggest, first time posting a question. I'd be happy to provide the df :) – E. V. Jul 27 '17 at 11:32
  • No problem. Just remember this for the next time. BTW, jezrael's answer is very likely to be what you want. Please, [could you remove your signature ? Your "Cheers, Elise"](https://meta.stackexchange.com/questions/2950/should-hi-thanks-taglines-and-salutations-be-removed-from-posts). – keepAlive Jul 27 '17 at 11:34

1 Answers1

2

I think you need create Multiindex by split column names first:

df.columns = df.columns.str.split('_', expand=True)
print (df)
      LHEE                      LTOE                      RHEE         
         x        y       z        x        y       z        x        y
0 -347.013  526.914  27.256 -138.267  578.015  39.859 -352.402  297.541
1 -347.035  526.934  27.261 -138.271  578.074  39.874 -352.409  297.548
2 -347.041  526.933  27.265 -138.260  578.073  39.898 -352.402  297.533
3 -347.030  526.928  27.251 -138.242  578.079  39.905 -352.427  297.535
4 -347.029  526.939  27.222 -138.244  578.072  39.915 -352.445  297.525

And then create dictionary of DataFrames by dict comprehension:

dfs = {x:df[x] for x in df.columns.levels[0]}
print (dfs)
{'RHEE':          x        y
0 -352.402  297.541
1 -352.409  297.548
2 -352.402  297.533
3 -352.427  297.535
4 -352.445  297.525, 'LTOE':          x        y       z
0 -138.267  578.015  39.859
1 -138.271  578.074  39.874
2 -138.260  578.073  39.898
3 -138.242  578.079  39.905
4 -138.244  578.072  39.915, 'LHEE':          x        y       z
0 -347.013  526.914  27.256
1 -347.035  526.934  27.261
2 -347.041  526.933  27.265
3 -347.030  526.928  27.251
4 -347.029  526.939  27.222}

print (dfs['LHEE'])
         x        y       z
0 -347.013  526.914  27.256
1 -347.035  526.934  27.261
2 -347.041  526.933  27.265
3 -347.030  526.928  27.251
4 -347.029  526.939  27.222
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252