2

I'd like to have your help on the following issue:

I have run two psychology experiments and I have retrieved reaction times from my subjects. All of this data is submitted to various .xlsx files in dataframes (one file per experiment) and with Python I created a single .xlsx file that contains all of them (through .append) The index of my master file is the subject ID which starts from 1.

The problem that I have now is that I have 12 subjects, and Python does not seem to follow the natural sorting rules...What I have is that subject 10, 11, 12 come before subject 1 - 9 (which messes up the way I retrieve my data). I have used natsort in such a way:

MASTER_DF_FLAT.index = natsorted(MASTER_DF_FLAT.index)
MASTER_DF_FV.index = natsorted(MASTER_DF_FV.index)

This fixes the problem with the indexes, but the data on the rows won't follow the new order (on excel you can "extend the selection" and you get away with it). Any idea how I can fix this?

EDIT: here is how the data is formatted with 10+ subjects:

subject block   trial   condition   RT
10  1   1   1   0,677044666
10  1   2   12  0,556960872
10  1   3   2   0,51685685
10  1   4   5   0,766190078

And here is the data from subject 1:

subject block   trial   condition   RT
1   1   1   1   0,677044666
1   1   2   12  0,556960872
1   1   3   2   0,51685685
1   1   4   5   0,766190078

As you can see the RTs have not changed (and so did not the other values)

Thanks for your help!

EDIT1: here is how i save my files:

I do that :) Here is my code exactly:

for f in files:
    data = os.path.join(data_dir,f)
    df = pd.read_excel(data)

    df.set_index('subject', inplace = True)


concatenated = pd.concat([block, trial, condition, RT, SOA, acc, ISI], axis=1)

    if 'FV' not in f:
        flatDFs.append(concatenated)
    else:
        fvDFs.append(concatenated)


MASTER_DF_FLAT = pd.concat(flatDFs)
writerFlat = pd.ExcelWriter('MASTER_FILE_FLAT.xlsx')
MASTER_DF_FLAT.to_excel(writerFlat,'Sheet1')
writerFlat.save()

As you can see, the index in my .xlsx file is indeed subject. Everything works fine, it's just the order of elements!

AndrewDAG
  • 65
  • 1
  • 2
  • 12
  • Please show sample data. It sounds like your data is being read as strings rather than numbers. Incidentally, often you will be better off using a plaintext format like CSV, rather than Excel. – BrenBarn Nov 16 '16 at 21:30
  • Done. If you need more info tell me :) – AndrewDAG Nov 16 '16 at 21:44
  • That is odd, as long as your index is integers, pandas should sort correctly. Have you tried [`sort_index`](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sort_index.html)? – Kartik Nov 16 '16 at 21:47
  • My indexes are definitely integers. I did try sort_index, but unsuccessfully. May I ask you how you'd use it? – AndrewDAG Nov 16 '16 at 21:55
  • Wait a minute, `'subject'` is not your index. – Kartik Nov 16 '16 at 22:02
  • It is in my xlsx file. – AndrewDAG Nov 16 '16 at 22:04
  • @AndreaD'Agostino: There isn't any such thing as an index in an Excel file. – BrenBarn Nov 17 '16 at 03:05
  • If you save your dataframe to an .xlsx file, then the index of your dataframe will become the "index" of the excel file (basically the first column, which now has 10 - 12 coming before 1 - 9). I know there isn't any index as python in excel, but give it a try. – AndrewDAG Nov 17 '16 at 08:23
  • What are the datatypes of the numeric columns? Are they strings or floats/ints? – SethMMorton Nov 18 '16 at 02:46
  • Also, have you seen [this answer about using natsort with pandas](http://stackoverflow.com/a/29582718/1399279)? – SethMMorton Nov 18 '16 at 02:56

1 Answers1

0

I realized that 'subject' is not your index. So this should fix your problem:

df.set_index('subject', inplace=True)
df.sort_index(inplace=True)
Kartik
  • 8,347
  • 39
  • 73
  • 'subject' is already my index. I save my dataframe to a new xlsx file when creating the master file with the code `writerFlat = pd.ExcelWriter('MASTER_FILE_FLAT.xlsx')` `writerFlat.save()` In excel the dataframe index is simply the first column. – AndrewDAG Nov 16 '16 at 22:21
  • Run my answer before saving your excel file. Your work flow is to read individual subject files, merge them to a single dataframe, set the subject number as index, sort on index, then save to the master excel file. – Kartik Nov 17 '16 at 00:21