0

When I import my data using pd.read_csv() from designated folder where all files are ordered in alphabetical order (ex. A1, A2, A3, A4, ..., A12, B1, B2, ..., B12), the rows created in dataframe are in order of (A1, A10, A11, A12, A2, A3, A4, A5, A6,...,A9, B1, B10, B11, B12, B2, ..., B9). So A10, A11, A12 come before A2. Do you know how to import the data in an order I want?

It looks like this:

data = {
   'Order':['A1', 'A10', 'A11', 'A12', 'A2', 'A3' ,'A4' ,'A5' ,'A6' ,'A7' ,'A8', 'A9',
               'B1', 'B10', 'B11', 'B12', 'B2', 'B3' ,'B4' ,'B5' ,'B6' ,'B7' ,'B8', 'B9']}
MNVLEY
  • 47
  • 1
  • 5
  • Does this answer your question? [Keeping columns in the specified order when using UseCols in Pandas Read\_CSV](https://stackoverflow.com/questions/40024406/keeping-columns-in-the-specified-order-when-using-usecols-in-pandas-read-csv) – dsanatomy Sep 02 '20 at 10:03

1 Answers1

5

Try splitting out the text from the numerical part of the string using str.extract - casting the numerical part to an int dtype. Then sort of both columns and use reindex to sort the DataFrame:

df = pd.DataFrame(data)

df.reindex(
    df.Order.str.extract(r'^(.*?)(\d+)$')
    .astype({1:'int'})
    .sort_values([0, 1])
    .index)

[out]

   Order
0     A1
4     A2
5     A3
6     A4
7     A5
8     A6
9     A7
10    A8
11    A9
1    A10
2    A11
3    A12
12    B1
16    B2
17    B3
18    B4
19    B5
20    B6
21    B7
22    B8
23    B9
13   B10
14   B11
15   B12
Chris Adams
  • 18,389
  • 4
  • 22
  • 39