1

I have the following dataframe:

df = pd.DataFrame({
   'tmp': ['A', 'A', 'B', 'Z', 'D', 'C'],
   'F1': [2, 1, 9, 8, 7, 4],
   'F20': [0, 1, 9, 4, 2, 3],
   'F3': ['a', 'B', 'c', 'D', 'e', 'F'],
   'aabb': ['a', 'B', 'c', 'D', 'e', 'F']
})
---
  tmp  F1  F20 F3 aabb
0   A   2    0  a    a
1   A   1    1  B    B
2   B   9    9  c    c
3   Z   8    4  D    D
4   D   7    2  e    e
5   C   4    3  F    F

and I would like to sort only the columns with the F in this way:

   tmp  F1  F3  F20 aabb
0   A   2    a   0    a
1   A   1    B   1    B
2   B   9    c   9    c
3   Z   8    D   4    D
4   D   7    e   2    e
5   C   4    F   3    F

How could I do?

(edit) The columns with the "F" can vary both in quantity and in the values that follow the F (in my case I have about 100 columns like those) The columns with F are always grouped but the number before and after is variable

mozway
  • 194,879
  • 13
  • 39
  • 75
Mario
  • 89
  • 6
  • Are the F columns always grouped? Is the number of columns before/after variable? – mozway Sep 02 '22 at 17:15
  • Yes, the columns with F are always grouped but the number before and after is variable – Mario Sep 02 '22 at 17:16
  • I asked similar question but It got closed without considering it was just similar not exactly the same! my question is: So what if we do not know the names of columns ( ex. it is going to be part of a function which takes any dataframe) and we want to make sure whatever the names are, they are ordered. – Mathica Oct 12 '22 at 18:19

3 Answers3

2

You can simply do:

df = df[["tmp", "F1", "F3", "F20", "aabb"]]

Update: sort the F columns.

# find all F columns
f_cols = df.columns[df.columns.str.startswith("F")].tolist()
# ["F1", "F20", "F3"]

# sort them using a lambda key
f_cols_sorted = sorted(f_cols, key=lambda s: int(s[1:]))
# ['F1', 'F3', 'F20']

# sort
df = df[["tmp"] + f_cols_sorted + ["aabb"]]

Update: arbitrary column name/number of columns before/after the F group.

# find cols of pattern `F\d+`
m = [re.match(r"F\d+", s) is not None for s in cols]
# -> [False, True, True, True, False]

# find first/last index of F column
first_f = m.index(True)                    # 1
last_f = len(m) - m[::-1].index(True) - 1  # 3

# sort column names
sorted_cols = cols[:first_f] + sorted(cols[first_f:last_f+1], key=lambda s: int(s[1:])) + cols[last_f+1:]
# -> ['tmp', 'F1', 'F3', 'F20', 'aabb']

# finally
df = df[sorted_cols]
TYZ
  • 8,466
  • 5
  • 29
  • 60
2

You can use natsort for natural sorting and a mask to handle only the F columns:

# pip install natsort
from natsort import natsorted

cols = df.columns.to_numpy(copy=True)
m = df.columns.str.fullmatch('F\d+')
cols[m] = natsorted(cols[m])

df_sorted = df[cols]

Alternative without natsort:

num = df.columns.str.extract('F(\d+)', expand=False).astype(float)
cols = df.columns.to_numpy(copy=True)
m = num.notna()
order = np.argsort(num[m])
cols[m] = cols[m][order]

df_sorted = df[cols]

output:

  tmp  F1 F3  F20 aabb
0   A   2  a    0    a
1   A   1  B    1    B
2   B   9  c    9    c
3   Z   8  D    4    D
4   D   7  e    2    e
5   C   4  F    3    F
mozway
  • 194,879
  • 13
  • 39
  • 75
0

This solves the edit issue:

df=df[["tmp"]+[f"F{x}" for x in sorted([int(y[1:]) for y in  df.columns if y[0]=="F"]) ]+["aabb"]]

Output:

  tmp  F1 F3  F20 aabb
0   A   2  a    0    a
1   A   1  B    1    B
2   B   9  c    9    c
3   Z   8  D    4    D
4   D   7  e    2    e
5   C   4  F    3    F
  • Thank you! But in this case, as in the previous answer, there is the problem of variable columns before and after those with F – Mario Sep 02 '22 at 17:26