shift columns one step if column value is empty

Question

Suppose I have a dataframe looking something like this:

  col1 col2 col3 col4
0    A    B    F    O
1    A         G    Q
2    A    C    G    P
3    A         H     
4    A    D    I     
5    A    D    I     
6    A         J    U
7    A    E         J

How can I shift values to the left whenever a cell is empty, so that each row ends up like this?

  col1 col2  col3  col4
0    A    B     F     O
1    A    G     Q 
2    A    C     G     P
3    A    H  
4    A    D     I 
5    A    D     I 
6    A    J     U 
7    A    E     J

I thought I could check the current column and, if it's empty, take the value from the next column and then empty that next column:

for col in df.columns:    
    df[col] = np.where((df[col] == ''), df[f'col{int(col[-1])+1}'], df[col])
    df[f'col{int(col[-1])+1}'] = np.where((df[col] == ''), '', df[col])

But I am doing something wrong somewhere. Here is the sample df:

df = pd.DataFrame(
    {
        'col1': ['A','A','A','A','A','A','A','A'],
        'col2': ['B','','C','','D','D','','E'],
        'col3': ['F','G','G','H','I','I','J',''],
        'col4': ['O','Q','P','','','','U','J']
    }
)
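For comparison, here is a sketch of how the loop idea above could be made to work. It assumes empty cells are `''`, as in the sample, and is not taken from any of the answers below. The changes are: compute the "is empty" mask once before overwriting either column, keep the right-hand value where the left cell is not empty, and stop at the second-to-last column (looking up `col5` for `col4` raises a `KeyError`). Repeating the pass lets a value move past more than one empty cell.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'col1': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A'],
        'col2': ['B', '', 'C', '', 'D', 'D', '', 'E'],
        'col3': ['F', 'G', 'G', 'H', 'I', 'I', 'J', ''],
        'col4': ['O', 'Q', 'P', '', '', '', 'U', 'J'],
    }
)

cols = list(df.columns)
# One pass moves a value at most one step left, so repeat the pass
# enough times for a value to cross several empty cells.
for _ in range(len(cols) - 1):
    # Pair each column with its right-hand neighbour; the last column has none.
    for left, right in zip(cols[:-1], cols[1:]):
        empty = df[left] == ''  # computed once, before either column changes
        df[left] = np.where(empty, df[right], df[left])
        df[right] = np.where(empty, '', df[right])

print(df)
```

Each pass is vectorized over the rows, but the nested loop still performs on the order of (number of columns)² column updates, which the answers below avoid.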

Vishnudev Krishnadas · Answer 1 · score 3 · 2021-03-10T15:27:07.463

Replace empty strings with NaN:

df = df.replace('', np.nan)

Then apply dropna row-wise:

odf = df.apply(lambda x: pd.Series(x.dropna().values), axis=1)
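Why this packs values to the left: inside the lambda, `x` is one row, `x.dropna()` keeps only the non-missing cells, and `.values` discards their original column labels, so `pd.Series(...)` relabels them `0, 1, 2, ...` from the left. When `apply(..., axis=1)` combines these per-row Series, it aligns them on those new labels and pads shorter rows with NaN on the right. A small illustration on one row of the sample data (row 6, after the replace step):

```python
import numpy as np
import pandas as pd

# Row 6 of the sample data after replacing '' with NaN
row = pd.Series(['A', np.nan, 'J', 'U'], index=['col1', 'col2', 'col3', 'col4'])

kept = row.dropna()              # labels col1, col3, col4 survive
packed = pd.Series(kept.values)  # original labels dropped, new labels 0, 1, 2

print(kept.index.tolist())
print(packed.tolist())
```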

To restore the original column names:

odf.columns = df.columns

NOTE: It is generally good practice to represent missing data with NaN rather than empty strings.

Output

  col1 col2 col3 col4
0    A    B    F    O
1    A    G    Q  NaN
2    A    C    G    P
3    A    H  NaN  NaN
4    A    D    I  NaN
5    A    D    I  NaN
6    A    J    U  NaN
7    A    E    J  NaN
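One caveat: the row-wise `apply` only produces as many columns as the fullest row has values. That works here because row 0 is full, but if no row were full, `odf.columns = df.columns` would raise a length-mismatch `ValueError`. Below is a sketch of a more defensive variant; the helper name `shift_left` and its `fill` parameter are hypothetical. It reindexes to the original width and optionally turns the NaN padding back into `''` to match the output requested in the question.

```python
import numpy as np
import pandas as pd


def shift_left(df: pd.DataFrame, fill: str = '') -> pd.DataFrame:
    """Pack the non-empty cells of each row to the left (hypothetical helper)."""
    width = df.shape[1]
    packed = (
        df.replace('', np.nan)
        .apply(lambda x: pd.Series(x.dropna().values), axis=1)
        # Guarantee one column per original column, even if no row is full.
        .reindex(columns=range(width))
    )
    packed.columns = df.columns
    # Turn the NaN padding back into the question's empty-string convention.
    return packed.fillna(fill)


# No row here has all three values, so the plain apply would yield only 2 columns.
demo = pd.DataFrame({'a': ['x', '', 'y'], 'b': ['', 'z', ''], 'c': ['w', '', 'v']})
print(shift_left(demo))
```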

edited Mar 10 '21 at 15:27

answered Mar 10 '21 at 15:16


This answer is sheer Pythonic elegance. What a fantastic line of code! There was a small issue with `columns` to `columns`, but it's trivial; compare it with, for example, https://stackoverflow.com/questions/72104599/how-to-delete-empty-spaces-from-pandas-dataframe-rows-until-first-populated-fiel ... and I think you can see the point. In summary, this post deserves a lot more upvotes. This coder is good. – M__ Mar 01 '23 at 14:42

score 2 · Accepted Answer · answered Mar 10 '21 at 15:10


One way is to use np.argsort:

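s = df.to_numpy()
# Stable sort of the "is empty" mask: False (non-empty) sorts before True (empty),
# and cells with equal keys keep their original left-to-right order
orders = np.argsort(s == '', axis=1, kind='mergesort')

# For every row i, pick the values s[i, orders[i, j]]
df[:] = s[np.arange(len(s))[:, None], orders]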

Output:

  col1 col2 col3 col4
0    A    B    F    O
1    A    G    Q     
2    A    C    G    P
3    A    H          
4    A    D    I     
5    A    D    I     
6    A    J    U     
7    A    E    J

Note: A very similar approach can be found in this question.
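How the indexing above works, step by step: `s == ''` is a Boolean mask, and sorting it puts `False` (non-empty) before `True` (empty). Because `mergesort` is a stable sort, the non-empty cells keep their original order, so each row of `orders` lists the column positions to read from. The index `s[rows, orders]`, with `rows` shaped `(n, 1)`, broadcasts against `orders` and gathers `s[i, orders[i, j]]` for every cell. A minimal illustration on two rows of the sample data:

```python
import numpy as np

# Rows 1 and 7 of the sample data as a 2-D object array
s = np.array([['A', '', 'G', 'Q'],
              ['A', 'E', '', 'J']], dtype=object)

mask = s == ''                                       # True where a cell is empty
orders = np.argsort(mask, axis=1, kind='mergesort')  # stable: ties keep their order
rows = np.arange(len(s))[:, None]                    # shape (2, 1), broadcasts against orders

print(orders)           # column positions to read, per row
print(s[rows, orders])  # non-empty values packed to the left
```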


Quang Hoang


NumPy solutions are also very fast. – destinychoice Mar 10 '21 at 15:29

ThePyGuy · Answer 3 · 2021-03-10T15:17:41.687

You can count the number of empty values in each column, sort the columns by that count, and reorder the DataFrame accordingly:

counts = {}
for col in df.columns.to_list():
    counts[col] = (df[col] == '').sum()  # Based on the example you provided.
# Then sort the dictionary by count.
counts = dict(sorted(counts.items(), key=lambda item: item[1]))
# Assign back to the dataframe.
df = df[[*counts.keys()]]
df

  col1 col3 col2 col4
0    A    F    B    O
1    A    G         Q
2    A    G    C    P
3    A    H          
4    A    I    D     
5    A    I    D     
6    A    J         U
7    A         E    J
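The same column reordering could be written without an explicit loop, as sketched below using `DataFrame.eq`, `sum` and a stable `sort_values` (columns with equal counts keep their original order, as with Python's `sorted` above). This moves whole columns rather than individual cells, so a row can still have an empty cell before a non-empty one, as rows 1, 6 and 7 of the output above show.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'col1': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A'],
        'col2': ['B', '', 'C', '', 'D', 'D', '', 'E'],
        'col3': ['F', 'G', 'G', 'H', 'I', 'I', 'J', ''],
        'col4': ['O', 'Q', 'P', '', '', '', 'U', 'J'],
    }
)

# Number of empty strings in each column
empty_counts = df.eq('').sum()
# Stable sort so that columns with equal counts keep their original order
order = empty_counts.sort_values(kind='stable').index
reordered = df[order]
print(reordered)
```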
