Select two pandas columns that have the same digit in the column name

Question

I am cleaning up some data and the raw dataset has entries as ['Field1.1', 'Field2.1', 'Field1.2', 'Field2.2']. For the dataset, either 'Field1' XOR 'Field2' will have a non-empty string. I'd like to create a single field 'Field.1' that will extract the non-empty string from 'Field1.1' XOR 'Field2.1' and place it in 'Field.1'. Similarly, I'd like to do this for 'Field1.2' and 'Field2.2' as 'Field.2'.

I am not sure how to select matching fields, i.e. 'X.1' with 'Y.1' and 'X.2' with 'Y.2', in order to do this.

My logic is that once I can select the correct pairs I can simply use a concat statement to add them and thereby extract the non-empty string for later use. If this logic is incorrect or there is a better way that does not rely on extracting the non-empty string in this way to concatenate them then please let me know.

If the logic is sound, please explain how might do this extraction given the indexing problem.

To be clearer, I want to go from:

df = pd.DataFrame({'field1.1': ['string1',''], 'field2.1':['','string2'],
                   'field1.2': ['string3',''], 'field2.2':['','string4']})

df

Out[1]: 
  field1.1 field2.1 field1.2 field2.2
0  string1           string2         
1           string3           string4

to:

df2 = pd.DataFrame({'field.1': ['string1','string3'], 'field.2':['string2','string4']})

df2

Out[2]: 
   field.1  field.2
0  string1  string2
1  string3  string4

Or at least, provide a sample data with expected output. See https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples — Valentino, Sep 23 '19 at 16:18

score 1 · Accepted Answer · answered Sep 23 '19 at 16:27

You can use wide_to_long, bfill, and then pivot back:

(pd.wide_to_long(df.where(df.ne('')).reset_index(), 
                 stubnames=['Field1','Field2'],
                 i='index', 
                 j='group', 
                 sep='.')
 .bfill(1)
 .reset_index()
 .pivot(values='Field1',index='index',columns='group')
)

Sample data:

df = pd.DataFrame([
    ['a','','b',''],
    ['c','','','d'],
    ['','e','','f'],
    ['','g','h','']],
    columns=['Field1.1', 'Field2.1', 'Field1.2', 'Field2.2'])

group  1  2
index      
0      a  b
1      c  d
2      e  f
3      g  h

Select two pandas columns that have the same digit in the column name

1 Answers1