0

I am cleaning up some data and the raw dataset has entries as ['Field1.1', 'Field2.1', 'Field1.2', 'Field2.2']. For the dataset, either 'Field1' XOR 'Field2' will have a non-empty string. I'd like to create a single field 'Field.1' that will extract the non-empty string from 'Field1.1' XOR 'Field2.1' and place it in 'Field.1'. Similarly, I'd like to do this for 'Field1.2' and 'Field2.2' as 'Field.2'.

I am not sure how to select matching fields, i.e. 'X.1' with 'Y.1' and 'X.2' with 'Y.2', in order to do this.


My logic is that once I can select the correct pairs I can simply use a concat statement to add them and thereby extract the non-empty string for later use. If this logic is incorrect or there is a better way that does not rely on extracting the non-empty string in this way to concatenate them then please let me know.


If the logic is sound, please explain how might do this extraction given the indexing problem.


To be clearer, I want to go from:

df = pd.DataFrame({'field1.1': ['string1',''], 'field2.1':['','string2'],
                   'field1.2': ['string3',''], 'field2.2':['','string4']})

df

Out[1]: 
  field1.1 field2.1 field1.2 field2.2
0  string1           string2         
1           string3           string4

to:

df2 = pd.DataFrame({'field.1': ['string1','string3'], 'field.2':['string2','string4']})

df2

Out[2]: 
   field.1  field.2
0  string1  string2
1  string3  string4
bradchattergoon
  • 452
  • 4
  • 14

1 Answers1

1

You can use wide_to_long, bfill, and then pivot back:

(pd.wide_to_long(df.where(df.ne('')).reset_index(), 
                 stubnames=['Field1','Field2'],
                 i='index', 
                 j='group', 
                 sep='.')
 .bfill(1)
 .reset_index()
 .pivot(values='Field1',index='index',columns='group')
)

Sample data:

df = pd.DataFrame([
    ['a','','b',''],
    ['c','','','d'],
    ['','e','','f'],
    ['','g','h','']],
    columns=['Field1.1', 'Field2.1', 'Field1.2', 'Field2.2'])

group  1  2
index      
0      a  b
1      c  d
2      e  f
3      g  h
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74