I am cleaning up some data and the raw dataset has entries as ['Field1.1', 'Field2.1', 'Field1.2', 'Field2.2']
. For the dataset, either 'Field1' XOR 'Field2'
will have a non-empty string. I'd like to create a single field 'Field.1'
that will extract the non-empty string from 'Field1.1'
XOR 'Field2.1'
and place it in 'Field.1'
. Similarly, I'd like to do this for 'Field1.2'
and 'Field2.2'
as 'Field.2'
.
I am not sure how to select matching fields, i.e. 'X.1' with 'Y.1' and 'X.2' with 'Y.2', in order to do this.
My logic is that once I can select the correct pairs I can simply use a concat statement to add them and thereby extract the non-empty string for later use. If this logic is incorrect or there is a better way that does not rely on extracting the non-empty string in this way to concatenate them then please let me know.
If the logic is sound, please explain how might do this extraction given the indexing problem.
To be clearer, I want to go from:
df = pd.DataFrame({'field1.1': ['string1',''], 'field2.1':['','string2'],
'field1.2': ['string3',''], 'field2.2':['','string4']})
df
Out[1]:
field1.1 field2.1 field1.2 field2.2
0 string1 string2
1 string3 string4
to:
df2 = pd.DataFrame({'field.1': ['string1','string3'], 'field.2':['string2','string4']})
df2
Out[2]:
field.1 field.2
0 string1 string2
1 string3 string4