python - creating a join with the optional ignoring of the first character

Question

I have 2 data sets that I need to join via the asset id. I have columns in each table that can be used for the join but the excel docs are from different systems and therefore the formats are different.

My problem is dataset one will remove the zeros at the front of the dataset (see table below). These both relate to the same asset it's just the output from one system that removes the zeros at the front of the ID.

Asset	dataset 1	dataset 2
A	012345	12345
B	001235	1235
C	0891011	891011

I am currently creating a new column that removes the front values of the ID if they are "0" to match the datasets then doing the join by using lstrip().

I am wondering if there is a more efficient way of doing this within the join function.

Edit/Answer: It appears the way I have been doing it is the most efficient thanks for your help.

Which `join` function? What library are you using, Pandas? Please tag accordingly, and preferably add minimal code to reproduce and test. — tobias_k, Jun 29 '22 at 11:35
I think there's nothing wrong with `str.lstrip`ping the leading "0"s. — fsimonjetz, Jun 29 '22 at 12:17
Seeing how IDs can have different numbers of digits, a better way than joining the columns with stripped zeros might be to _not_ strip the zeros for dataset 2 in the first place. What if there is one ID `00123` and another `0123`? — tobias_k, Jun 29 '22 at 12:55
I have no power over the outputs of the 2 software removing the leading 0. we have already raised this with the appropriate people in my company. The question was asking for a join or function that combines this process and the merge hence I didnt add any code. I initially thought of adding the 0 back until I noticed there are several Ids with two 0 in front. I have already been using lstrip at the moment to recreate the columns for the merge without leading zeros. I have checked most of the IDs manually and the lstrip() appears to be working. Curious if there was another method. — christopher babey, Jun 30 '22 at 13:15
@christopherbabey you can add leading zeros, too, e.g., pad up to a length of 6 via `df['dataset 2'].apply(format, args=['06d'])` if the dtype of the column is `int` or `df['dataset 2'].astype(str).str.zfill(6)` if it's `str`/`object`, but just stripping the extra zeros really seems to be the most straightforward solution. — fsimonjetz, Jun 30 '22 at 16:40

score 1 · Accepted Answer · answered Jun 29 '22 at 11:52

1

As I understand you want to remove leading zeros. See this discussion.

Example

data = "01234564500"

print(data.lstrip("0"))
print(data.rstrip("0"))

Output

1234564500
012345645

answered Jun 29 '22 at 11:52

bekirbakar

166
2
7

python - creating a join with the optional ignoring of the first character

1 Answers1