I have the following strings in a pandas DataFrame in Python 3, in columns string1 and string2:
import pandas as pd
datainput = [
{ 'string1': 'TTTABCDABCDTTTTT', 'string2': 'ABABABABABABABAA' },
{ 'string1': 'AAAAAAAA', 'string2': 'TTAAAATT' },
{ 'string1': 'TTABCDTTTTT', 'string2': 'ABABABABABA' }
]
df = pd.DataFrame(datainput)
df
string1 string2
0 TTTABCDABCDTTTTT ABABABABABABABAA
1 AAAAAAAA TTAAAATT
2 TTABCDTTTTT ABABABABABA
For each row, the strings in columns string1 and string2 are defined to be the same length.
For each row of the DataFrame, the strings may need to be "cleaned" of leading/trailing 'T' characters. However, within each row, both strings must be stripped of the same number of characters on each side, so that they remain the same length.
The correct output is as follows:
df
string1 string2
0 ABCDABCD BABABABA
1 AAAA AAAA
2 ABCD ABAB
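To make the rule concrete, here is a minimal sketch of the trimming for a single pair of strings. The helper name trim_pair and its char parameter are made up for illustration. Taking the larger run of 'T' from either string on each side is an assumption: it reproduces the expected output above, but the rule does not say what should happen when both strings carry 'T's.

```python
def trim_pair(s1, s2, char="T"):
    """Trim both strings by the same number of characters on each side.

    Hypothetical helper: assumes len(s1) == len(s2) and that the longer
    run of `char` (from either string) decides how much is cut.
    """
    # Length of the leading / trailing run of `char`, maximised over both strings
    left = max(len(s) - len(s.lstrip(char)) for s in (s1, s2))
    right = max(len(s) - len(s.rstrip(char)) for s in (s1, s2))
    # Use an explicit end index; s[left:-right] would break when right == 0
    end = len(s1) - right
    return s1[left:end], s2[left:end]


print(trim_pair("TTTABCDABCDTTTTT", "ABABABABABABABAA"))
# ('ABCDABCD', 'BABABABA')
```

The second row works the same way with the roles swapped: the 'T's sit in string2, and string1 is cut at the same positions.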
If these were two variables, it would be straightforward to calculate this with strip(), e.g.
string1 = "TTTABCDABCDTTTTT"
string2 = "ABABABABABABABAA"
# Number of leading 'T's to drop
num_left_chars = len(string1) - len(string1.lstrip('T'))
# Index where the trailing 'T's begin (an end index, not a count)
right_end = len(string1.rstrip('T'))
edited = string1[num_left_chars:right_end]
## print(edited)
## 'ABCDABCD'
However, in this case, one needs to iterate through all rows and redefine two columns at once. How could one modify each of these strings row by row?
EDIT: My main confusion is, given that both columns could contain T, how do I redefine them both?
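One possible way to do this row by row, offered as a sketch rather than a definitive answer: compute the cut positions once per row from both strings, slice both with the same indices, and assign the two columns back together. The names trim_pair and trimmed are hypothetical, and using the larger 'T' run from either column is an assumption about the intended behaviour when both columns contain 'T'.

```python
import pandas as pd

df = pd.DataFrame([
    {"string1": "TTTABCDABCDTTTTT", "string2": "ABABABABABABABAA"},
    {"string1": "AAAAAAAA", "string2": "TTAAAATT"},
    {"string1": "TTABCDTTTTT", "string2": "ABABABABABA"},
])


def trim_pair(s1, s2, char="T"):
    # Hypothetical helper: the longer run of `char` in either string
    # decides how many characters are cut from that side of both.
    left = max(len(s) - len(s.lstrip(char)) for s in (s1, s2))
    right = max(len(s) - len(s.rstrip(char)) for s in (s1, s2))
    end = len(s1) - right  # explicit end index, safe when right == 0
    return s1[left:end], s2[left:end]


# Walk both columns in lockstep and collect the trimmed pairs
trimmed = [trim_pair(a, b) for a, b in zip(df["string1"], df["string2"])]

# Redefine both columns at once, keeping the original row index
df[["string1", "string2"]] = pd.DataFrame(
    trimmed, index=df.index, columns=["string1", "string2"]
)
print(df)
```

Iterating with zip over the two columns avoids the per-row overhead of df.apply(..., axis=1), though apply with a function returning a pd.Series would be another option.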