I am trying to reorganise the columns of a DataFrame according to a specific rule, but I don't know how to do it.
For example, I have the chemistry-related DataFrame below. Each row is a chemical compound, and each column counts how many bonds of one type that compound contains.
   OH  HO  CaO  OCa  OO  NaMg  MgNa
0   2   3    2    0   1     1     1
1   0   2    3    4   5     2     0
2   1   2    3    0   0     0     0
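In chemistry, an OH (oxygen-hydrogen) bond means the same thing as an HO (hydrogen-oxygen) bond, and a CaO (calcium-oxygen) bond means the same thing as an OCa (oxygen-calcium) bond; likewise for NaMg and MgNa. I would therefore like to merge each group of equivalent columns into a single column by summing their counts, giving this DataFrame: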
   OH  CaO  OO  NaMg
0   5    2   1     2
1   2    7   5     2
2   3    3   0     0
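The merge described above can be sketched in pandas as follows. This is a minimal sketch, assuming that every column name is a plain concatenation of element symbols and that two columns are equivalent exactly when they contain the same symbols in any order. The helper names `bond_key` and `merge_equivalent_bonds` are hypothetical. It relabels each column with an order-independent key, sums the columns that share a key (via a transpose plus `groupby(level=0, sort=False)`), and then names each merged column after the first original column in its group.

```python
import re

import pandas as pd

# Example data from the question
df = pd.DataFrame(
    {
        "OH": [2, 0, 1],
        "HO": [3, 2, 2],
        "CaO": [2, 3, 3],
        "OCa": [0, 4, 0],
        "OO": [1, 5, 0],
        "NaMg": [1, 2, 0],
        "MgNa": [1, 0, 0],
    }
)

# Assumption: an element symbol is one uppercase letter plus any lowercase letters
SYMBOL = re.compile(r"[A-Z][a-z]*")


def bond_key(name: str) -> str:
    """Order-independent key, e.g. 'OH' and 'HO' both map to 'HO'."""
    return "".join(sorted(SYMBOL.findall(name)))


def merge_equivalent_bonds(frame: pd.DataFrame) -> pd.DataFrame:
    """Sum columns whose names are the same bond written in a different order."""
    keys = [bond_key(col) for col in frame.columns]

    # Remember the first original column name seen for each key
    first_name = {}
    for col, key in zip(frame.columns, keys):
        first_name.setdefault(key, col)

    # Relabel columns with their keys (duplicate labels are allowed), then
    # group the rows of the transpose by that label and sum them;
    # sort=False keeps the order in which each key first appears
    relabelled = frame.copy()
    relabelled.columns = keys
    merged = relabelled.T.groupby(level=0, sort=False).sum().T

    merged.columns = [first_name[key] for key in merged.columns]
    return merged


print(merge_equivalent_bonds(df))
```

Because OO has no reversed partner, its merged column keeps its original values (1, 5, 0). Sorting the symbols treats every permutation as the same bond, which is right for two-atom names but may be too coarse if some column names contain three or more symbols. The transpose is used instead of `groupby(axis=1)`, which is deprecated in recent pandas versions.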
I'm struggling because:
- my real DataFrame contains many different chemical bonds (more than 3,000 columns), so merging the columns one pair at a time by hand is impossible, and I don't know in advance which bonds exist or which of them are duplicates.
- element symbols vary in length and case, so the column names cannot be split at a fixed position (e.g. hydrogen is H, a single uppercase letter, whereas calcium is Ca, an uppercase letter followed by a lowercase one).
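The variable-length symbols can be handled with a regular expression instead of fixed positions. This is a sketch under the assumption that every symbol is one uppercase letter followed by zero or more lowercase letters; `split_symbols` and `canonical_bond` are hypothetical helper names. Because the real column list is unknown in advance, the split also checks that the symbols rebuild the whole name, so any column that does not follow the pattern (digits, separators, and so on) raises an error instead of being silently mis-grouped.

```python
import re

# Assumption: each element symbol is one uppercase letter followed by zero or
# more lowercase letters, e.g. "H", "O", "Ca", "Mg".
SYMBOL = re.compile(r"[A-Z][a-z]*")


def split_symbols(name: str) -> list[str]:
    """Split a bond name such as 'OCa' into ['O', 'Ca'].

    Raises ValueError if the name contains anything other than element
    symbols (digits, separators, a leading lowercase letter, ...).
    """
    symbols = SYMBOL.findall(name)
    if "".join(symbols) != name:
        raise ValueError(f"not a plain sequence of element symbols: {name!r}")
    return symbols


def canonical_bond(name: str) -> str:
    """Order-independent key: 'OCa' and 'CaO' both become 'CaO'."""
    return "".join(sorted(split_symbols(name)))
```

For example, `canonical_bond("OCa")` and `canonical_bond("CaO")` both return `"CaO"`, while `split_symbols("O-H")` raises `ValueError`. Applying `canonical_bond` to every column name before merging would surface any names that break the assumed pattern.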
I searched online for a similar question and tried writing the code myself, but I could not find a solution. What code would solve this problem?