Least-squares-fit overlays


3/5: Optimal-superposition-via-quaternions

Optimal superposition of one set of atomic coordinates onto another is best achieved using quaternion algebra. Quaternions have the advantage of rapid computation and immunity from 'gimbal lock', a phenomemon affecting rotation matrices when axes that should be independent happen to become aligned. I first encountered quaternions in the mid-1990s while writing a Fortran program to align protein-crystal structures (program ProtRot) using an elegant implementation by Kearsley (1989). The same approach is used here in the program over-rot.py.

Over-rot.py finds all files in the working directory with extension .xyz and obtains the optimal translation and rotation to superimpose the molecules. In our case, these are metconazole-A.xyz and metconazole-B.xyz, but the method is not limited to just two structures. In later examples we'll see four, eight, and seventeen components. For any superposition, over-rot.py takes the first .xyz file it finds as the 'reference' and then calculates alignment parameters (translation and rotation) for all the other .xyz files present.

Let's start with a recap of what these .xyz files look like:

22  
metconazole A  
Cl1A     11.975713     6.839304     6.630064  
O1A       7.326818     0.526737    10.347148  
.  
. lines omitted 
.  
C16A     11.490834     5.322523     8.825329  
C17A     10.737900     4.418826     9.558406 
22  
metconazole B  
Cl1B     15.834993     2.554754     4.061488  
O1B      16.832833     8.828427     9.962936  
.  
. lines omitted 
.  
C16B     14.925302     3.922900     6.206208  
C17B     15.071688     4.842193     7.238759 

The snippets above show that coordinates of the two molecules differ substantially - corresponding atoms are not aligned. To perform a meaningful superposition, over-rot.py needs to know which atoms (or subset of atoms) to use for the fit. There are three ways to provide this information:

For convenience, this information may be in a file 'over-rot-atoms.txt' in the working directory, but if this file is absent, over-rot.py prompts the user interactively via the command line. Here's an example of a successful run of over-rot.py :

Using reference structure: metconazole-A.xyz

No over-rot-atoms.txt found. Please specify which atoms to fit:
Options:
  - 'all' to use all atoms
  - A number (e.g., n = 12) to use first n atoms
  - Atom list (e.g., C1 C2 N1 N2 N3 O1 S1 etc.)
Atom specification: 
Using ALL atoms

=== Pre-alignment Analysis ===

Targeted Atoms RMSD Matrix (Å):
                                 1                   2         
----------------------------------------------------------------------------
 1 metconazole-A.xyz               0.000               9.472
 2 metconazole-B.xyz               9.472               0.000

Global RMSD (pre-fitted atoms): 2.7342 Å

=== Alignment ===
Aligned metconazole-B.xyz to reference | Fit RMSD: 0.1355 Å | Rotation: 177.34° 

Rotation Angles:
Index  Filename                         Rotation Angle (°)  Inverted
------------------------------------------------------------
  1  metconazole-A.xyz                            0.00  No
  2  metconazole-B.xyz                          177.34  No

All aligned structures written to over-rot.rot

=== Post-alignment Analysis ===

Rotated Atoms RMSD Matrix (Å):
                                 1                   2         
----------------------------------------------------------------------------
 1 metconazole-A.xyz               0.000               0.135
 2 metconazole-B.xyz               0.135               0.000

Global RMSD (rotated atoms): 0.0391 Å
Improvement: 2.6951 Å 

In the above we specified 'all' atoms, but feel free to experiment. The program calculates the RMSD before fitting, performs the quaternion-based alignment, and then gives post-alignment statistics. It constructs a square matrix of RMSDs relative to the reference structure (here there are only two structures, so there's not much to see, later examples are more impressive). It then gives a global RMSD relative to the average of the structures present, defined by: \[\text{Global RMSD} = \sqrt{\frac{1}{N \cdot M} \sum_{i=1}^{N} \sum_{j=1}^{M} \left( x_{i,j} - \bar{x}_j \right)^2 }\] in which:

Over-rot.py also checks inverted coordinates for each non-reference structure. If the RMSD is smaller for the inverted coordinates, it asks whether to use the inverted form instead. This stuff is also written to the log file. It would be useful for non-Sohncke space groups that have molecules related by a symmetry operation of the second kind. The program also writes the overall rotation experienced by each fragment.

The last thing that over-rot.py does is to create a combined file with all the translated/rotated coordinates concatenated in a file named 'over-rot.rot' (which can of course be renamed). Here's a snippet of the .rot file that shows the first two atoms from each molecule after fitting:

22  
metconazole A  
Cl1A     11.975713     6.839304     6.630064  
O1A       7.326818     0.526737    10.347148  
.  
. lines omitted 
.  
22  
metconazole-B  
Cl1B     12.089721     6.795058     6.779809  
O1B       7.311501     0.549803    10.433183      
.  
. lines omitted 
.  

As you can see, coordinates for corresponding atoms are now much closer. The over-rot.rot file has individual aligned structures together in .xyz format. We'll use it to create an html/JavaScript webpage to view the overlayed structures in interactive 3D, from which high-resolution PNG images can be saved.





Return to the main Tutorials page or to the main X-Ray Lab page