1/6) Space group determination - XPREP

The SHELXTL software includes a very useful utility called XPREP. In this example, XPREP is used to determine the space group, and to prepare an input file to solve the structure. XPREP has a bunch of other uses, but we will ignore them for now. Much of the following is done from a command line. While this may seem archaic, once you become familiar with typing commands you will find that it is much faster and much more direct than blundering through a graphical interface. To proceed on linux or on a Mac you should have your '*.hkl' and '*.html' files ready and a terminal (e.g. xterm or similar) window open, i.e. something like this:

To start XPREP, simply type 'xprep sucrose' at the prompt and hit the <return> key. If your '*.hkl' file is not named 'sucrose', then you need to use that filename instead of 'sucrose'. The program is smart enough to look for a file with the '.hkl' extension. This will give you something like this:

To enter the cell dimensions you simply type them in. As stated above, the best estimates of the cell dimensions available from data collected on the kappaCCD are contained in the nreport.html file, which is a summary of data collection, scaling and merging statistics. Since it is an ordinary html file, it can be read with a web browser or with a text editor. To open it with (say) firefox, then type 'firefox filename.html' (again, substitute the proper filename here!). It looks something like this:

You may notice the filename given at the top of this nreport.html (k07019) is different from the filename we are using (sucrose). Renaming to 'sucrose' was just for the sake of convenience. The k07019 name follows the convention used in the x-ray lab. at UK (k = kappaCCD, 07 = the year 2007, 019 = the 19th data collection done that year). You can of course use whatever filename you like, but it helps with bookkeeping if you follow some sort of convention.

It is always best practice to keep good notes. This is especially true for novices, so write these numbers down in your notebook! Also make note of the uncertainties that are associated with each of the cell dimensions. You will need these later (vide infra). In any event, type the cell dimensions into XPREP, all on one line and separated by spaces like this:

When you hit <return> you will get the following:

One of the great things about XPREP is that it usually makes sensible choices. Nevertheless, it helps to know what is going on. The table in the window above allows you to tell the program whether the crystal has a primitive or a centered unit cell. If you don't know what that means, consult your notes or an introductory textbook.

So, how to read this table? Notice there is a column for P (primitive), as well as columns for various centering conditions (A,B,C etc.). Below these labels there are numbers that tell you how often a particular criterion is violated. Since it is always possible to fit a primitive lattice to a set of regularly-repeating points in 3D space, there should always be a bunch of zeroes in the P column. The other columns have non-zero numbers in them. If the unit cell was centered, then it would be apparent from the numbers in these columns. For example, if the lattice was really I-centered (the I means body centered, and is from the German innenzentriert), then the intensity found for any reciprocal lattice point with an odd number for h+k+l, would be small. In the table this would give small numbers in the I column for the bottom two rows.

XPREP usually makes the right choice, and gives its best guess in square brackets. Since the cell is indeed primitive, all you have to do is hit <return>. This brings up the main XPREP menu:

In this example, we will only use a small subset of the tools available in XPREP. If you feel the need to try the other features, go ahead. Some of these will be used in later tutorials for more difficult structures.

XPREP suggests the option H, which will search for a higher symmetry cell. When you hit <return>, the program responds like this:

In this example, the program suggests a cell that is identical to the one you typed in. It also gives a second option that retains the original cell. In this case, options A and B are the same. More complicated structures might give a few different possibilities for you to consider. In this case, however, the program is not able to find a higher symmetry cell, so we accept the default value by hitting <return>. This sends us back to the main menu, which now suggests the "S" option for space group determination:

Do the obvious thing, hit <return> to get this:

If you know the space group already, you could enter 'I' at the prompt. Since we know that sucrose is chiral, we could enter that here, but let's make the program do all the work! Go ahead and enter 'S', as suggested by XPREP:

To proceed, the program needs to know the crystal system. It can usually figure this out from the metrics of the cell, but there are cases where can be deceived, e.g. for a monoclinic crystal with β=90° it will suggest orthorhombic. Here it makes the correct choice of monoclinic, so accept the default value of 'M' by hitting <return>.

Here the program gives you a chance to revise your choice of centering, but it is unusual that you'd want to change this. In this case, you know it is primitive, so accept 'P' by hitting <return>, which gives:

Here we get three blocks of information:

The mean |E*E-1| statistic gives information on whether the structure is likely to be centrosymmetric or non-centrosymmetric. All it does is to compare the value calculated for the dataset (0.759) with a theoretical value for centrosymmetric (0.968) and non-centrosymmetric (0.736) intensity distributions. Here it is clear that the sucrose data are non-centrosymmetric, and this makes sense. Sucrose molecules are chiral, and at least with naturally occuring sucrose there is only one enantiomer. This means that it has to be non-centrosymmetric. This statistic must be treated with caution because it can be misleading. Examples of such problems usually involve data from twinned crystals, pseudo-symmetric crystals, non-centrosymmetric crystals that happen to have centrosymmetric distributions of heavy atoms etc. None of those problems concern us here because the |E*E-1| value is reasonable.

The next block tells us about the presence of screw axes (written as -21- in the table column heading) and glide planes (written as -a-, -c-, -n-). The table is laid out in much the same way as the table used to decide on the presence of lattice centering. In this example there are no violations for the 21 screw axis, but plenty of violations for the a,c,n glide planes. Again, this makes complete sense. Since natural sucrose has just one enantiomer, there can't possibly be any glide planes in the crystal structure because glide planes include a mirror operation. Note: depending on how the data reduction was performed, there may have been non-zero numbers in the 21 column. This would happen if the systematically absent reflections were left in the dataset during merging and scaling. If you don't know what this means, you may want to find out. Either way, it should be apparent that there is a 21 screw axis in this structure.

XPREP uses all the information gathered so far to try to ascertain the space group. There are usually a few suggestions along with a bunch of information that enable you to make an informed choice. Probably the most useful statistic here is the CFOM (combined figure-of-merit), which is usually smallest for the correct space group. In this case, XPREP suggests option 'D', P2(1), which is just the SHELX way of writing P21. Accept this by hitting <return>.

We're back at the main menu, but with the 'D' option recommended as the next step. In this tutorial we will simply hit the <return> key through these screens, but a brief explanation of each will be given:

This is a separate sub-menu. The program wants to show you some intensity statistics, hit <return>.

Merging and intensity statistics depend on whether symmetry equivalents have been merged or not. Feel free to try the different options. If you try the suggested 'A' option you get the following:

As you can see, there's a lot of information. The table splits the data into resolution shells with roughly the same number of reflections in each. The Rint column gives a measure of whether symmetry equivalents have the same intensity. Notice that in the higher resolution shells (lower down the table), the Rint values get larger. The mean intensity and mean I/sigma tend to get get smaller at higher resolution, while the Rsigma get larger. These variations with resolution happen because diffraction intensities tend to be weaker, and therefore more susceptible to noise, at higher resolution. As a general rule, the data are usable so long as the mean I/σ for a shell is >2 and the Rint < 0.3, but there is some leeway. There is little point in including very high-resolution reflections if all they contain is noise, so you may occasionally have to truncate the high-resolution limit a bit (seek help if you are unsure). In any event, this sucrose data is strong throughout the whole resolution range.

On hitting <return> we get the following.

This sends us back to the main menu:

The 'C' menu item allows us to enter the molecular formula:

It is not absolutely necessary to know the formula, but it does help. If you don't know it, you can make up something plausible based on what the starting materials were, but remember to correct it later! When a formula is entered you get a chance to change some other stuff, as follows.

The Z value is the number of formula units in the unit cell. The program estimates Z based on the formula and the unit cell volume. If you have the formula correct, this is usually a good estimate. On returning to the main menu, you get this:

The next menu item selected is 'F', which sets up the files to be used by SHELXS, the structure determination program. On hitting <return>, you get to choose the filename for your instructions ('.ins') file:

For a routine structure it is ok to keep the same file name, but in some circumstances you may need to change it. This could happen if you had to re-index your data for any reason, e.g. if you found a higher symmetry cell. Here we choose the default 'sucrose', by hitting <return>. You get a choice of whether to use SHELXS (most common) or SHELXD (mainly for difficult cases). There is now a third option, SHELXT, which will work with files created for either SHELXS or SHELXD.

For this example, choose 'S' for SHELXS, to get:

XPREP gives you a listing of the instructions file ('sucrose.ins'), and the option of writing (or overwriting) the data file. You may recognize some of the numbers given here. The numbers on the CELL line are the wavelength of the x-rays in ångstrom units, followed by the cell dimensions. On the ZERR line below CELL, the numbers are Z (i.e., the number of formula units) followed by standard uncertainties (SU values) for each of the cell parameters. Unfortunately for kappaCCD data, the SUs written by XPREP are incorrect. The proper values are given in the nreport.html file, which means you will have to manually edit the '.ins' file to enter the proper values (you'll do this next). The rest of the commands in the '.ins' file will be explained in the next section of this tutorial. You are now ready to quit XPREP and solve the structure, so hit <return>.

This sends you back to the command line of the terminal window. If you want to see what files XPREP has created, type 'dir' and hit <return> (note: the 'dir' command is used in the UK x-ray lab. as a shorthand alias for the unix command 'ls -l', so it may not be present on unix systems outside this x-ray lab.).

There should be three new files, sucrose.ins, sucrose.pcf, sucrose.prp. As stated above, sucrose.ins contains instructions for the structure solving program, SHELXS. The file sucrose.prp contains a listing of all the stuff that was written to the screen by XPREP, which can be useful for diagnostic purposes. The file sucrose.pcf contains a bunch of information written in CIF format. Note: for data collected on the kappaCCD, not all of the information in the .pcf file is correct, but you do not need to worry about that.

Part 1: Setting up the instructions file - XPREP
Part 2: Structure solution - SHELXS
Part 3: Molecular editing - XP
Part 4: Structure refinement - SHELXL
Part 5: Thermal ellipsoid and packing plots - XP
Part 6: The Crystallographic Information File - CIF