6/6) The Crystallographic Information File - CIF
We pick up from the end of part 4 of this tutorial. The CIF written by SHELXL
must be edited before it can accompany a manuscript or be deposited in a database.
This is because some information is missing, and some is incorrect. This final part
of the sucrose tutorial shows how to edit the CIF made by SHELXL so that it passes
the IUCr structure checking program, checkCIF.
The IUCr (International Union of Crystallography) has another excellent tool,
publCIF, for editing and checking CIFs, but in this tutorial we will not use it.
If you intend to publish in an IUCr journal, it really pays to use publCIF, as it is
used by the Acta Crystallographica journals to typeset structure-related papers.
You should have your CIF from the last SHELXL refinement in this tutorial. If you
don't have it, you can download this copy: sucrose.cif
There is also a fully edited version of this CIF for you to inspect. Without getting
into the arcana of the CIF format, the easiest way to get started is to compare a
newly generated CIF with a fully edited version.
Note: The CIF format is finicky. Misplaced semi-colons are a particular
source of grief, as is the legacy 80 character per line limit.
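The two pitfalls just mentioned can be screened for before submitting anything. The following is only a rough sketch, not a CIF parser: the function names are made up for this illustration, and the semi-colon check assumes nothing more than the CIF rule that multi-line text fields open and close with a line starting with a semi-colon, so such lines should come in pairs.

```python
def overlong_lines(cif_text, limit=80):
    """Return (line_number, length) pairs for lines longer than `limit` characters."""
    return [
        (number, len(line))
        for number, line in enumerate(cif_text.splitlines(), start=1)
        if len(line) > limit
    ]


def semicolon_fields_balanced(cif_text):
    """Heuristic: lines that start with ';' delimit text fields, so they come in pairs."""
    delimiters = sum(1 for line in cif_text.splitlines() if line.startswith(";"))
    return delimiters % 2 == 0


# Hypothetical fragment: one complete text field plus one line that is too long.
sample = (
    "data_sucrose\n"
    "_refine_special_details\n"
    ";\n"
    " Refinement of F^2^ against ALL reflections.\n"
    ";\n"
    "_x " + "a" * 90 + "\n"
)

print(overlong_lines(sample))
print(semicolon_fields_balanced(sample))
```

A file that fails either check is worth fixing in the text editor before going any further, since checkCIF will complain about syntax first.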
Note: This tutorial does not show you how to construct a full report on
your crystal structure, it merely shows you how to correct the problems with a
CIF written by SHELXL.
Start by opening the CIF in your text editor. As with editing SHELX files, do not
use a word processor. In this tutorial you'll see successive portions of
the CIF. If you roll over the image with your mouse it will highlight the lines
that need to be edited in red. Each of these images will be followed by a brief
description of the changes needed, followed by an image of the corresponding section of
the CIF after it has been edited.
Some crystal structures have multiple fragments in the asymmetric unit, such as ions,
solvent molecules, high Z' etc., and the _chemical_formula_moiety records
the formulae of the various pieces (moieties). For sucrose it is just the same
as the _chemical_formula_sum, so just duplicate the _chemical_formula_sum
line. If you don't know what the proper moiety formula is, don't enter it yet.
Later, the output from checkCIF will tell you what it is. If you happen
to know the answer to any of the other items here, you should replace the ? with the
appropriate value.
Scrolling down the CIF we come to this:
_symmetry_cell_setting is where you enter the crystal system; for sucrose this
is monoclinic.
_symmetry_space_group_name_H-M is where you enter the Hermann-Mauguin
space group symbol; here it is 'P 21', which is CIF-ese for P2₁.
There are other _symmetry_space_group_ CIF data names. For some strange
reason it seems to have become common practice to enter Hall's otherwise little-used
space-group notation, which for this space group is P2yb. Hall's nomenclature is
not so well known, so if you don't know the Hall symbol for your space group, don't
enter it yet. Later, the output from checkCIF will tell you what it should be.
For sucrose it is:
_symmetry_space_group_name_Hall 'P 2yb'
Further down in the CIF we have this:
The number of reflections used for the cell determination is given in the nreport file.
_exptl_crystal_description just describes the crystal shape (e.g. block, plate,
needle etc.), and its colour should be obvious. If you have
forgotten these details, let this be a reminder to keep good notes.
On to the next section ...
The most common type of absorption correction nowadays is the multi-scan method. This
is typically done with SADABS (by George Sheldrick), SORTAV (by Bob
Blessing) or Scalepack (by Zbyszek Otwinowski). Unfortunately, although
Scalepack is an excellent program, it does not give you the proper information
to edit this section of the CIF, so it is probably better to use SADABS.
Notice that the T_min and T_max values are rounded to three decimal
places, and that T_min is a little different from the value in the unedited
file. The problem is that the unedited value is not derived from the absorption
correction; it comes from a simple calculation based on crystal size (SIZE
command in SHELXL) and chemical composition. If Scalepack was used for
merging, that is the best estimate you have. For SADABS data though, the proper
value requires you to look in the SADABS log file. The default name of this file is
sad.abs.
Near the end of sad.abs you will find a line like this:
Ratio of minimum to maximum apparent transmission: 0.960326
The value for _exptl_absorpt_correction_T_min is found by multiplying this
ratio (0.960326 for this dataset) by the CIF value of T_max
(0.9828 in this case), to give 0.944. If that seems a bit contrived, well that's because
it is a bit contrived. Nevertheless, it's the best estimate we have, so that's what we
use. Note that for analytical absorption corrections, the T_min and T_max
values should be output by the program used, and you'll need to edit both of these
values in the CIF.
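The arithmetic for the SADABS case is easy to script. The sketch below is only an illustration: the function names are invented here, and it assumes the log contains the transmission-ratio line in exactly the wording quoted above. It takes the last such line, since the tutorial says to look near the end of the file.

```python
import re


def ratio_from_log(log_text):
    """Pull the last 'Ratio of minimum to maximum apparent transmission' value."""
    matches = re.findall(
        r"Ratio of minimum to maximum apparent transmission:\s*([0-9]*\.?[0-9]+)",
        log_text,
    )
    if not matches:
        raise ValueError("transmission ratio line not found")
    return float(matches[-1])


def t_min_estimate(ratio, t_max, decimals=3):
    """T_min = (SADABS min/max ratio) x (T_max from the CIF), rounded for the CIF."""
    return round(ratio * t_max, decimals)


# Values from this sucrose dataset, as quoted in the text.
log_text = "Ratio of minimum to maximum apparent transmission: 0.960326\n"
ratio = ratio_from_log(log_text)
t_min = t_min_estimate(ratio, t_max=0.9828)

print(f"_exptl_absorpt_correction_T_min  {t_min:.3f}")  # 0.944
```

In practice you would read the text of sad.abs from disk and pass it to ratio_from_log, then type the result into the CIF by hand.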
Further down the CIF we find this:
There's a bunch of stuff to change. The next image shows what the right entries should
be in this case, but other diffractometers require different information.
With area detector data, it is unusual for standard reflections to be collected, but
you should change these entries anyway. In CIF-ese, the full stop, or period '.',
and the question mark '?' mean different things. The former means "not applicable",
while the latter means "not known". The difference is subtle, but you might as
well get it right.
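To see at a glance which entries are still "not known" and which have been marked "not applicable", a few lines of Python will do. This is a minimal sketch under stated assumptions: the function name is made up, and it only looks at simple one-line "name value" entries, so it ignores loops and values placed on the following line.

```python
def placeholder_items(cif_text):
    """Split one-line CIF entries into those set to '?' and those set to '.'."""
    unknown, not_applicable = [], []
    for line in cif_text.splitlines():
        parts = line.split()
        # Only consider lines of the form: _data_name value
        if len(parts) == 2 and parts[0].startswith("_"):
            name, value = parts
            if value == "?":
                unknown.append(name)
            elif value == ".":
                not_applicable.append(name)
    return unknown, not_applicable


# Hypothetical fragment for illustration only.
fragment = (
    "_diffrn_standards_number          ?\n"
    "_diffrn_standards_decay_%         .\n"
    "_diffrn_reflns_theta_max          27.48\n"
)

unknown, not_applicable = placeholder_items(fragment)
print("still unknown (?):", unknown)
print("not applicable (.):", not_applicable)
```

Anything left in the first list is a candidate for editing; the second list should contain only entries that genuinely do not apply to your experiment.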
The values for _diffrn_reflns_number and the merging R value are in the nreport
file (see 'Total number of integrated reflections' and 'Overall R-merge'
in the last table), while for SADABS data you should look in the log file.
On to the next block ...
The Greek letter sigma (σ) is given as '\s' in CIF-ese. The
_computing_ entries need to show the most current references for all the programs
used. Note that if you use publCIF, this should be added on the
_computing_publication_material line. For a CIF intended for an IUCr
journal, you will also need to add a bunch of extra stuff, including
_publ_section_references, but that is beyond the scope of this tutorial. If you
are wondering what is meant by 'local procedures', it simply refers to tasks
like manual editing of the CIF.
No matter what the intended purpose of the CIF, if there are hydrogen atoms in
the structure there should be a description of how the hydrogen atoms were treated.
This description goes in the _publ_ section of the CIF, but SHELXL
does not write any _publ_ lines at all. You can enter it by hand in the space
indicated by the red line in the roll-over image. While you're at it, you may as well
change the 'sigma' to '\s' in the _refine_special_details section.
This ought to give you something like the following:
The last thing that needs editing in this first pass through the CIF is here:
By default, SHELXL writes 'mixed' on the _refine_ls_hydrogen_treatment line. This would only be appropriate if you used a combination of constrained and refined H atoms. In this sucrose example we used a riding model for all of them, so this should be changed to 'constr', which is short for constrained.
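If you find yourself making the same one-line change in many CIFs, it can be scripted. The following is a hedged sketch rather than a recommended workflow: set_item is a hypothetical helper, it handles only entries whose value sits on the same line as the data name, and it refuses to act unless it finds exactly one such line.

```python
import re


def set_item(cif_text, name, value):
    """Replace the value of a one-line CIF entry, keeping the original spacing."""
    pattern = re.compile(rf"^({re.escape(name)})([ \t]+)\S.*$", re.MULTILINE)
    # A function is used as the replacement so backslashes in `value` stay literal.
    new_text, count = pattern.subn(
        lambda m: f"{m.group(1)}{m.group(2)}{value}", cif_text
    )
    if count != 1:
        raise ValueError(f"expected exactly one line for {name}, found {count}")
    return new_text


# Hypothetical fragment for illustration only.
fragment = (
    "_refine_ls_hydrogen_treatment    mixed\n"
    "_refine_ls_extinction_method     none\n"
)

edited = set_item(fragment, "_refine_ls_hydrogen_treatment", "constr")
print(edited)
```

For a single structure, editing the line by hand in a text editor is just as quick.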
Notice that the CIF contains the lines _refine_ls_abs_structure_details
and _refine_ls_abs_structure_Flack, even though you could not use Flack's
parameter to establish the absolute structure (as discussed earlier in this tutorial).
In a case like this, you may be asked to remove these lines. The argument made for
removing them is that a Flack parameter with an SU (standard uncertainty) so large
renders the parameter 'meaningless'. In point of fact, such a value is not 'meaningless'
at all: it tells you in very definite terms that the X-ray data alone cannot establish
the absolute configuration, particularly once Friedel pairs have been merged. As such,
it confirms what we know from the physics of anomalous dispersion. This is a subtle
point that is too often lost on journal editors and referees.
You should also add a line _chemical_absolute_configuration 'rm', where 'rm'
stands for 'reference molecule'. This states
that the handedness of your model is fixed by a reference molecule, which in this case
is obviously sucrose itself! If you prefer, you could put this extra line near the top of
the CIF along with the other _chemical_ entries, but it doesn't really
matter where it goes.
If you are lucky, your CIF may now be sufficiently complete to survive
checkCIF. Since checkCIF gives lots of diagnostic comments, it can be helpful for
correcting many CIF problems. It is also useful for detecting more general
problems with your structure. So, open a browser and go to the checkCIF web page.
For this sucrose example, checkCIF returned the following report:
A few things here are worth mentioning. The checkCIF report shows no syntax
errors, and the table shows that the calculated and reported data are very
similar. The differences in Nref show that our dataset is missing three
reflections out of about 1700. That's not so bad, and it's likely just because the
program calculated hmax = 10 rather than the hmax = 9 present in the dataset, and
that was likely due to a rounding error. It also shows
that we merged the Friedel pairs (the number in square brackets for Nref is the
expected unmerged number of reflections). This is also shown on the 'Data
completeness' line, and again there's nothing to worry about.
Next comes the 'Alert' section of the report. There are four levels of alert:
A, B, C, and G. Further down, the report tells you how serious these alert levels tend
to be. Generally speaking, A-level alerts are serious and need to either be
fixed or explained in a 'Validation Reply Form'. B-level alerts are
potentially serious, and these too should either be fixed or you should have a
well-reasoned explanation. Be aware, however, that checking programs are not infallible,
so you may sometimes see A/B-level alerts when there is really no problem at all.
That's ok so long as you have a cogent counter argument. C/G-level alerts often
just tell you things that you know already, but on occasion you may need to tweak the
model (and re-refine!) to eliminate the alerts.
For this sucrose tutorial, there are no A/B-level alerts, but there are a bunch
of lower-level ones. None of them are serious. You already know about the 'meaningless
Flack parameter' argument, and it's safe to dismiss it here. The data/parameter
ratio is low simply because the Friedel pairs were merged, and again it's not serious
for this structure. The link in the checkCIF report shows the somewhat arbitrary
cut-off point for generating this alert, and it is insignificant for this structure.
A fairly close contact between H8a and H9a has been flagged, but a quick look at the
structure shows that these hydrogen atoms are fine. The uncertainty in the Flack
parameter is high, but that is to be expected; there was no discernible
anomalous signal, so the Friedel pairs were merged. Again, no big deal. Lastly, the
standard uncertainties of the cell parameters all happen to have the same value. These
things happen sometimes, and it's not a problem. Further down, the G-level alerts give
no reason for alarm.
Conclusion: The structure determination is now complete.