P212121 – The most frequently seen space group in protein crystals

Image Courtesy: PymolWiki

Space groups do not occur with equal frequency in protein crystals. For example, the space group P212121 is the most frequent in protein crystals and occurs almost one-third of the time!

Why is this so? This was the question asked by Wukovitz and Yeates in their paper titled “Why protein crystals favour some space-groups over others” [1].
Comparing protein crystals with crystals of small organic molecules reveals marked differences. The rules for molecular packing of organic molecules were proposed by Kitaigorodskii and became widely accepted [2].

However, the distribution of space groups in proteins differs markedly from that in organic molecules. Thus, the authors argue that the same criteria cannot be applied to proteins. One major difference is that protein crystals contain about 50% solvent by volume, while organic crystals are tightly packed with little empty space. This results in a higher “coordination number” (10-14) for organic crystals than for proteins, where the average is 7.5.

Based on these observations, the authors devised a simple statistical measure to explain why certain space groups are preferred among the 65 biological space groups.

The formula is:

D = S + L - C, where

D = total number of rigid-body degrees of freedom
S = number of meaningful rigid-body degrees of freedom for the first molecule in space
L = number of independent parameters needed to describe the unit cell, and
C = minimum number of unique contacts required to connect the set of symmetry-related molecules into a network

All three terms on the right-hand side (S, L and C) are positive integers and are not adjustable parameters. The explanation given by a simple statistical analysis for protein crystals is “For a particular space group only a certain number of rigid-body degrees of freedom are available for assembling the first few molecules before the internal structure of the crystal is completely defined. This number depends on the space group symmetry.”

Three quantities determine the rigid-body degrees of freedom:

  1. the number of meaningful rigid-body degrees of freedom (DOF) for the first molecule in space
  2. the number of independent unit-cell parameters
  3. the number of intermolecular contacts needed to make a network
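The bookkeeping behind D = S + L - C can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the authors' code: the function name `rigid_body_freedom` is hypothetical, and the example values for P212121 (S = 6 for three rotations plus three translations, L = 3 for the cell edges a, b and c, C = 2 for two generating screw axes) are assumptions made here for illustration and should be checked against the tables in the paper [1].

```python
def rigid_body_freedom(s: int, l: int, c: int) -> int:
    """Return D = S + L - C for one space group.

    s: meaningful rigid-body degrees of freedom of the first molecule
    l: independent unit-cell parameters
    c: minimum number of unique contacts needed to build the network
    """
    for label, value in (("S", s), ("L", l), ("C", c)):
        # The post states that S, L and C are all positive integers.
        if not isinstance(value, int) or value <= 0:
            raise ValueError(f"{label} must be a positive integer, got {value!r}")
    return s + l - c


# Illustrative values for P212121 (assumed here, not quoted from the post).
d_p212121 = rigid_body_freedom(s=6, l=3, c=2)
print("P212121: D =", d_p212121)
```

Because S, L and C are fixed by the symmetry of the space group, the function has nothing to fit or tune; each space group simply maps to a single integer D.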

How to find C?
The problem of finding C is equivalent to the problem of identifying the minimal set of symmetry elements that generates the space group. For each space group, C can therefore be determined by finding its minimal set of generators. The values range from 2 to 5.

The authors observed that the calculated value of D correlated with the observed frequency of the space group! That is, the higher the value of D, the more frequently the space group occurs. Guess which space group had the highest D value?

Now the question comes back to “Why is P212121 the most frequent?” The reason is that this space group is the least restrictive for the possible orientations and positions of the molecules in the crystal.

The authors do note that their analysis does not take into account the shape of the molecule, energetics, or packing efficiency, factors that may help explain cases with more than one molecule in the asymmetric unit. According to the authors, P1̄ (P-1) has a D value of 8, and is predicted to be the most favoured space group for racemic protein mixtures.


  1. Wukovitz SW, Yeates TO (1995). Why protein crystals favour some space-groups over others. Nature Structural Biology, 2 (12), 1062-7. PMID: 8846217
  2. Kitaigorodskii AI (1955). Organic Chemical Crystallography. Consultants Bureau, New York (Originally published in Russian by Press of the Academy of Sciences of the USSR, Moscow)
