What is ClogP?
The partition coefficient (P) of a material is the ratio of its equilibrium concentrations in two immiscible solvents, and logP is the base-10 logarithm of that ratio. Although we normally use octanol : water, it could be any combination of immiscible fluids. This property is one of those chemical descriptors that pervades all aspects of ADMET and is used to filter out and define the chemical space in which to work. Oddly, for such an important property, most projects and programs are built upon materials whose logP has never been experimentally determined; they rely instead on predicted values generated by software.
Recently, our DMPK scientist presented a series of predicted logP values alongside some that he had expertly determined in the lab. Whilst the correlation was good in many cases, there were some significant outliers, so he asked me, the computational chemist, whether I could explain why the calculated logP was so different. There were some obvious structural features that can mislead certain methods of calculating logP (yes, there is more than one method), and other methods might have predicted the outlier values in our case more closely.
Not All LogPs are Calculated Equal
When chemists talk about ClogP they usually mean “calculated” logP, which is strictly a misnomer. To a CADD scientist, ClogP means something more specific: it is a proprietary method (owned by BioByte Corp. / Pomona College) used to predict logP. Whilst there is a range of prediction methods, they fall into three basic groups, and the vast majority of current methods are flavours thereof:
Atomic (e.g. “AlogP”) & Enhanced Atomic / Hybrid (“XlogP”, “SlogP”)
Fragment / Compound (“ClogP”, “KlogP”, “ACD/logP”)
Property-based methods (“MlogP”, “VlogP”, “MClogP”, “TlogP”)
Atomic logP assumes that each atom makes its own contribution to the logP, and that the final value for the chemical entity is purely additive. Crippen et al. first proposed such a method in a series of papers in the late 1980s, with the refined version dubbed “AlogP” [1]. The method is effectively a table look-up per atom, and there are plenty of free AlogP calculators available. It is suited to smaller molecules, particularly those with non-complex aromaticity and those that do not contain electronic systems known to make unexpected contributions to logP.
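To make the "table look-up per atom" idea concrete, here is a minimal sketch in Python. The atom types and contribution values are hypothetical placeholders chosen for illustration. They are not the published Crippen / AlogP parameters, and a real implementation would also need to assign atom types from the structure automatically.

```python
# Toy illustration of an additive, atom-typed logP estimate.
# The atom types and contribution values below are HYPOTHETICAL placeholders,
# not the published Crippen / AlogP parameters.
HYPOTHETICAL_CONTRIBUTIONS = {
    "C_aliphatic": 0.50,
    "C_aromatic": 0.30,
    "O_hydroxyl": -1.00,
    "N_amine": -1.00,
    "H": 0.10,
}


def additive_logp(atom_types):
    """Sum the tabulated contribution of every atom type in the molecule."""
    try:
        return sum(HYPOTHETICAL_CONTRIBUTIONS[t] for t in atom_types)
    except KeyError as err:
        # An atom type missing from the table cannot be scored at all.
        raise ValueError(f"no contribution tabulated for atom type {err}") from err


# Ethanol, typed by hand: 2 aliphatic carbons, 1 hydroxyl oxygen, 6 hydrogens.
ethanol = ["C_aliphatic"] * 2 + ["O_hydroxyl"] + ["H"] * 6
print(round(additive_logp(ethanol), 2))
```

Note how the sketch fails outright on an atom type it has never seen: the same limitation, in miniature, that makes purely atomistic methods struggle with unusual chemistry.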
Enhanced Atomic or hybrid logP (XlogP, SlogP etc.) is a modification of the AlogP system. To address the shortcomings of atomistic approaches for larger systems, it takes the value of each atom type, adds a contribution from its neighbours, and applies correction factors that help sidestep known deviations in purely atomistic methods. This is an attempt to allow for larger electronic effects. It is fast, being a table look-up technique, and many free software packages use it too. The smarter hybrid algorithms know the state of each atom and thus how much of a contribution its neighbours add.
Fragment / Compound logP uses a dataset of experimentally determined values for full compounds or fragments, which is then modelled using QSPR or other regression techniques at the level of small fragments rather than individual atoms. Fragment contributions are then added up, with correction factors. The rationale is that atomistic approaches sometimes do not adequately model the nuances of electronic or intramolecular interactions, which may be better captured by whole fragments. This method tends to be better for systems with complex aromaticity and for larger molecules, on the condition that the molecule contains features similar to those on which the model was built. If your molecules contain very obscure motifs, the model behind the prediction may not correlate well with experiment.
Property-based methods… There is a whole host of methods for determining logP using properties, empirical approaches, 3-D structures (e.g. continuum solvation models, MD models, molecular lipophilicity potential etc…), and topological approaches. Most of these methods are reasonably computationally intense, and are buried in the world of informatics and stats, but one is worthy of particular note: Moriguchi’s method (or MlogP), which used the sum of lipophilic atoms and the sum of hydrophilic atoms as the two basic descriptors in a regression model that was able to explain nearly 75% of the variance in experimentally determined logP values for a dataset of 1230 compounds [2]. The group later added 11 correction factors, and the model then explained 91% of the variance. It is very fast, so historically it was employed for large datasets and was included in several property prediction packages, such as Dragon and ADMET Predictor (Simulations Plus, Inc.). Nowadays, as computational speed has increased, MlogP is used less, because more accurate methods have become manageable even at large library sizes.
So, which method do you use?
Biovia’s Pipeline Pilot and Discovery Studio sport a version of AlogP, and KNIME has multiple free XlogP and AlogP calculator plug-ins. CCG’s MOE uses both an unpublished atomic model (Labute) and a hybrid SlogP. DataWarrior uses ClogP; Dotmatics / Vortex natively use XlogP, but you can patch in others. Cresset BMD’s offerings use SlogP and Optibrium’s StarDrop uses a fragment method. ChemAxon uses multiple methods (including hybrid (VG) and fragment, e.g. KlogP), and if you have their InfoCom nodes in KNIME you can use several methods and weight them according to your understanding. Better yet, if your group has the resource to experimentally determine a few of your own logPs, you can do a quick correlation check across the methods against known data in your series, and then weight your model accordingly.
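One possible way to do that quick check is sketched below in Python. The method names and all numbers are hypothetical, and weighting by inverse RMSE is just one simple choice assumed here for illustration; it is not something any of the packages above prescribes.

```python
import math


def rmse(predicted, measured):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(measured) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - m) ** 2 for p, m in zip(predicted, measured))
    return math.sqrt(squared / len(measured))


def inverse_error_weights(predictions, measured):
    """Weight each method by 1/RMSE, normalised so the weights sum to 1."""
    inverse = {
        name: 1.0 / max(rmse(values, measured), 1e-9)  # guard against zero error
        for name, values in predictions.items()
    }
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def consensus_logp(per_method_value, weights):
    """Weighted average of the per-method predictions for one new compound."""
    return sum(weights[name] * value for name, value in per_method_value.items())


# HYPOTHETICAL numbers, for illustration only.
measured = [1.0, 2.0, 3.0]              # experimentally determined logP values
predictions = {
    "method_A": [1.5, 2.5, 3.5],        # consistently 0.5 log units too high
    "method_B": [1.0, 2.0, 4.0],        # exact twice, one large outlier
}
weights = inverse_error_weights(predictions, measured)
print(weights)
print(consensus_logp({"method_A": 2.0, "method_B": 3.0}, weights))
```

With only a handful of measured compounds the weights will be noisy, so treat them as a rough guide to which method suits your series rather than as a calibrated model.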
As a rule (to which there are exceptions):
Simple small molecules (e.g. fragment sized): AlogP will probably perform just fine, but a hybrid method would be better.
Complex but standard small molecules (the normal development type med chemists love): fragment / compound logP methods will often be the most accurate. Hybrid methods are your second-best option (but still reasonably good).
Complex, non-standard molecules (with rare motifs): a hybrid system or fragment-based logP may be equally good (or bad); it depends on the model on which the fragment logP is based. You could also get your team to determine some experimentally and see if you can’t build yourself a model…
For statistical insight into many state-of-the-art and classical methods, and how well they perform across large experimentally determined sets, see Mannhold et al.’s thorough review [3].
So, to conclude, not all logP prediction models are built equal and there will be times when some models exceed others in accuracy, depending on your chemistry. Hopefully now you’ll at least be able to explain in your group meetings why your predicted logPs were way off…
References:
CLogP is seen in small molecule design puzzles. It's a measure of how hydrophilic or hydrophobic a ligand is. The ligand's cLogP can be displayed, and may figure in an objective that awards or subtracts points.
CLogP is the calculated logarithm of a "partition coefficient", the ratio between the concentrations of a substance in two different solvents. The "c" in the name represents "calculated", the "log" means "logarithm", and "P" refers to "partition". See partition coefficient on Wikipedia for more detail.
The higher the cLogP, the more hydrophobic a substance is. Foldit puzzles typically include an objective that deducts points if cLogP is higher than 2.75.
CLogP is a critical measure for drug design. Successful drugs tend to have lower cLogP, meaning they are more hydrophilic.
The partition coefficient is the ratio of two measurements using the same unit, so both it and its logarithm, cLogP, are unitless or dimensionless values, lacking their own unit of measure.
It's possible to measure an actual log(P) value using two solvents, n-octanol and water. In effect, oil and water. A substance is dissolved in a mixture of the two solvents, which then separate. ("Oil and water don't mix.") The concentration of the substance in each solvent is then measured. The concentration in n-octanol becomes the numerator of the ratio, and the concentration in water, the denominator. Log(P) is the base-10 logarithm of that ratio.
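The arithmetic behind that measurement can be sketched in a few lines of Python. The function name and the example concentrations are made up for illustration; the only requirement is that both concentrations are in the same unit.

```python
import math


def log_p(conc_octanol, conc_water):
    """Base-10 log of the octanol : water concentration ratio (same units for both)."""
    if conc_octanol <= 0 or conc_water <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(conc_octanol / conc_water)


# Illustrative values only: ten times more substance ends up in the octanol layer.
print(log_p(10.0, 1.0))
```

A substance that is ten times more concentrated in n-octanol than in water has a log(P) of 1, equal concentrations give 0, and a substance that prefers water gives a negative value.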