• Research article
  • Open Access
  • Published: 06 March 2019

Automated simultaneous assignment of bond orders and formal charges

Journal of Cheminformatics volume  11 , Article number:  18 ( 2019 ) Cite this article

3173 Accesses

2 Citations

4 Altmetric

Metrics details

Bond orders and formal charges are fundamental chemical descriptors. In cheminformatic applications it is necessary to be able to assign these properties to a given molecular structure automatically, given minimal input information. Here we describe a method for determining the bond order and formal charge assignments from only the atom types and connectivity. Our method utilises a graph theoretical description of electron positions. Each electron position assignment is scored according to lookup tables of atomic and bond dissociation energies derived from quantum chemical calculations. We tested three different optimisation methods—local optimisation, an A* pathfinding method, and an FPT optimisation method utilising tree decompositions—for finding the best electron position assignment, from which the bond orders and formal charges are extracted. We show that our method can assign bond orders and formal charges at a high degree of accuracy across a wide range of molecules from two different databases, and that the FPT algorithm provides the best combination of speed and accuracy.

Introduction

The ability to assign chemical characteristics such as bond orders and formal charges is crucial to many higher-order algorithms in computational chemistry and cheminformatics. The formal charge is the charge assigned to an atom in a molecule assuming that electrons in all chemical bonds are shared equally between atoms, and the bond order of a bond is the number of chemical bonds between a pair of atoms. Both properties can be easily deduced from the Lewis structure of a molecule, which shows how the valence electrons are arranged amongst the atoms and bonds of the molecule.

The best way to determine the Lewis structure of a molecule is to calculate the actual electronic density distribution and then use the Natural Bond Orbital method [ 1 ] to obtain bond orders and formal charges. However, this approach can be computationally expensive, and with the advent of large databases of organic molecules, such as the Protein Data Bank (PDB) and Cambridge Structural Database (CSD), the need for fast automated schemes became pertinent. As such, over the last few decades, a number of such schemes have been developed. The COBRA program of Leach et al. [ 2 ] uses a backtracking search algorithm to automatically assign bond orders. IDATM from Meng and Lewis [ 3 ] can be used to determine the connectivity and hybridisation state of atoms based on input three-dimensional (3D) coordinates. Baber and Hodgkin follow a similar scheme, but can also assign bond orders [ 4 ]. Lang et al. assign bond orders based on characteristic bond lengths, bond angles and torsion angles, [ 5 ] as do Hendlich et al., who also include small functional group identification to help avoid incorrect assignments due to erroneous or ambiguous geometrical data [ 6 ]. All of these methods, however, require accurate 3D coordinate information. The methods of Froeyen and Herdewijn [ 7 ] and Labute [ 8 ] could theoretically be used on structures with only atom type and connectivity information, but they were developed primarily for use when 3D coordinate information, albeit for only the heavy atoms, is provided.

Wang et al. developed a heuristic method to determine bond orders based on arbitrary penalty scores [ 9 ]. The biochemical algorithms library (BALL) software from Dehof et al. is an extension of this work. Dehof et al. used the same penalty scores as Wang et al., and developed three exact solvers guaranteed to find a bond order assignment with minimum total penalty score, as well as allowing enumeration of all possible bond order assignments with minimum total penalty score [ 10 ]. Both of these methods only require element type and atom connectivity information when all hydrogen atoms are included. As formal charges and bond orders are somewhat co-dependent, the absence of formal charges in the input molecule could result in incorrect atom types being perceived, leading to incorrect bond order assignments. Theoretically, formal charges can be back calculated from bond order assignments (if they are correct), however there can be situations where ambiguous formal charge assignments are possible, and there is no guarantee that the calculated formal charges will match the required total molecular charge.

As part of a broader molecular dynamics automated parameterisation scheme, we have developed a new method for the simultaneous assignment of formal charges and bond orders. In order to allow both of these properties to be assigned in situations where the available coordinates are not energetically favourable, our method requires only chemically plausible atomic coordinates, along with element type, atom connectivity, and the total charge of the molecule. The combination of atom connectivity and total molecular charge fixes the protonation state of the molecule. We take advantage of the fact that in essence, formal charges and bond orders are descriptions of the positions of valence electrons within a molecule. Minimising some function of the electron positions thus results in an optimal formal charge and bond order assignment. Given that electron positions are involved, the obvious choice of function is one derived from high-level quantum chemical calculations.

We describe here the function that is minimised and three different optimisation methods: local optimisation, the A* pathfinding method, and an FPT optimisation method utilising tree decompositions. We first check the self-consistency of the algorithms, then compare the performance of the A* method in our software and in the BALL software of Dehof et al., and lastly compare to the bond order and formal charge assignments from two molecular databases in terms of both the speed and the accuracy with which these reference data are reproduced, as well as the level of theory required for the quantum chemical calculations such that the scoring function identifies the correct assignments. We find that the FPT algorithm provides the best balance between efficiency and accuracy. Using FPT, our method attains similar accuracy to that of Dehof et al., but without the need to provide the formal charge.

The code described here is available on Github at https://github.com/allison-group/indigo-bondorder . It is written in C++, utilising the Boost graph library, [ 11 ] and requires a C++-14 compliant compiler. For ease of use, Python bindings are provided for using version 2.2.2 of the pybind11 library [ 12 ].

Our bond order and formal charge assignment scheme determines an optimal assignment of electron positions for a molecule by minimising a score that is a function of the electron assignment. We first describe how the chemical features of the molecule are represented and the initialisation of the electron position assignments. We then outline the electron assignment scoring function, which depends upon scores associated with the formal charge state of each atom and the order of each bond. Lastly, we describe the three different methods used to optimise the electron position assignments, giving rise to the formal charge and bond order assignments for the molecule.

Initialisation

A query molecule can be submitted in any of the standard chemical formats able to be parsed by Open Babel, so long as the number and type of each atom, including hydrogen atoms, their connectivity, and the total molecular charge are provided explicitly. Some file formats allow for implicit hydrogen atoms through the use of atom typing. This information is encoded internally as a molecular graph, so that a graph theoretic approach can be used to optimise the electron position assignment.

The total number of electrons whose position must be optimised is calculated from the number of valence electrons according to

where \(e_T\) is the total number of electrons to place, \(q_T\) is the total molecular charge, \(N_A\) and \(N_B\) are the numbers of atoms and bonds in the molecule, respectively, and \(\nu _i\) is number of valence electrons of atom i , which is known from its elemental type. The \(-2N_B\) component accounts for each bond in the molecule requiring two electrons in order to have a bond order of at least one.

The positions that electrons can occupy in a molecule, encoded as a multiset of graph vertices and edges, \({\mathcal{P}}\) , are determined as follows. Each element has a target valency, set to eight for all elements other than hydrogen, for which it is two, and phosphorous and sulfur atoms involved in at least three bonds, for which it is set to ten and twelve respectively. In this way, we allow for hypervalent representations of functional groups such as phosphates and sulfates, while representing all other groups, such as nitros, in charge separated form. The multiplicity of a given vertex v in \({\mathcal{P}}\) is given by \(v \in ^{m_v} {\mathcal{P}}\) where

with \(\tau _v\) being the target valency of the element associated with v and \(\delta (v)\) the degree of v . The multiplicity of a given edge \(e = (u,v)\) in \({\mathcal{P}}\) is given by \(e \in ^{m_e} {\mathcal{P}}\) where

Electron assignment scoring

Let \(G=(V,E)\) be a molecular graph. An electron assignment is a map \(c_p:V\cup E \rightarrow {\mathbb Z}_{\ge 0}\) where \(c_p[x]\) is the total number of electrons placed on member \(x \in V \cup E\) . The score of the electron assignment is then given as the sum of scores for each vertex and edge.

These scores are stored in a lookup table, \(\Gamma\) , using bit shifting to generate a unique 32-bit unsigned integer key, \(k_x\) . If a given key is not found in the lookup table, a default score of \(\infty\) is given.

For a vertex v , the lookup table key depends on the element of the corresponding atom and its formal charge, calculated as

where N ( v ) is the set of neighbouring vertices of v . The first seven bits of the key are set as per the binary value of the atomic number. The next four bits are set to the binary value of the magnitude of the calculated formal charge. Finally, the twelfth bit is set if the formal charge is negative, and left unset if it is positive. If the valence state of an atom exceeds its target valence, the score of that atom is set to \(\infty\) , and no key is calculated.

For an edge e , the lookup table key depends on the elements of the two vertices the edge is between, and the number of electrons assigned to the edge, \(c_p[e]\) . In the same manner as the vertex key, the first seven bits of the key are set to the binary value of the atomic number of one of the vertices. The next seven are set to the binary value of the other vertex. Finally, bits fifteen through eighteen inclusive are set as per the binary value of the number of electrons assigned to the edge. As each edge will always have at least two electrons assigned, there is no overlap between the set of possible keys for the vertices and edges.

These simple methods for key determination were chosen as the corresponding scores can be easily determined from quantum chemical calculations. We note that there is sufficient flexibility in using 32-bit unsigned integer keys for more complex key generation methods to be used, incorporating optimised key–score pairs.

At present, scores have only been calculated for the elements—hydrogen, carbon, nitrogen, oxygen, fluorine, phosphorus, sulfur, chlorine, and bromine—and bonds—single, double, and triple—that most commonly occur in bio-organic molecules, but as these scores are derived from quantum chemical calculations, they can easily be supplemented as required.

Formal charge score

In a crude sense, atoms with formal charges can be described as ions with a charge equal to the formal charge. Therefore, formal charge scores were determined from quantum chemical calculations of atomic/ionic energies. For each element, scores for all possible formal charge states were calculated. For example, carbon can have formal charge states ranging from \(+\,4\) (all valence electrons removed) to \(-\,4\) (electrons added until valence shell is an octet), thus we consider \(\hbox {C}^{4+}\) , \(\hbox {C}^{3+}\) , \(\hbox {C}^{2+}\) , \(\hbox {C}^+\) , \(\hbox {C}^0\) , \(\hbox {C}^-\) , \(\hbox {C}^{2-}\) , \(\hbox {C}^{3-}\) and \(\hbox {C}^{4-}\) . While in normal molecules, it is highly unlikely that the majority of these formal charges are viable, they are included for completeness and to help guide the optimisation methods away from unrealistic electron assignments.

The atomic energy depends on the spin state of the atom or ion. Looking at carbon again, there are four valence electrons in an electron configuration of \(1\hbox {s}^{2}2\hbox {s}^{2}2\hbox {p}^{2}\) . The lowest energy spin state is the triplet state, with the two electrons in the 2p shell unpaired in degenerate orbitals. Both singlet and quintet states are conceivable, but they are higher energy states. We therefore consider only the lowest energy spin state for a given formal charge of each element. The scores were then determined as being the difference between the lowest energy spin state, and a reference energy of the neutral atom in either the singlet or doublet state, depending on its number of electrons. All calculations of atomic/ionic energies were calculated with the CR-CCL method, [ 13 , 14 ] utilising the def2-SVPD and def2-TZVPPD basis sets [ 15 ]. Calculations were performed using the GAMESS-US version 18AUG 2016(R1) software [ 16 , 17 ]. The scores are given in Additional file 1 : Tables S1 and S2.

Bond order score

Bond dissociation energies are a natural basis for the bond order scores. The bond dissociation energy is defined as the change in enthalpy when a bond is homolytically cleaved. For example, the bond dissociation energy of the \(\hbox {C}{-}\hbox {O}\) bond in methanol is given by the enthalpy change associated with the reaction

The bond dissociation energy is computed by calculating the energy difference between a molecule containing the bond of interest and the two fragments produced by homolytic cleavage of the bond. We create the simplest possible molecule containing each bond type as determined by a stick drawing. This is just the two atoms involved, with hydrogen atoms added to fill the valence positions of non-hydrogen atoms. For example, to determine the score to use for a \(\hbox {C}{\equiv }\hbox {C}\) bond, the bond dissociation energy of ethyne is calculated. The score for each bond is determined as being the difference between the bond dissociation energy and the bond dissociation energy of the highest order bond considered between the two elements involved.

The structure of each molecule and the fragments produced by homolytic cleavage are geometry-optimised at the MP2 [ 18 ] level of theory, followed by a single point energy calculation with the CR-CCL method, [ 13 , 14 ] utilising the def2-SVPD and def2-TZVPPD basis sets [ 15 ]. Where appropriate, calculations were performed both with and without accounting for the basis set superposition error [ 19 ]. Calculations were performed using the GAMESS-US version 18AUG 2016(R1) software [ 16 , 17 ]. The scores are given in Additional file 1 : Tables S3–S6.

Formal charge and bond order determination

Once an optimised electron assignment has been calculated, determining the formal charge on each of the atoms and the bond order of all of the bonds is simple. For each atom, its formal charge is calculated as shown in Eq. ( 5 ). For each edge, e , the bond order is calculated as

Optimisation methods

Finding the formal charge and bond order of a molecule requires minimising the value of S given in Eq. ( 4 ). Three different optimisation techniques for finding the lowest scoring electron assignment were tested: a steepest descent local optimisation method (“ Local optimisation ” section), an A* pathfinding based method (“ A* ” section) and an FPT optimisation method utilising tree decompositions (“ Fixed parameter tractable (FPT) ” section).

Local optimisation

Local optimisation acts similarly to a steepest decent gradient optimisation method. It is a greedy method that searches for an optimal electron assignment by finding the lowest scoring neighbour of a given electron assignment and iteratively applying this neighbour search until there are no lower scoring neighbours. Computationally, it is a relatively cheap optimisation method, and will always converge, but not necessarily to the global minimum.

Setup An initial electron assignment is generated as follows. For each possible electron assignment position \(p \in {\mathcal{P}}\) , a score is calculated and then an electron is assigned to the position which has the lowest score. This is iteratively applied, assigning a single electron at a time, until \(e_T\) electrons have been assigned, giving the initial electron assignment.

Neighbour searching Local optimisation determines the score change that would result when going from one electron assignment to each neighbouring electron assignment, which are determined as follows. The multiset of possible positions for electrons to be assigned, \({\mathcal{P}}\) , is converted to a set P , i.e. duplicate members are removed. Every member \(p \in P\) is checked to determine if it contains electrons in the current electron assignment, i.e. \(c_p[p] \ne 0\) , meaning that it can be used as an electron source. If it can, all other members \(q \in P \setminus p\) are checked to determine if they can hold another electron, i.e. \(\text {mult}({\mathcal{P}},q) - c_p[q] > 0\) , meaning that q can act as a target electron position. A neighbour of the current electron assignment is produced by moving an electron from a source position to a target position. Thus, all the neighbours of an electron assignment are given by the set of electron assignments produced from all possible source–target pairs. If multiple electron assignments with the same score exist, the neighbour searching can search the neighbours of all of the assignments, instead of just one of the assignments.

Score minimisation Determining an optimal electron assignment using the local optimisation method is straightforward. The score of each neighbour assignment is determined. If at least one of the neighbour(s) has a lower score, the neighbour search is repeated using the lowest scoring neighbour(s) as the initial electron assignments. This iterative update of the electron assignment proceeds until there are no neighbours with a lower score, in which case the optimisation process has converged to a local minimum.

An A* approach was one of the three optimisation methods utilised by Dehof et al. for bond order assignment [ 10 ]. Such an approach is taken here for comparison purposes.

A* is a path-finding algorithm for determining a minimum cost path between a start, s , and end, t , location [ 20 ]. It employs a search heuristic as a means to guide the path finding process towards more promising paths. The list of vertices to search from is stored in a priority queue, meaning the most promising vertices are searched first. The priority is determined by assigning a cost, \(f(r) = g(r) + h(r)\) , where g ( r ) is the real cost of the path \(s\dots r\) and h ( r ) is an heuristic estimate of the cost of the path \(r\dots t\) , to each visited vertex r . If the cost of a vertex r exceeds some upper limit value, that vertex will not be added to the priority queue. Here, this upper limit is calculated as the score of the initial local optimisation electron assignment (see “ Local optimisation ” section) plus one. Obviously, the nature of the heuristic function will influence the efficiency of the search algorithm. To be guaranteed to obtain a minimum cost path, the heuristic must be admissible , meaning ‘optimistic’. That is, the true cost of the path \(r\dots t\) cannot be lower than h ( r ).

Let \(P \subset {\mathcal{P}}\) be the set of unique possible positions to assign electrons. The score minimisation problem given the molecular graph \(G = (V,E)\) can be formulated into a | P |-level tree T , i.e. the path from the root vertex to a leaf will be of length | P |. Each level of the tree represents a possible position for electrons to be assigned. A vertex at level k has \(m + 1\) neighbours, where \(m = \text {mult}({\mathcal{P}},w)\) and \(w \in P\) is the position associated with level \(k+1\) , to allow for all possible electron counts placed in w , from 0 to m .

To formulate the scoring functions g ( r ) and h ( r ), some additional definitions must be made. A partial electron assignment, R ( r ), is denoted as the set of pairs ( j ,  n ) where j is a member of the path \(s\dots r\) and \(n \in \{0,\dots ,\text {mult}({\mathcal{P}},j)\}\) is the number of electrons assigned there. R ( r ) also contains pairs ( x , 0) for all elements \(x \in V \cup E \setminus P\) .

Q ( r ) is the set of calculable members \(x \in V \cup E\) at vertex r . x is deemed calculable at vertex \(r \in T\) if the following conditions are met:

\(x \notin P \setminus R_i\) , where \(R_i\) is the set of first members of R ( r );

If \(x \in V\) , condition 1 holds for all neighbours of x ;

If \(x \in E\) , the pair \(x = {y,z} \subseteq Q(r)\) .

As condition 3 is a requirement for determining the calculability of bonds, the calculability of all atoms is determined first.

Cost Functions Each vertex that is visited through the A* search is assigned a cost, \(f(r) = g(r) + h(r)\) . The exact cost of the path \(s\dots r\) , g ( r ), is defined as

where \(\Gamma [k_x]\) is the score of member \(x \in V \cup E\) with partial electron assignment R ( r ), as defined in “ Electron assignment scoring ” section. If the number of electrons assigned in R ( r ) is greater than \(e_T\) , \(g(r) = \infty\) .

The heuristic cost of the path \(r\dots t\) , h ( r ) is defined as

where \(Q(r)^\complement\) is the complement of Q ( r ).

If \(x \in E\) , such that \(x = \{y,z\}\) then B ( x ) is the set of possible numbers of electrons to assign to x and \(k_{x,a}\) determines the key for x given R ( r ) with an additional a electrons assigned to x . B ( x ) is given by

where V ( y ) is the valence of y in the partial electron assignment R ( r ).

If \(x \in V\) , then B ( x ) is the set of formal charge values x can attain given R ( r ), and \(k_{x,a}\) determines the key for x assuming it has a formal charge of a . In this case, B ( x ) is given by

where F ( x ) is the formal charge of x given R ( r ) calculated as per Eq. ( 5 ).

Fixed parameter tractable (FPT)

In a similar vein to Dehof et al. [ 10 ], we also implement an FPT based approach. Given a molecular graph \(G = (V,E)\) , which is a tree, the electron assignment problem can be easily solved using dynamic programming, i.e. recursively splitting the problem into smaller sub problems and solving the sub problems. Not all molecular graphs are trees, but their generally sparse nature means that they are ‘tree-like’.

Given a graph \(G = (V, E)\) , the tree-decomposition of G , \((T, {\mathcal{V}})\) , where T is a tree and \({\mathcal{V}} = (V_t)_{t \in T}\) is a family of vertex bags \(V_t \subseteq V\) indexed by the nodes t of T . \((T, {\mathcal{V}})\) satisfies the following three conditions:

\(V = \bigcup _{t\in T}V_t\) ;

for every edge \(e \in E\) where \(e = \{u,v\}\) there exists a \(t \in T\) such that \(e \subseteq V_t\) ;

\(V_{t_1} \cap V_{t_3} \subseteq V_{t_2}\) whenever \(t_1,t_2,t_3 \in T\) satisfy \(t_2 \in t_1Tt_3\) .

The width of \((T,{\mathcal{V}})\) is the number \(\max \big \{|V_t| - 1 : t \in T\big \}\) . This width gives the fixed parameter. Figure  1 b shows a tree-decomposition of the graph in Fig.  1 a. It has a width of two.

figure 1

A graph a \(G = (V,E)\) with \(V = \{\text {A,B,C,D,E,F,G,H}\}\) and \(E = \{\text {(A,B),(A,C),(A,E),(B,D),(B,H),(C,D),(C,F),(D,G)}\}\) , a tree-decomposition b \((T,{\mathcal{V}})\) of G and c a nice tree-decomposition. The nodes of the nice tree-decomposition are coloured red for the root node, green for leaf nodes, white for introduce nodes, blue for forget nodes and orange for join nodes

In order to simplify the algorithm, the concept of a nice tree-decomposition is used. A tree-decomposition \((T, {\mathcal{V}})\) is called nice if it satisfies the following conditions:

T is rooted at a leaf node r and \(V_r = \emptyset\) ;

For every leaf \(l \in T\) , \(V_l = \emptyset\) ;

Every node \(t \in T\) has at most two children;

If \(t \in T\) has two children, c and d , then \(V_t = V_c = V_d\) and t is known as a join node;

If \(t \in T\) has one child, c , then one of the following conditions is true:

\(V_t \subset V_c\) and \(|V_t| = |V_c| - 1\) and t is known as a forget node with forgotten vertex \(v_{t_f} {:}{=} V_c \setminus V_t\) .

\(V_t \supset V_c\) and \(|V_t| = |V_c| + 1\) and t is known as an introduce node with introduced vertex \(v_{t_i} {:}{=} V_t \setminus V_c\) .

Forget and introduce nodes are defined in relation to the path from a leaf node to the root node. Figure  1 c shows a nice tree decomposition produced from the tree decomposition in Fig.  1 b.

The electron assignment optimisation process requires scores to be determined for both atoms and bonds. As a nice tree-decomposition introduces and forgets vertices, the edges of G must be treated as vertices. To do this, the edges are explicitly added into the bags of the tree-decomposition of G , at the expense of a larger width. An edge is introduced at the same time the first of its vertices is introduced, and forgotten once the second vertex is forgotten.

Algorithm A tree-decomposition of a molecular graph is obtained using the GreedyFillIn upper-bound heuristic described by Bodlaender and Koster, [ 21 ] and converted to a nice tree decomposition. Optimisation of electron assignment then proceeds as follows. Let \(t \in T\) be a node of the nice tree-decomposition of a graph. Then \({\mathcal{X}}_t\) is the set of forgotten vertices \(v_{s_f} \in V_s\) associated with the forget nodes of the subtree \(T_s\) induced on T below (and including) the node t . The total number of electrons to assign \(e_T\) and the multiset \({\mathcal{P}}\) of positions at which the electrons can be assigned are determined as per “ Initialisation ” section. Each node t is given a score table \(S_t\) indexed by the ordered pair \((l,k) \in L_t \times K_t\) where

with \(S_t[l,k]\) being the minimum score of forgotten vertices \({\mathcal{X}}_t\) with \(l \in L_t\) forgotten electrons and the additional constraint of further partial electron distribution \(k \in K_t\) . Beginning from the leaves of the nice tree-decomposition, and scoring only when all children of a node have been scored, the algorithm distinguishes the kind of each node and determines the score matrix as follows:

Leaf node Leaf nodes are empty sets, so the score table of a leaf node is also empty.

Introduce nodes  Let \(t \in T\) be the introduce node with child c , and \(x_t = {\mathcal{X}}_t \setminus {\mathcal{X}}_c\) . Then

Forget nodes Let \(t \in T\) be the forget node with child c , and \(x_t = {\mathcal{X}}_c \setminus {\mathcal{X}}_t\) . Then

where \(E(x_t,k \cup {\mathcal{X}}_t,n)\) is the score of \(x_t\) with n electrons positioned and the partial electron distribution \(k \cup {\mathcal{X}}_t\) .

Join nodes Let \(t \in T\) be the parent of c and d with \(V_t = V_i\) for \(i \in {c,d}\) . Then

Root node Each nice tree decomposition has only one root node \(r \in T\) which is formally a forget node. However the score table of the root node is unpopulated as \(K_r = \emptyset\) . Rather than fill a score table, the minimised electron assignment score is determined. Let c be the child of r with \(x_r = {\mathcal{X}}_c\) . Then the minimum score is given by

Practical optimisation

There are a number of techniques which can be used to optimise the electron assignment algorithms described above. As opposed to implementation optimisation techniques which do not affect the outcome of the algorithms, only the computational cost of performing them, these are considered practical optimisations as they could influence the results obtained, but generally would not be expected to do so.

Electron pairs A simple means to optimise the algorithms is to utilise electron pairs instead of single electrons. Generally, one would expect electrons to be found in pairs anyway: two electrons per bond order and lone pairs of electrons on atoms. By explicitly assigning pairs, the search space can be massively reduced, leading to an increase in performance. This optimisation is recommended, is the default setting, and is used for all results presented here.

Pre-placing electrons In the majority of molecules, there are a number of elements to which a minimum number, larger than zero, of electrons are expected to be assigned. For example, the halogens would be expected to have three lone pairs of electrons assigned when they are bonded to only one other atom. By placing an expected minimum number of electrons on various atoms of the molecule before undertaking the optimisation, the search space of the algorithms is reduced, and so the optimisation cost is reduced.

Reference formal charge and bond order assignments

Validation of the accuracy of our method requires reference data, i.e. a set of molecules for which both formal charge and bond order properties are already known. The reference data must also represent aromatic bonds in kekulised form, i.e. alternating single and double bonds, and hydrogen atoms must be explicitly present or able to be added automatically. Two molecular databases that fit these requirements were chosen as reference data sets: the MMFF94 validation suite [ 22 ] and the KEGG Drug Database [ 23 ]. These databases were previously used to validate the bond order assignment methods developed by Dehof et al. [ 10 ].

All molecules in each reference data set were parsed to extract the element and connectivity information required for our internal graph theoretic representation (see “ Initailisation ” section). In cases where one structure file contained multiple molecules, the molecules were treated separately. Molecules were discarded if they contained dangling bonds due to being monomer units, if they contained three or fewer non hydrogen atoms, if they contained elements not included in the score tables, if they contained an odd number of valence electrons, or if they were identical to a previously parsed molecule.

The MMFF94 validation suite contains 761 structures for small molecules and ions, 698 of which are derived from the CSD. The native CSD structures were manually modified by the authors, by assigning bond orders and formal charges and, where appropriate, adding missing hydrogen atoms to complete the valence [ 22 ]. Formal charges and bond orders are available in either hypervalent or dative representation, with the hypervalent representation used here. After filtering using the rules described above, 691 unique molecules were identified. Canonical SMILES strings for these structures are given in Additional file 1 : Table S7.

The KEGG Drug Database contains a large number of drug like molecules [ 23 ]. Its structure files contain only 2D coordinate information, meaning that they are a perfect test set for connectivity-only bond order and formal charge assignment. Hydrogen atoms are not explicitly present in the structure files. Rather, they are implicitly given through providing types to the heavy atoms. Accordingly, explicit hydrogen atoms were added to the molecules using this atom type information [ 24 ]. After filtering using the rules described above, 5676 unique molecules were identified. Canonical SMILES strings for these molecules are given in Additional file 1 : Table S8.

Results and discussion

We first discuss the consistency of the three optimisation algorithms described above by comparing the optimised electron assignment scores that they provide. Then we discuss the accuracy of the FPT algorithm in regards to its ability to reproduce the formal charge and bond order assignments provided by the two reference databases.

Algorithm consistency

For the consistency tests described here, the scores used were derived from calculations performed using the def2-SVPD basis set, without accounting for the basis set superposition error. This set of scores was chosen as for self-consistency tests it does not matter whether the optimised score corresponds to the true formal charge and bond order state, rather only that the algorithms are correctly determining the lowest-scoring state. Electron pairs were utilised for maximum performance. Electrons were not pre-placed.

As expected, we find that the A* and FPT algorithms have 100% agreement in regards to the optimised score, with both molecular databases. This is reassuring as it indicates correct implementation of the two algorithms. In relation to these, the local optimisation algorithm performs remarkably well. For the MMFF94 database, there is 81% agreement with the A* and FPT algorithms, and for the KEGG drug database, there is 91% agreement with the A* and FPT algorithms. Such high percentage accuracies indicate that using the highly efficient local optimisation algorithm could potentially be acceptable in extremely high throughput applications where overall speed is more important than accuracy.

Algorithm efficiency

We next consider the efficiency of each algorithm, defined as the average time required to find the lowest score for a single molecule. The distributions of calculation times for each algorithm and each dataset are shown in Fig.  2 .

figure 2

Histograms of algorithm execution time for the three optimisation algorithms and two data sets. For all plots, the horizontal axis shows the execution time of the algorithm in seconds and the vertical axis shows the base-ten logarithm of the count for each bin. Fifty bins of even width were used. Mean ( \(\mu\) ) and median ( \(\eta\) ) values for the distributions in blue are provided. a , b Show the time distributions for the MMFF94 and KEGG databases when optimised with the Local Optimisation algorithm. In both cases, a single outlier with an execution time in excess of 200 s is excluded from the plots. c , d Show the time distributions for the MMFF94 and KEGG databases when optimised with the A* algorithm. The orange bins indicate molecules that failed to complete optimisation due to a 1024 MB memory limit imposed on the A* search queue. e , f Show the time distributions for the MMFF94 and KEGG databases when optimised with the FPT algorithm

From the distributions, we can tell that the local optimisation algorithm is the most efficient with a maximum 90-th percentile execution time of only 0.015 s, followed by the FPT algorithm with a maximum 90-th percentile execution time of 0.656 s and finally the A* algorithm with a maximum 90-th percentile execution time of 212 s when molecules that failed to complete optimisation due to a 1024 MB memory limit imposed on the A* search queue are excluded. All algorithms show exponential decay in the execution time, showing that in the majority of cases, one would expect any of the algorithms to give a result in a reasonable time period. This is further reinforced by the low median execution times.

Local optimisation, Fig.  2 a and b, was the stand-out efficiency algorithm, though both databases had a single outlier molecule which took longer than 200 seconds to optimise. In these two cases, the initial electron assignment required a large number of neighbour search steps before a distribution with non-infinite score was discovered, which could then be minimised. Due to the extremely long computational time required for these two molecules, an upper limit on the execution time of 5 s was implemented for the local optimisation algorithm.

The A* algorithm, Fig.  2 c and d, is the least efficient of the three algorithms. A desirable A* search would be narrow throughout the search tree. The breadth of the search is primarily controlled by the heuristic function, where a better heuristic would result in a narrower search. However, even with a near-optimal heuristic, there can be cases where the leaves of the search tree have final scores with fractions of a percent difference between them. In these cases, the path through the search tree becomes broad, and so the overall algorithm efficiency decreases. Due to this, a memory limit of 1024 MB was imposed on the A* algorithm. This limit means that when the memory allocated for the queue exceeds the given amount, the algorithm halts without returning a solution. Molecules which triggered this limit are shown in the orange distributions in Fig.  2 c and d, where the execution time is the time taken before the limit was triggered. We note that there are molecules which did not trigger the memory limit but took longer to complete than the triggering time of any triggered molecules. This can be attributed to larger molecules requiring deep searches that are not necessarily as broad as other smaller molecules, so they can complete without triggering the memory limit.

Finally, the FPT algorithm, Fig.  2 e and f is a highly efficient algorithm that is also guaranteed to locate the global minimum score value. Though the vast majority of molecules take less than a second to determine an optimal score using the FPT algorithm, there are a small number of molecules which take in excess of ten seconds to complete. The execution times of these molecules are so long as their tree-decomposition contains wide join nodes that have a large number of potentially forgotton electrons. These large join nodes come about primarily due to areas within the molecule which can contain a broad range of electron counts, especially when the width of the tree-decomposition is high, such as highly conjugated aromatic systems or high electron density areas like sulfate or phosphate groups.

Comparison to reference assignments

The true measure of the accuracy of each algorithm is its ability to reproduce the formal charges and bond orders of the molecules in the reference data sets. Here we only consider the FPT algorithm as “ Algorithm consistency ” section showed that it is far more efficient than the A* algorithm whilst still providing a global minimum score.

Additionally, we compare our results to the A* method described by Dehof et al. [ 10 ] as implemented in BALL [ 25 ]. The A* method is utilised as the FPT method is not accessible through the provided Python bindings.

We note that the score of two resonance structures will be identical, whereas the formal charges and bond orders will not, even though each resonance state is a valid, and minimum score, formal charge and bond order assignment. As the reference data only contains formal charge and bond order assignments for a single resonance structure, the FPT algorithm was modified to be capable of producing all possible resonance structures for a given molecule. However, due to the combinatorial explosion possible when there are multiple, disjoint resonance substructures within a molecule, and the resulting increase in intermediate computational load such an explosion would have on the algorithm, an upper limit to the number of resonance structures obtained of 32 was implemented.

For the comparisons to the reference data sets, a calculated formal charge and bond order assignment, from either the FPT algorithm described here or the A* algorithm described by Dehof et al., is deemed correct if one of the up to 32 assignments returned by the algorithm exactly matches that of the reference data. In some cases, there will be more than 32 potential resonance structures. If the reference assignment is not matched within these first 32 results returned, that molecule is regarded as failing for the algorithm, regardless of whether or not the returned assignments are correct resonance structures for the reference assignment. The results of these comparisons are given in Table  1 .

Scores for our FPT algorithm were derived from quantum chemical calculations using the def2-SVPD or def2-TZVPPD basis set, with or without correction for basis set superposition error. All four sources of scores performed identically and as such, the cheapest level of theory is recommended, and used for the results presented here. Additionally, the scores associated with a \(\hbox {C}^0\) atom and a \(\hbox {C}^{-}\) atom were swapped so as to make a neutral carbon always be more favourable than a charged carbon. This increased the overall accuracy from 95.63 to 97.63%.

These results show that our method has better accuracy than that of Dehof et al. across both databases. Our accuracy is similar accuracy to that of other state of the art bond order assignment methodologies, [ 26 , 27 , 28 , 29 ] while additionally assigning formal charges. We note that because Dehof et al.’s method is not designed for determining formal charges as well as bond orders, the accuracy values that we report for their algorithm should be taken as an upper limit of accuracy, as only correct assignment of bond orders are checked. Any check of formal charge correctness, for example through back calculation from the bond orders, will not be able to exceed these accuracy levels, as the bond order checking is a subset of the combined bond order and formal charge checking.

The method presented by Dehof et al. makes use of arbitrary, but empirically optimised, penalty scores for their bond order assignment, whereas the method presented here utilises scores derived directly from high-level quantum chemical calculations, other than the single swap of the \(\hbox {C}^0\) and \(\hbox {C}^{-}\) scores. This direct derivation makes our scoring function easily extensible to other atom and bond types. Empirical optimisation of the scores used here could increase the accuracy of our algorithm, but at the expense of ease of extensibility.

The 151 molecules for which the FPT algorithm failed to generate a correct assignment include two major groups of chemically similar molecules. The first group consists of 103 molecules containing at least one nitrogen atom assigned a formal charge of \(-\,1\) in the reference data. Overall, 106 such molecules are found in the reference data, showing that only three were correctly assigned by our algorithm. The formal charges and bond orders of nearly all of these molecules were not correctly assigned, indicating that the relative scores for neutral and negatively-charged nitrogen atoms, combined with the scores for single and double bonds involving nitrogen, are unable to produce correct assignments. These nitrogen atoms are generally located in groups such azides or diazos. These groups are generally presented as containing nitrogen atoms with positive and a negative charge adjacent to one another, whereas our algorithm assigns them both neutral charges. Additionally, they are joined by a double bond in the reference data but by a single bond in our results. This illustrates how our scoring system leads to a preference for fewer formal charges.

The second group comprise molecules where some of our underlying assumptions do not hold. For example, for protonated acetone, the MMFF94 database assigns the oxygen atom a formal charge of \(+\,1\) and places a double bond between the oxygen and central carbon atom, whereas our algorithm assigns a formal charge of \(+\,1\) to the central carbon atom and assumes that an oxygen atom bonded to two other atoms will have two lone pairs of electrons. Along with the backbone sigma bond electron pairs, this means that there are no missing electron pairs to assign, thus leading to the positive formal charge on the central carbon and concomitant single rather than double bond.

We have developed a method for determining optimal bond order and formal charge assignments utilising electron assignment scores derived from atom/ion and bond dissociation energies calculated with high-level quantum chemical methods, and tested their performance using three different optimisation methods—local optimisation, an A* pathfinding algorithm, and an FPT algorithm.

While the FPT algorithm is less efficient than local optimisation, its greater accuracy was considered the more important feature. We found no difference in the accuracy of the FPT algorithm when the electron assignment scores were derived from calculations at difference quantum chemical levels of theory, indicating that extension of the scoring function to additional element and bond types need only consider performing quantum chemical calculations at the lowest level of theory used here.

In comparison with the state of the art method of Dehof et al., the FPT algorithm developed here performs remarkably well, attaining relatively similar accuracy levels. We also show that the scores provided here can be easily optimised in order to increase the accuracy, though doing so will remove the extensibility of scores derived from quantum chemical calculations. Our method is well suited to use in computational chemistry and cheminformatic applications where the user supplies only minimal information, as it requires only atom types and connectivity.

Reed AE, Curtiss LA, Weinhold F (1988) Intermolecular interactions from a natural bond orbital, donor-acceptor viewpoint. Chem Rev 88(6):899–926. https://doi.org/10.1021/cr00088a005

Article   CAS   Google Scholar  

Leach AR, Dolata DP, Prout K (1990) Automated conformational analysis and structure generation: algorithms for molecular perception. J Chem Inf Comput Sci 30(3):316–324. https://doi.org/10.1021/ci00067a017

Article   CAS   PubMed   Google Scholar  

Meng EC, Lewis RA (1991) Determination of molecular topology and atomic hybridization states from heavy atom coordinates. J Comput Chem 12(7):891–898. https://doi.org/10.1002/jcc.540120716

Baber JC, Hodgkin EE (1992) Automatic assignment of chemical connectivity to organic molecules in the cambridge structural database. J Chem Inf Comput Sci 32(5):401–406. https://doi.org/10.1021/ci00009a001

Lang E, von der Lieth C-W, Förster T (1992) Automatic assignment of bond orders based on the analysis of the internal coordinates of molecular structures. Anal Chim Acta 265(2):283–289. https://doi.org/10.1016/0003-2670(92)85034-4

Hendlich M, Rippmann F, Barnickel G (1997) Bali: automatic assignment of bond and atom types for protein ligands in the brookhaven protein databank. J Chem Inf Comput Sci 37(4):774–778. https://doi.org/10.1021/ci9603487

Froeyen M, Herdewijn P (2005) Correct bond order assignment in a molecular framework using integer linear programming with application to molecules where only non-hydrogen atom coordinates are available. J Chem Inf Model 45(5):1267–1274. https://doi.org/10.1021/ci049645z

Labute P (2005) On the perception of molecules from 3D atomic coordinates. J Chem Inf Model 45(2):215–221. https://doi.org/10.1021/ci049915d

Wang J, Wang W, Kollman PA, Case DA (2006) Automatic atom type and bond type perception in molecular mechanical calculations. J Mol Graph Model 25(2):247–260. https://doi.org/10.1016/j.jmgm.2005.12.005

Article   PubMed   Google Scholar  

Dehof AK, Rurainski A, Bui QBA, Böcker S, Lenhof H-P, Hildebrandt A (2011) Automated bond order assignment as an optimization problem. Bioinformatics 27(5):619–625. https://doi.org/10.1093/bioinformatics/btq718

Library The Boost Graph (2002) User guide and reference manual. Addison-Wesley Longman Publishing Co., Inc., Boston

Google Scholar  

Jakob W, Rhinelander J, Moldovan D (2017) pybind11—Seamless operability between C++11 and Python. https://github.com/pybind/pybind11 . Accessed 5 Apr 2018

Piecuch P, Kucharski SA, Kowalski K, Musiał M (2002) Efficient computer implementation of the renormalized coupled-cluster methods: the r-ccsd[t], r-ccsd(t), cr-ccsd[t], and cr-ccsd(t) approaches. Comput Phys Commun 149(2):71–96. https://doi.org/10.1016/S0010-4655(02)00598-2

Piecuch P, Włoch M (2005) Renormalized coupled-cluster methods exploiting left eigenstates of the similarity-transformed hamiltonian. J Chem Phys 123(22):224105–224110. https://doi.org/10.1063/1.2137318

Weigend F, Ahlrichs R (2005) Balanced basis sets of split valence, triple zeta valence and quadruple zeta valence quality for H to RN: design and assessment of accuracy. Phys Chem Chem Phys 7:3297–3305. https://doi.org/10.1039/B508541A

Schmidt MW, Baldridge KK, Boatz JA, Elbert ST, Gordon MS, Jensen JH, Koseki S, Matsunaga N, Nguyen KA, Su S, Windus TL, Dupuis M, Montgomery JA (1993) General atomic and molecular electronic structure system. J Comput Chem 14(11):1347–1363. https://doi.org/10.1002/jcc.540141112

Gordon MS, Schmidt MW (2005) Advances in electronic structure theory: GAMESS a decade later. In: Dykstra CE, Frenking G, Kim KS, Scuseria GE (eds) Theory and applications of computational chemistry: the first forty years. Elsevier, Amsterdam, pp 1167–1189

Chapter   Google Scholar  

Møller C, Plesset MS (1934) Note on an approximation treatment for many-electron systems. Phys Rev 46(7):618–622. https://doi.org/10.1103/PhysRev.46.618

Article   Google Scholar  

Boys SF, Bernardi F (1970) The calculation of small molecular interactions by the differences of separate total energies. Some procedures with reduced errors. Mol Phys 19(4):553–566. https://doi.org/10.1080/00268977000101561

Hart PE, Nilsson NJ, Raphael B (1968) A formal basis for the heuristic determination of minimum cost paths. IEEE Trans Syst Sci Cybern 4(2):100–107. https://doi.org/10.1109/TSSC.1968.300136

Bodlaender HL, Koster AMCA (2010) Treewidth computations i. Upper bounds. Inf Comput 208(3):259–275. https://doi.org/10.1016/j.ic.2009.03.008

Halgren TA (1996) Merck molecular force field. i. Basis, form, scope, parameterization, and performance of MMFF94. J Comput Chem 17(5–6):490–519. https://doi.org/10.1002/(SICI)1096-987X(199604)17:5/6<490::AID-JCC1>3.0.CO;2-P

Goto S, Okuno Y, Hattori M, Nishioka T, Kanehisa M (2002) Ligand: database of chemical compounds and reactions in biological pathways. Nucleic Acids Res 30(1):402–404. https://doi.org/10.1093/nar/30.1.402

Article   CAS   PubMed   PubMed Central   Google Scholar  

Hattori M, Okuno Y, Goto S, Kanehisa M (2003) Development of a chemical structure comparison method for integrated analysis of chemical and genomic information in the metabolic pathways. J Am Chem Soc 125(39):11853–11865. https://doi.org/10.1021/ja036030u PMID: 14505407

Hildebrandt A, Dehof AK, Rurainski A, Bertsch A, Schumann M, Toussaint NC, Moll A, Stöckel D, Nickels S, Mueller SC, Lenhof H-P, Kohlbacher O (2010) Ball-biochemical algorithms library 1.3. BMC Bioinf 11(1):531. https://doi.org/10.1186/1471-2105-11-531

Zhang Q, Zhang W, Li Y, Wang J, Zhang L, Hou T (2012) A rule-based algorithm for automatic bond type perception. J Cheminf 4(1):26. https://doi.org/10.1186/1758-2946-4-26

Urbaczek S, Kolodzik A, Groth I, Heuser S, Rarey M (2013) Reading pdb: perception of molecules from 3D atomic coordinates. J Chem Inf Model 53(1):76–87. https://doi.org/10.1021/ci300358c PMID: 23176552

Zhao Y, Cheng T, Wang R (2007) Automatic perception of organic molecules based on essential structural information. J Chem Inf Model 47(4):1379–1385. https://doi.org/10.1021/ci700028w PMID: 17530839

Kadukova M, Grudinin S (2016) Knodle: a support vector machines-based automatic perception of organic molecules from 3D coordinates. J Chem Inf Model 56(8):1410–1419. https://doi.org/10.1021/acs.jcim.5b00512 PMID: 27405533

Download references

Authors' contributions

IDW designed and implemented the algorithms, and analysed the results. JRA aided in analysis of the results and preparation of the manuscript. All authors read and approved the final manuscript.

Acknowledgements

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Availability of data and materials

All data generated or analysed during this study are included in this published article and its supplementary information files. Data in a computer readable format is available from the Github repository at https://git.io/vH7hz .

This work was supported by a Marsden Fast Start Grant (13-MAU-039) and a Rutherford Discovery Fellowship (15-MAU-001) awarded to JRA.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Author information

Authors and affiliations.

Centre for Theoretical Chemistry and Physics, Institute of Natural and Mathematical Sciences, Massey Unversity, Private Bag 102904, Auckland, New Zealand

Ivan D. Welsh & Jane R. Allison

School of Biological Sciences, University of Auckland, Private Bag 92019, Auckland, New Zealand

BiomolecularInteraction Centre, University of Canterbury, Private Bag 4800, Christchurch, New Zealand

Jane R. Allison

Maurice Wilkins Centre for Molecular Biodiscovery, University of Auckland, Private Bag 92019, Auckland, New Zealand

You can also search for this author in PubMed   Google Scholar

Corresponding author

Correspondence to Jane R. Allison .

Additional file

Additional file 1..

Calculated atom and bond score tables, and canonical SMILES strings of the test molecules.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/ ), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated.

Reprints and Permissions

About this article

Cite this article.

Welsh, I.D., Allison, J.R. Automated simultaneous assignment of bond orders and formal charges. J Cheminform 11 , 18 (2019). https://doi.org/10.1186/s13321-019-0340-0

Download citation

Received : 12 August 2018

Accepted : 26 February 2019

Published : 06 March 2019

DOI : https://doi.org/10.1186/s13321-019-0340-0

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

Journal of Cheminformatics

ISSN: 1758-2946

assign bond order

Search This Blog

Practical cheminformatics, assigning bond orders to pdb ligands - the easy way.

assign bond order

This comment has been removed by a blog administrator.

Post a Comment

Popular posts from this blog, ai in drug discovery 2022 - a highly opinionated literature review, generative molecular design - we need to raise the bar, mining ring systems in molecules for fun and profit.

Image

Error thrown

Call to undefined function monsterinsights_get_url()

Article Contents

1 introduction, 2 scoring bond order assignments, 4 discussion, 5 conclusion, acknowledgement.

Automated bond order assignment as an optimization problem

Associate Editor: Burkhard Rost

Anna Katharina Dehof, Alexander Rurainski, Quang Bao Anh Bui, Sebastian Böcker, Hans-Peter Lenhof, Andreas Hildebrandt, Automated bond order assignment as an optimization problem, Bioinformatics , Volume 27, Issue 5, March 2011, Pages 619–625, https://doi.org/10.1093/bioinformatics/btq718

Motivation: Numerous applications in Computational Biology process molecular structures and hence strongly rely not only on correct atomic coordinates but also on correct bond order information. For proteins and nucleic acids, bond orders can be easily deduced but this does not hold for other types of molecules like ligands. For ligands, bond order information is not always provided in molecular databases and thus a variety of approaches tackling this problem have been developed. In this work, we extend an ansatz proposed by Wang et al. that assigns connectivity-based penalty scores and tries to heuristically approximate its optimum. In this work, we present three efficient and exact solvers for the problem replacing the heuristic approximation scheme of the original approach: an A*, an ILP and an fixed-parameter approach (FPT) approach.

Results: We implemented and evaluated the original implementation, our A*, ILP and FPT formulation on the MMFF94 validation suite and the KEGG Drug database. We show the benefit of computing exact solutions of the penalty minimization problem and the additional gain when computing all optimal (or even suboptimal) solutions. We close with a detailed comparison of our methods.

Availability: The A* and ILP solution are integrated into the open-source C++ LGPL library BALL and the molecular visualization and modelling tool BALLView and can be downloaded from our homepage www.ball-project.org . The FPT implementation can be downloaded from http://bio.informatik.uni-jena.de/software/ .

Contact:   [email protected]

Supplementary information:   Supplementary data are available at Bioinformatics online.

Correct bond order information is essential for many algorithms in Computational Structural Biology and Theoretical Chemistry, since bonds do not only define the connectivity of atoms in a molecule but also define structural aspects like rotatability of individual parts. However, bond order information can often not be directly inferred from the available experimental data. Even important molecular databases, like the Protein Data Bank (PDB) ( Berman et al. , 2003 ) and the Cambridge Structural Database ( Allen, 2002 ), are known to contain erroneous data for connectivity and bond order information ( Labute, 2005 ) or to even omit them entirely. For proteins and nucleic acids, bond orders can be easily deduced due to their building block nature, but this does not hold for other kinds of molecules like ligands. The problem is made much worse by the fact that quite often, the bond order assignment for a given molecule is not unique, even when neglecting symmetries in the molecule.

The chemical reasons for this effect are complex and out of scope of this work; here we just want to state that the concept of integer bond orders is only an approximation to a full quantum chemical treatment, and cannot explain all effects occurring in molecules. Important examples are aromatic or delocalized bonds, leading to different resonance structures (cf. Fig. 1 ). In addition, formal charges are often not contained in the input files, but atoms carrying a formal charge will obviously show a different bonding pattern.

Top: different co-optimal bond order assignments. In the left structure, both NH2 groups are connected via a single bond and thus freely rotatable. In the middle and right structure, one group is connected via a double bond. Bottom: heuristic and optimal bond order assignments. The left structure is the solution provided by Antechamber with a tps of 4. Its 5-ring does not fulfill the aromaticity criterion of AM1-BCC (Jakalian et al., 2002). The right structure is the solution computed with our exact solvers. Its tps is 0 and the 5-ring meets the AM1-BCC aromaticity criterion.

Top: different co-optimal bond order assignments. In the left structure, both NH 2 groups are connected via a single bond and thus freely rotatable. In the middle and right structure, one group is connected via a double bond. Bottom: heuristic and optimal bond order assignments. The left structure is the solution provided by Antechamber with a tps of 4. Its 5-ring does not fulfill the aromaticity criterion of AM1-BCC ( Jakalian et al. , 2002 ). The right structure is the solution computed with our exact solvers. Its tps is 0 and the 5-ring meets the AM1-BCC aromaticity criterion.

One body of opinion tries to overcome these obstacles by hand curation, which clearly provides the highest reliability. On the other hand, manual data curation does not scale well to large numbers of molecules, and it does not help in conditions where modifications are systematically applied to molecules, e.g. in computational combinatorial chemistry.

In the past decades, the problem of assigning bond orders automatically has been addressed by a number of different approaches. Early methods in the field strongly rely on the correctness of atomic coordinates and focus on reference bond lengths and valence angles ( Baber and Hodgkin, 1992 ) or additionally consider functional group detection ( Hendlich et al. , 1997 ) and further molecular features like hybridization states and charges ( Lang et al. , 1992 ; van Aalten et al. , 1996 ; Zhao et al. , 2007 ). The main drawbacks of those approaches are the dependence on correct atomic coordinates and the algorithms' heuristic nature.

In contrast, exact solvers proposed previously represent the bond order assignment problem as a Maximum Weighted Matching for non-bipartite graphs ( Labute, 2005 ) or as an integer linear programming problem that generates valid Lewis structures (electron dot structures) with minimal formal charge on each atom ( Froeyen and Herdewijn, 2005 ).

Recently, Wang et al. (2006 ) have presented an elegant novel approach to the problem, which is implemented in the established Antechamber package, a suite of tools used for the preparation of input structures for molecular mechanics studies. In this approach, a chemically motivated, expert generated penalty function is used to score bond order assignments. This function is then heuristically optimized. However, this procedure has two drawbacks: the score of a resulting assignment is not guaranteed to be optimal and the algorithm provides only one solution while there can be more than one assignment with optimal score. Figure 1 exemplarily shows two cases where theses two drawbacks may lead to a wrong interpretation of molecular properties (flexibility and aromaticity). Unfortunately, the optimization problem introduced by Wang et al. (2006 ) is NP-hard ( Böcker et al. , 2009 ). In this work, we propose three exact approaches that solve the problem to provable global optimality by discrete optimization techniques: we give an integer linear program (IPL) formulation, and an A* approach and a fixed-parameter algorithm for enumerating all optimal or, if desired, all feasible solutions. In addition, our approach can easily be extended to, e.g. include structural information, add missing hydrogens or even missing bonds. The A* and ILP solution are integrated into the opensource C++ LGPL library BALL ( Kohlbacher and Lenhof, 2000 ) ( Hildebrandt et al. , 2010 ) and the molecular visualization and modelling tool BALLView ( Moll et al. , 2006 ) and can be downloaded from our homepage www.ball-project.org . The FPT implementation can be downloaded from http://bio.informatik.unijena.de/software/ .

The idea behind the bond order assignment algorithm proposed in Wang et al. (2006 ) is to cast the problem into a discrete optimization problem. Finding the most probable consistent bond order assignment for a given molecule is addressed by minimizing a total penalty score tps , where each atom is assigned an atomic valence av that is defined as the sum over all bond orders bo of all bonds connected to the atom under consideration: av = ∑ i =1 con bo i Here, con denotes the number of bonded atoms. The distance of the calculated av to the atom's most desirable valence value is measured by the atomic penalty score aps : the possible valences of an atom and the corresponding distance penalty scores are stored in a penalty table that uses a rule-based atom type classification and was derived by Wang et al. (2006 ). The sum over all atomic penalty scores of a molecule now yields the total penalty score tps = ∑ i =1 n aps i where n denotes the number of atoms. The smaller the tps of a given bond order assignment, the more reasonable it is. Unfortunately, this problem is NP-hard ( Böcker et al. , 2009 ). In Wang et al. (2006 ), minimization now proceeds in a heuristic and greedy manner.

3.1 Integer linear program (ILP)

To compute a bond order assignment with guaranteed globally minimal tps , we formulated the problem as an ILP ( Papadimitriou and Steiglitz, 1998 ) as described below.

A is the set of all atoms of the molecule under consideration.

B ( a ) is the set of bonds of atom a ∈ A and B denotes the set of all bonds of the molecule.

V ( a ) ⊂ ℕ contains the possible valences of atom a ∈ A .

P ( a , v ) is the penalty value for atom a ∈ A and valence v ∈ V ( a ).

Our approach uses two different classes of variables. For each bond b ∈ B , we introduce indicator variables x b i ∈ {0, 1} symbolizing whether the corresponding bond b is set to the bond order i ∈ {1,…, μ}, where μ is the maximum bond order considered (in the following, we will set μ to 3, allowing single, double and triple bonds). To ensure that each bond b ∈ B is assigned exactly one bond order, we add the linear constraints ∑ i =1 μ x b i = 1 for all b ∈ B . Then the sum over all bond orders of all bonds b ∈ B ( a ) can be computed as ∑ b ∈ B ( a ) (∑ i =1 μ x b i · i ).

For the solution of ILPs to provable global optimality, several strategies can be chosen, like the popular pure branch and bound approaches or branch and cut methods ( Papadimitriou and Steiglitz, 1998 ). We employed the open source solver lp_solve ( http://lpsolve.sourceforge.net ), which uses a simplex algorithm-based branch and bound approach ( Papadimitriou and Steiglitz, 1998 ). In our experiments, we have seen a drastic increase in running time if more than one solution is computed. Thus, the ILP approach is not well suited for obtaining co-optimal or suboptimal bond order assignments.

3.2 The A* approach

In order to be able to efficiently enumerate all feasible solutions—optimal and suboptimal ones—we formulated the bond order assignment problem as an A* search algorithm. This allows enumeration of all assignments in the order of increasing penalty and hence, for instance, to compare the assignments of all solutions for a given molecule up to a user-defined penalty threshold. In addition, such an A* algorithm is simpler to implement, and often easier to extend, than an ILP approach; for instance, it is easily possible to influence the order in which solutions with equal score are computed.

As a combinatorial optimization problem, the bond order assignment problem can be represented by a tree, where each layer stands for one of the decisions that have to be made. In our case, the tree has k layers, where k is the number of bonds that have to be assigned. A node at layer i has μ children, where μ is the number of possible bond orders, typically 3, and each edge is labeled with its corresponding order. Hence, by tracing the path from the root to a node w at layer i , we can determine the values of the first i bonds in this particular partial assignment represented by the node w . Thus, the root node corresponds to a completely unassigned molecule with only unknown bond orders, while the leave nodes correspond to complete bond order assignments. If we only add child nodes and if the resulting valence state is valid, the leaf nodes correspond to the feasible bond order combinations. In order to discriminate between the different combinations, each leaf is assigned its total penalty score.

Visiting all nodes in the tree, the optimal bond order assignment can be found in a brute-force manner with exponential running time. If, additionally, all intermediate nodes are assigned the atomic penalty score of the partial bond order assignment they represent, a greedy search will yield an assignment with heuristically good (but not necessary optimal) total penalty score in linear running time. It can be shown that, if at each intermediate node more information is provided, finding an optimal solution can be guaranteed with greatly improved expected running time. This leads to the popular A*-search algorithm ( Hart et al. , 1968 ), which employs a search heuristic to guide the algorithm in descending the tree. More formally, the algorithm associates with each node w a function f ( w ) = g *( w ) + h *( w ), where g *( w ) describes the score corresponding to the decisions already made and h *( w ) is the so-called search heuristic. For the purposes of the A *-search algorithm, the search heuristic must be an admissible estimate of the score of the best leaf that can be reached starting from node w and descending further down the tree. Here, admissible means that it needs to be ‘optimistic’: for all nodes w , the estimated cost h *( w ) may never be greater than the lowest real cost to reach a goal node. Given the additional information provided by h *, the A*-search algorithm always expands one of the nodes with the most promising score, ensuring that the first leaf reached is optimal (roughly speaking, if the algorithm would visit a leaf with worse score first, the search-heuristic would have overestimated the penalty of the real optimal solution, which an admissible heuristic never does).

The function g * sums the atomic penalties of all completely assigned atoms in the partial bond order assignment represented by node w , whereas h * considers all atoms with at least one bond of unassigned bond order. For the atoms in this set, we compute the minimal atomic penalty possible under the current partial assignment independently of the other atoms in the set: each atom can choose its preferred value for each unassigned bond without considering overall consistency. Obviously, h * is optimistic. All three heuristics are implemented in our code.

3.3 The fixed-parameter approach (FPT)

In this approach, we consider each molecule as a molecule graph G = ( U , E ), where each vertex represents an atom and each edge represents a bond. Given a molecule graph that is a tree, the bond order assignment problem can be solved in polynomial time using dynamic programming. We omit the details, and concentrate on the more general case of graphs that are ‘tree-like’: A tree decomposition of G =( U , E ) consists of an index set I , a set of bags X i ⊆ U for i ∈ I and a tree T with node set I such that: The width of this tree decomposition equals ω−1 for ω≔max{∣ X i ∣ ∣ i ∈ I } − 1. The treewidth of G is the minimum width of any tree decomposition of G . The treewidth of a tree equals one.

every vertex u ∈ U is contained in at least one bag X i ;

for every edge { u , v } ∈ E , there is at least one bag X i such that u , v ∈ X i ;

for two nodes i , k of the tree T , if u ∈ X i and u ∈ X k then u ∈ X j also holds for all nodes j of the tree along the path from i to k in T .

Given a molecule graph G , we first compute a tree decomposition of G . We will see below that the running time and the required space of our algorithm grow exponentially with the width of the decomposition. Unfortunately, computing a tree decomposition with minimum width is again an NP-hard problem ( Arnborg et al. , 1987 ). Fortunately, there exist heuristic and exact algorithms to compute such tree decompositions efficiently in practice ( Bodlaender et al. , 2006 ; Gogate and Dechter, 2004 ).

To simplify the description of our algorithm, we use nice tree decompositions: Here, we assume the tree T to be rooted. A nice tree decomposition is a tree decomposition satisfying: Here, introduce nodes and forget nodes are viewed as moving bottom-up from the leaves to the root. We can easily transform a tree decomposition into a nice tree decomposition, in time linear in the size of the tree decomposition.

Every node of T has at most two children.

If a node i has two children j and k , then X i = X j = X k ; in this case, i is called a join node .

| X i | = | X j | + 1 and X j ⊂ X i ; in this case X i is called an introduce node .

| X i | = | X j | − 1 and X i ⊂ X j ; in this case X i is called a forget node .

Figure 2 illustrates a tree decomposition and a corresponding nice tree decomposition of a graph. It can be easily verified that the union of all bags in the tree decomposition as well as all bags in the nice tree decomposition contains every vertex of the graph, and every edge of the graph exists in at least one bag of the tree decompositions. Furthermore, all bags sharing a common vertex induce a connected subgraph in the tree decomposition.

A graph (top left) with a tree decomposition (bottom left) and a corresponding nice tree decomposition (right). Dashed lines illustrate connected components sharing common vertices.

A graph (top left) with a tree decomposition (bottom left) and a corresponding nice tree decomposition (right). Dashed lines illustrate connected components sharing common vertices.

The tree T is rooted at an arbitrary bag. Above this root, we add additional forget nodes, such that the new root contains a single vertex. Let X r denote the new root of the tree decomposition and v r denote the single vertex contained in X r . Analogously, we add additional introduce nodes under every leaf of T , such that the new leaf also contains a single vertex. Let X i = { a i ,1 , a i ,2 ,…, a i , k } be the atoms inside bag X i , where k ≤ ω. In our presentation below, we want to avoid double indices, so we refer to the atoms inside bag X i as a 1 , a 2 ,…, a k . It should be understood that these are different atoms for each bag. For simplicity of presentation, we also assume that the molecular subgraph induced by a 1 , a 2 ,…, a k is fully connected and, thus, contains all bonds a 1 a 2 , a 1 a 3 ,…, a k −1 a k .

Our algorithm begins at the leaves of the tree decomposition and computes the score matrix D i for every node X i when score matrices of its children nodes have been computed. We initialize the matrix D j of each leaf X j = { a 1 } with D j [ v 1 ; ·]=0 if v 1 = 0, and D j [ v 1 ; ·] = ∞ otherwise. During the bottom-up traversal, the algorithm distinguishes if X i is a forget node, an introduce node or a join node, and computes D i as follows:

We implemented our algorithm in Java and used the method QuickBB in the library LibTW implemented by van Dijk et al. ( http://www.treewidth.com ) to compute the optimal tree decomposition of a molecule graph. After computing the optimal tree decomposition, we transformed it into a nice tree decomposition. Running times reported for the fixed-parameter approach (FPT) algorithm include the running times of computing the optimal nice tree decomposition. To save memory, we use hash maps instead of arrays to implement score matrices D . During the course of the dynamic programming algorithm, we do not have to compute or store entries D j [ v 1 ,…, v k ; b 1,2 ,…, b k −1, k ] with v l + ∑ j b l , j > max V ( a l ) for some l , because such entries will never lead to a feasible bond order assignment. Furthermore, we find that the following trick speeds up our algorithm in practice: we initialize an integer k = 0 and do not store matrix entries with score exceeding k . If the score of the optimal solution is at most k , this optimal solution will be found. Otherwise, we call our algorithm again with increasing k , until an optimal solution is found. If not only the optimal solutions but also a certain number of suboptimal solutions are required, we call our algorithm repeatedly with increasing k , until all required suboptimal solutions are found or k arrives its upperbound ∑ a ∈ A max v ∈ V ( a ) P ( a , v ). Further details can be found in ( Böcker et al. , 2009 ).

For proteins and DNA, bond orders can be simply inferred by matching the given state to a database containing the bond orders for all amino acids and nucleotides. Hence, we focus the evaluation of our algorithms on small and medium-sized molecules (for instance, drug-like molecules). Such molecules can be found in large numbers in several established ligand databases, such as ZINC ( Irwin and Shoichet, 2005 ), ASTEX ( Nissink et al. , 2002 ), KEGG Ligand and KEGG Drug ( Goto et al. , 2002 ), the MMFF94 validation suite ( Halgren, 1996 ) or the Cambridge Structural Database ( Allen, 2002 ). But evaluating the correctness of our bond order algorithms poses certain constraints: we need ligand structures that contain not only the connectivity information but also preassigned bond orders and explicit hydrogens. As a further requirement, aromatic bonds should be given in kekulized form, i.e. replaced by a suitable pattern of single and double bonds. In contrast to structure-based bond assignment approaches, however, we can use databases that contain 3D ligand structures as well as those only storing structure diagrams or SMILES expressions. To provide a diverse test dataset fulfilling those constraints, we chose the MMFF94 validation suite and the KEGG Drug Database to provide our ground truth.

The MMFF94 validation suite ( Halgren, 1996 ) provides 761 small drug-like molecules, mainly derived from the Cambridge Structural Database ( Allen, 2002 ). The molecules were thoroughly prepared by the authors of the MMFF94 force field by assigning bond orders, adding hydrogens where valences had to be completed, and minimizing the resulting complexes. The MMFF94 validation suite was originally designed to test the MMFF94 force field parameters, and thus yields a diverse set of molecules with hand-curated connectivity information, hydrogens and bond order assignment and 3D positions that we found very reasonable for testing bond order perception.

The KEGG Drug Database ( Goto et al. , 2002 ), provided by the Kanehisa Laboratories, offers a remarkable number of drug-like molecules for diverse applications in bioinformatics. The atoms molecular coordinates are 2D, which is suitable for representation by structure diagrams, but is unsuited for structure-based bond order assignment as performed by most of the former approaches. It thus represents a perfect test scenario for bond order assignment from topology alone. Unfortunately, hydrogens are missing in the KEGG data bases, and were added for our tests using standard rules for completing free valences as performed by OpenBabel ( Guha et al. , 2006 ). Furthermore, 2550 files of the KEGG Drug set contain more than one molecule, and each molecule may appear in more than one file. To prevent a skewed analysis, we split up the dataset into unique connected components. Ignoring molecules with less than four atoms (e.g. water), this preparation led to a test set of 7424 molecules from the KEGG drug set.

In Section 4.1 , we compare the total penalty score tps of the results of our exact solvers with that of the results of the original Antechamber approach. In Section 4.2 , we compare the results of the different approaches to the expert generated, hand-crafted reference assignments and study the implications of the ambiguity of two or more co-optimal solutions.

All algorithms—A*, ILP, FPT and Antechamber—are applied to the two test sets such as MMFF94 and KEGG Drug. Computing all optimal solutions for all 761 molecules of the MMFF94 dataset, the total running time was 252.0 s for the ILP, 227.1 s for the A* algorithm and 24.9 s for the FPT algorithm. The antechamber heuristic took 7.9 s to compute one solution for all molecules (cf. Supplementary Material). All reported running times were averaged over 20 repetitions. Thus, the ability to provide all optimal exact solutions and to use user-editable SMARTS strings for penalty class assignment takes its toll: the heuristic antechamber approach is the fastest of the methods, about an order of magnitude faster than ILP and A*. Still, all running times are sufficiently small to allow the routine usage in high-throughput applications.

4.1 Comparison to Antechamber

In order to evaluate whether solving the optimization exactly makes a difference in practice, we first focus on the following properties:

how often do manual, heuristic and exact approaches produce an optimally scored solution;

how often do the exact approaches find a solution with a smaller tps than the heuristic;

how often does each approach fail to find a feasible solution.

Evaluation on the MMFF94 validation suite (761 molecules in total) was done as follows: the Antechamber bond perception algorithm as well as our own algorithms—A*, ILP and FPT—were run for each input molecule. Note that all exact algorithms will in principle compute the same solutions, and only the order of co-optimal solutions can differ. If both Antechamber and our algorithms computed bond order assignments (i.e. none of the approaches failed), we compared these to test if the Antechamber assignment is optimal.

For 734 molecules (96.45%), the solution found by the heuristic Antechamber approach is optimal. For nine molecules (1.18%), the exact algorithms indeed find bond order assignments with a total penalty score less than that of the solution provided by Antechamber (cf. the Supplementary Material). For 14 cases (1.83%), our algorithms computed an optimal bond order assignment, whereas the heuristic Antechamber bailed out. In four cases (0.53%), neither Antechamber nor our algorithms computed a bond order assignment, due to missing atom types in the penalty table. In no case, Antechamber computed a solution but our algorithms did not. In total, Antechamber bailed out in 18 cases (2.30%), and in 23 cases (3.02%) we improved upon Antechamber (no solution by Antechamber or better solution by our algorithms).

The comparison of our algorithms to the Antechamber approach on the KEGG Drug set (7424 molecules in total) looks very similar. For 7202 molecules (97.01%), the bond order assignment found by Antechamber is optimal. For 13 molecules (0.18%) containing PO 4 , Antechamber reproducibly provided infeasible solutions, whereas our algorithms computed optimal assignments. For 27 cases (0.36%), our algorithms computed an optimal assignment but Antechamber bailed out. In 180 cases (2.42%), both approaches bailed out, as not all atom types are contained in the original penalty table given in Wang et al. (2006 ). In total, Antechamber bailed out in 207 cases (2.79%), and we improved upon Antechamber in 40 cases (0.54%).

A complete comparison for both test sets is given in Table 1 . Please note that the test datasets were chosen such that they are relatively wellsuited to the heuristic Antechamber approach, e.g. they contain relatively few large or complex ring systems.

Comparison of our exact solvers with the original heuristic implementation of Antechamber and the expert generated solutions (‘reference solution’) for the molecules of the MMFF94 validation set and the KEGG Drug set

4.2 Comparison to reference assignments

As a second step in the analysis, we compare the results produced by all approaches to the reference assignment. For our own solvers, which are able to enumerate all optimal (FTP, ILP) or even all feasible solutions (A*), we only recorded the first one.

As can be seen in Table 2 , our methods are able to significantly reproduce more bond order assignments of the MMFF94 validation suite than the original Antechamber approach. While Antechamber correctly recomputed 37.05% of the molecules, the exact methods reconstructed between 53.88% and 61.89% of the reference bond order assignments as the first solution. Similar results can be seen on the KEGG Drug set: Antechamber correctly reproduced 41.96% of the bond order assignments, compared to 49.95–56.9% for the exact methods. Obviously, all results returned by the exact solvers are optimal and hence, the differences in these numbers are due to systematic differences in the order in which each algorithm enumerates the solutions. In the case of the A* algorithm, this order can easily be tweaked by adapting the heuristic part of the scoring functions. By design, our A* heuristics tend to avoid the occurrences of larger bond orders, but this strategy could be further fine tuned. Note that the FPT algorithm can easily be modified to simulate this behaviour, as computing all optimal solutions does not significantly increase running times. For the ILP approach, in contrast, running times would increase considerably. In future, we plan to sort co-optimal solutions with respect to another objective function before writing the output. This might possibly further increase the quality of our results, and is the topic of ongoing research.

Performance of the original Antechamber implementation and our exact solvers on the test sets using the penalty table as defined in Wang et al. (2006 )

The third column denotes the number of molecules for which the algorithms return the original bond order assignment as first solution, the fourth column the number of molecules for which the algorithms return it at as any of their optimal solutions.

Considering that bond order assignments need not be unique, it makes sense to provide the user not only with the first solution but with all optimal ones (or even some suboptimal ones). In this case, taking all optimal solutions into account, we find that our algorithms find the reference solution in 78.71% of the cases on the MMFF94 validation suite and in 85.21% on the KEGG Drug set. A complete comparison is given in Table 2 .

Obviously, the performance of all approaches is limited by the quality of the penalty table: the definition of the atom classes, their allowed valence states, and the choice of the valence state's penalties have a significant influence on the performance. As can be seen in Table 1 , the current penalty table does not cover all molecules in the reference datasets—for four molecules in the MMFF94 set and for 180 in the KEGG set, the required atom classes are missing. Hence, in our own implementations, we use SMARTS expressions stored in an XML file to define the penalty classes, which allows a user to easily add atom types or tune the results to his needs. To guarantee a fair comparison between the solvers, we ensured that for all tests in this article, our implementation used exactly the same penalty classes as Antechamber. Improvements to the penalty table, and a systematic study of their influence, are the focus of future work.

Automated bond order assignment is an important problem when working with user-generated molecules, molecular databases or computational combinatorial chemistry. Especially fully automated pipelines in high-throughput applications depend on reliable bond order assignments. The modern and extensible approach realized in Antechamber is based on sound chemical principles and has proven to be a very valuable tool. In this work, we have shown three different exact solvers as alternatives to the heuristic approach pursued by Wang et al. (2006 ): an A* algorithm, an ILP formulation and a fixed parameter approach. While we found in our evaluations that the heuristic solver works surprisingly well—roughly 97% of all cases in our tests—it still can be significantly improved using exact techniques. If we keep in mind that bond order assignments are in many cases non-unique—different resonance structures, for instance, might have the same probability to occur—the ability to systematically enumerate all solutions becomes an invaluable tool. When bond order assignments are important, it might be worthwhile to enumerate all optimal assignments, run whatever procedure is supposed to work with the results in the next step, and average over the results.

Comparing the three different exact strategies, each of them has its advantages and disadvantages. If computational efficiency is required, the best choice is clearly the FPT, where running times are almost on par with the Antechamber heuristic. The A* algorithm, on the other hand, is even simpler to implement than the heuristic and can be very easily extended through the heuristic cost function. Both approaches can compute co-optimal and sub-optimal solutions without significantly increasing running times, and geometric information can be employed to provide a more sensible ordering of the results. The ILP approach, finally, is trivial to implement when external solvers can be used. However, enumerating all solutions requires a certain sophistication and can easily spoil the running time. An additional advantage of our methods is their easy extensibility. For example, adding missing hydrogens or even bonds is possible but will require more elaborate, e.g. structure based, scoring to handle the exponential number of combinations. Such a scoring scheme only requires modifications of the tps definition. Algorithmically, the bond order assignment problem bears close resemblance to the side chain optimization problem, where similar solution strategies have been developed [ Althaus et al. (2002 ); Leach and Lemon (1998 ); Xu et al. (2005 )]. Future work will study whether modern probabilistic approaches [see, e.g. Yanover et al. (2008 )] for this problem will also be appropriate for bond order assignment.

Implementation of the FPT algorithm was done by Kai Dührkop and Patrik Seeber.

Funding : A.H. acknowledges financial support from the Intel Visual Computing Institute (IVCI) of Saarland University; A.H. and H.P.L. financial support from DFG (BIZ4:1-4). Q.B.A.B. financial support from DFG, research group ‘Parameterized Algorithms in Bioinformatics’ (BO 1910/5).

Conflict of Interest : none declared.

Google Scholar

Google Preview

Author notes

Supplementary data, email alerts, citing articles via, looking for your next opportunity.

Affiliations

Oxford University Press is a department of the University of Oxford. It furthers the University's objective of excellence in research, scholarship, and education by publishing worldwide

This Feature Is Available To Subscribers Only

Sign In or Create an Account

This PDF is available to Subscribers Only

For full access to this pdf, sign in to an existing account, or purchase an annual subscription.

Knowledge Base

Is it possible to assign bond orders to pdb ligands from the command line.

Yes, with the following command:

$SCHRODINGER/run fix_bond_orders.py input-file output-file

where the input and output files can be in Maestro, PDB, or SD format. This command assigns bond orders (single, double, triple, or zero-order) in the same way as selecting "Assign bond orders" in the Protein Preparation Wizard panel in Maestro, including zero-order bonds to metals.

To ask a question or get help, please submit a support ticket or email us at [email protected] .

Assign Bond Orders

Backend implementation, input ports, output ports, popular predecessors.

Popular Successors

You want to see the source code for this node? Click the following button and we’ll use our super-powers to find it for you.

Installation

To use this node in KNIME, install the extension Schrödinger Extensions for KNIME from the below update site following our NodePit Product and Node Installation Guide :

A zipped version of the software site can be downloaded here .

Do you have feedback, questions, comments about NodePit, want to support this platform, or want your own nodes or workflows listed here as well? Do you think, the search results could be improved or something is missing? Then please get in touch! Alternatively, you can send us an email to [email protected] , follow @NodePit on Twitter, or chat on Gitter !

Please note that this is only about NodePit. We do not provide general support for KNIME — please use the KNIME forums instead.

Bond Orders

If the PDB file contains residues, then bond orders are set according to the standard form of the amino acid and the atom names in the PDB file.

For any other molecule (or file type), bond orders are set based on "perceiving" bond orders. The algorithm is based on a talk by Roger Sayle which has proven to be one of the most reliable methods for assigning bond orders.

Loosely speaking, the steps are something like this:

Navigation menu

Personal tools.

IMAGES

  1. Determining Bond Order

    assign bond order

  2. Bond order Application

    assign bond order

  3. Bond Order

    assign bond order

  4. Lecture 15 (6 of 6)

    assign bond order

  5. Calculation of bond order

    assign bond order

  6. Bond Order: Definition, Calculation and Significance

    assign bond order

VIDEO

  1. ALL TRICKS TO FIND BOND ORDER

  2. Bond order animation

  3. Trick To Find Out Bond Order #youtube #shorts #chemistry

  4. bond order calculation of molecule

  5. Fundamentals of Valuing Bonds

  6. Acid Rain Keyboard Part

COMMENTS

  1. Bond Order and Lengths

    Bond order is the number of chemical bonds between a pair of atoms and indicates the stability of a bond. For example, in diatomic nitrogen

  2. Automated simultaneous assignment of bond orders and formal

    Minimising some function of the electron positions thus results in an optimal formal charge and bond order assignment. Given that electron

  3. Assigning Bond Orders to PDB Ligands

    Assigning Bond Orders to PDB Ligands - The Easy Way ; def process_ligand · """ ; Add bond orders to a pdb ligand 1. Select the ligand component

  4. Assigning the correct bond order using RDKit.

    For normal sp3 carbons and molecules with mostly single bonds this system works fine, however, for more complex structures containing for

  5. Automated bond order assignment as an optimization problem

    As a combinatorial optimization problem, the bond order assignment problem can be represented by a tree, where each layer stands for one of the decisions that

  6. Is it possible to assign bond orders to PDB ligands ...

    where the input and output files can be in Maestro, PDB, or SD format. This command assigns bond orders (single, double, triple, or zero-order)

  7. Assign Bond Orders

    Assign Bond Orders ... Assigns double and triple bonds to input structures based on molecular geometry (bond length, bond angles, and dihedral angles). Useful

  8. Bond Order and Resonance Structures

    This organic chemistry video tutorial explains how to calculate the bond order of a regular bond and a bond that's part of a resonance

  9. Assign Bond Orders

    The Assign Bond Orders command on the Tools menu assigns bond orders to structures that do not have the correct bond orders. Bond orders are assigned as follows

  10. Bond Orders

    If the PDB file contains residues, then bond orders are set according to the standard form of the amino acid and the atom names in the PDB