lacan.protect
protect.py — Bond protection and SMARTS-based atom exclusion for LACAN.
This module provides two mechanisms:
- SMARTS-based atom exclusion (stateless)
Pass a protect_smarts SMARTS string to any operation in lacan.replace, lacan.mutate, or optimize_from_mol(). Atoms matching the pattern are skipped by all mutation and fragment-replacement operations on each call. Because the protected set is re-derived from the molecule on every call via get_protected_atoms(), it survives SMILES round-trips transparently; no atom properties are stored on the molecule.
- Bond protection (stateful, stored as the _lp bond property)
Protected bonds are excluded from the LACAN score computation. Use this when a molecule contains a required but chemically unusual motif (e.g. a Michael-acceptor covalent warhead) whose LACAN score would unfairly penalise the whole molecule. Bond protection is stored as the _lp boolean property on RDKit bond objects and is carried through all replacement operations via the internal _restore_bond_protection helper in lacan.replace.
All functions return new molecules — the originals are never modified in place.
The mol_cleaner() utility uses bond protection internally to iteratively
repair LACAN-failing molecules while preserving the parts that already pass.
- lacan.protect.PROP = '_lp'
Name of the RDKit bond boolean property used to mark bond protection.
Note: atom protection is no longer stored as a mol property; use protect_smarts parameters instead (see get_protected_atoms()).
- lacan.protect.get_protected_atoms(mol, protect_smarts)[source]
Return the set of atom indices matched by protect_smarts in mol.
Call it once at the top of any function that needs to skip protected atoms, then pass the resulting frozenset to the per-atom checks.
- Parameters:
mol (RDKit Mol)
protect_smarts (str or None. If None, returns an empty frozenset.)
- Returns:
Atom indices matched by the SMARTS. Empty frozenset if protect_smarts is None or "".
- Return type:
frozenset
- Raises:
ValueError – If protect_smarts is not None and cannot be parsed.
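The behaviour described above can be approximated with plain RDKit calls. The following is a minimal sketch, not the library's actual implementation: the function name get_protected_atoms_sketch and the example molecule are assumptions made for illustration, and only the documented contract (empty frozenset for None or "", ValueError on an unparsable pattern) is taken from the text.

```python
from rdkit import Chem


def get_protected_atoms_sketch(mol, protect_smarts):
    """Hypothetical stand-in for get_protected_atoms(); illustration only."""
    # None and "" both mean "nothing is protected".
    if not protect_smarts:
        return frozenset()
    pattern = Chem.MolFromSmarts(protect_smarts)
    # MolFromSmarts returns None when the pattern cannot be parsed.
    if pattern is None:
        raise ValueError(f"Cannot parse SMARTS: {protect_smarts!r}")
    # Flatten every match into a single set of atom indices.
    return frozenset(
        idx for match in mol.GetSubstructMatches(pattern) for idx in match
    )


# N-phenylacrylamide: protect the acrylamide warhead (atoms 0 to 4).
mol = Chem.MolFromSmiles("C=CC(=O)Nc1ccccc1")
protected = get_protected_atoms_sketch(mol, "C=CC(=O)N")
print(sorted(protected))
```

Because the set is derived from the molecule and the pattern alone, recomputing it after a SMILES round-trip needs no stored state, which is the stateless design the module description refers to.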
- lacan.protect.reaction_touches_protected(mol, rxn, protect_smarts)[source]
Return True if any match of rxn overlaps an atom matched by protect_smarts.
Called by apply_mutations() before each reaction to skip operations that would modify a protected atom. Returns False immediately when protect_smarts is None or "", adding no overhead for unprotected molecules.
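One way such an overlap check could work is to match each reactant template of the reaction against the molecule and test the matched atoms against the protected set. The sketch below assumes this approach; the name touches_protected_sketch and the example reaction are hypothetical, and the real function may be implemented differently.

```python
from rdkit import Chem
from rdkit.Chem import rdChemReactions


def touches_protected_sketch(mol, rxn, protect_smarts):
    """Hypothetical stand-in for reaction_touches_protected(); illustration only."""
    # Fast path: nothing is protected, so no matching is needed.
    if not protect_smarts:
        return False
    pattern = Chem.MolFromSmarts(protect_smarts)
    if pattern is None:
        raise ValueError(f"Cannot parse SMARTS: {protect_smarts!r}")
    protected = {i for m in mol.GetSubstructMatches(pattern) for i in m}
    # A reaction "touches" a protected atom if any reactant-template
    # match shares at least one atom index with the protected set.
    for k in range(rxn.GetNumReactantTemplates()):
        template = rxn.GetReactantTemplate(k)
        for match in mol.GetSubstructMatches(template):
            if protected.intersection(match):
                return True
    return False


mol = Chem.MolFromSmiles("C=CC(=O)Nc1ccccc1")
# Example reaction (an assumption): reduce an aliphatic C=C double bond.
rxn = rdChemReactions.ReactionFromSmarts("[C:1]=[C:2]>>[C:1]-[C:2]")

warhead_hit = touches_protected_sketch(mol, rxn, "C=CC(=O)N")
ring_hit = touches_protected_sketch(mol, rxn, "c1ccccc1")
print(warhead_hit, ring_hit)
```

Here the reaction only matches the vinyl carbons, so protecting the warhead blocks it while protecting the phenyl ring does not.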
- lacan.protect.protect_bonds_for_idx(mol, bond_indices)[source]
Mark specific bonds as protected.
Protected bonds are excluded from the LACAN score (see score_mol_ignoring_protected_bonds()). They do not block mutations; use protect_smarts in the operation functions for that.
- Parameters:
mol (RDKit Mol)
bond_indices (iterable of bond indices to protect)
- Returns:
A new RDKit Mol.
- lacan.protect.protect_rejected_bonds(mol, profile=None, t=0.05)[source]
Protect all bonds that currently fail the LACAN score threshold.
After calling this, score_mol_ignoring_protected_bonds() will score the molecule as if those bonds do not exist. This is useful when a molecule contains a structural motif that is required (e.g. a reactive warhead) but would otherwise cause the whole molecule to score 0.
- Parameters:
mol (RDKit Mol)
profile (LACAN profile dict; loads the default ChEMBL profile if None)
t (bond score threshold (default 0.05))
- Returns:
A new RDKit Mol with failing bonds protected.
- lacan.protect.get_protected_bond_indices(mol)[source]
Return a list of indices of all protected bonds in mol.
- lacan.protect.bond_is_protected(bond)[source]
Return True if the RDKit Bond object has the _lp protection mark.
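The bond-marking helpers above can be pictured as thin wrappers around RDKit's bond property API. The following sketch assumes that reading; the *_sketch function names are hypothetical, and only the property name _lp and the copy-on-write behaviour come from the documentation.

```python
from rdkit import Chem

PROP = "_lp"  # property name taken from lacan.protect.PROP


def protect_bonds_sketch(mol, bond_indices):
    """Return a copy of mol with the given bonds marked as protected."""
    new_mol = Chem.Mol(mol)  # copy, so the input molecule is left untouched
    for idx in bond_indices:
        new_mol.GetBondWithIdx(idx).SetBoolProp(PROP, True)
    return new_mol


def bond_is_protected_sketch(bond):
    """True if the bond carries a truthy _lp mark."""
    return bool(bond.HasProp(PROP) and bond.GetBoolProp(PROP))


def protected_bond_indices_sketch(mol):
    """List the indices of all protected bonds in mol."""
    return [b.GetIdx() for b in mol.GetBonds() if bond_is_protected_sketch(b)]


ethanol = Chem.MolFromSmiles("CCO")
marked = protect_bonds_sketch(ethanol, [0])
print(protected_bond_indices_sketch(marked))   # bond 0 is marked on the copy
print(protected_bond_indices_sketch(ethanol))  # the original has no marks
```

Unlike the SMARTS-based atom exclusion, this state lives on the bond objects, which is why the module needs a helper to carry it through replacement operations.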
- lacan.protect.score_mol_ignoring_protected_bonds(mol, profile=None, mode='score', t=0.05)[source]
Score a molecule while ignoring any bonds marked as protected.
This is a drop-in replacement for lacan.lacan.score_mol() that omits protected bonds from both the minimum-score calculation and the bad_bonds list. If all bonds are protected, the function returns 1.0 (trivially passes).
- Parameters:
mol (RDKit Mol (may have protected bonds))
profile (LACAN profile dict; loads ChEMBL default if None)
mode ("score" (continuous, 0–1) or "threshold" (0 or 1))
t (bond score threshold (default 0.05))
- Return type:
(score, info) where info["bad_bonds"] lists unprotected failing bond indices.
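The effect of protection on scoring can be illustrated without RDKit or a LACAN profile. The sketch below works on a plain dict of per-bond scores and assumes that the molecule score is the minimum bond score and that a bond fails when its score is below t; both points are inferred from the description above, and the function name and input format are hypothetical.

```python
def score_ignoring_protected_sketch(bond_scores, protected, mode="score", t=0.05):
    """Toy scoring over {bond_index: score}; not the library's implementation."""
    # Drop protected bonds before doing anything else.
    unprotected = {i: s for i, s in bond_scores.items() if i not in protected}
    # With nothing left to score, the molecule trivially passes.
    if not unprotected:
        return 1.0, {"bad_bonds": []}
    bad_bonds = sorted(i for i, s in unprotected.items() if s < t)
    if mode == "threshold":
        return (0.0 if bad_bonds else 1.0), {"bad_bonds": bad_bonds}
    # Assumed "score" mode: the worst unprotected bond sets the score.
    return min(unprotected.values()), {"bad_bonds": bad_bonds}


scores = {0: 0.80, 1: 0.01, 2: 0.40}  # bond 1 is the unusual motif

plain = score_ignoring_protected_sketch(scores, protected=set())
masked = score_ignoring_protected_sketch(scores, protected={1})
all_masked = score_ignoring_protected_sketch(scores, protected={0, 1, 2})
print(plain, masked, all_masked)
```

Protecting bond 1 lifts the score from 0.01 to 0.40 and empties bad_bonds, which mirrors what protect_rejected_bonds() is described as achieving for a required but unusual motif.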
- lacan.protect.mol_cleaner(mol, profile=None, score_threshold=0.5, t=0.05, max_iter=100, lateral_patience=5)[source]
Iteratively mutate a molecule to eliminate all LACAN bond violations.
This function is designed for molecules that mostly pass the LACAN profile but have one or more bad bond environments that need to be fixed. It freezes the parts that already pass (using bond protection) and mutates only the failing regions.
Strategy
Each iteration the cleaner:
- Generates all single-step mutation products via _raw_mutations(), crucially without score-filtering the products. Score-filtering at this stage would reject all partially-fixed intermediates (those that still have some bad bonds), preventing multi-step repair paths.
- Evaluates each product by counting its unprotected LACAN violations. The parent molecule’s protected bond mask is re-derived from the product’s own bond scores (via _reprotect()), so each candidate is assessed on its own bond landscape.
- Picks the best candidate:
  - Improvement step: a candidate with fewer violations than the current molecule. Resets the lateral counter.
  - Lateral move: if no improvement is available and the lateral patience budget allows, accept the highest-scoring candidate with the same violation count.
- Re-protects the accepted candidate’s newly-passing bonds so subsequent mutations stay focused on the remaining bad regions.
- Parameters:
mol (RDKit Mol to clean)
profile (LACAN profile dict (loads ChEMBL default if None))
score_threshold (final LACAN score required to consider the molecule clean (default 0.5))
t (bond score threshold used throughout (default 0.05))
max_iter (hard cap on total iterations (default 100))
lateral_patience (consecutive lateral steps allowed before giving up (default 5))
- Return type:
RDKit Mol if a clean version is found, else None.
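The search strategy described for mol_cleaner() is a greedy descent on the violation count with a bounded number of sideways moves. The sketch below shows that control flow on abstract states; it is an illustration under assumptions, not the library's code. The callables neighbours, violations and score are hypothetical stand-ins for _raw_mutations(), the unprotected-violation count and the LACAN score, and the final score_threshold check and re-protection step are omitted.

```python
def clean_sketch(start, neighbours, violations, score,
                 max_iter=100, lateral_patience=5):
    """Greedy repair loop with a lateral-move budget (illustration only)."""
    current, lateral = start, 0
    for _ in range(max_iter):
        current_violations = violations(current)
        if current_violations == 0:
            return current  # nothing left to fix
        candidates = neighbours(current)  # unfiltered single-step products
        if not candidates:
            return None
        # Fewest violations first, highest score as the tie-breaker.
        best = min(candidates, key=lambda c: (violations(c), -score(c)))
        if violations(best) < current_violations:
            current, lateral = best, 0  # improvement step resets the counter
            continue
        # No improvement: try a lateral move if the budget allows.
        same = [c for c in candidates if violations(c) == current_violations]
        if not same or lateral >= lateral_patience:
            return None
        current = max(same, key=score)
        lateral += 1
    return current if violations(current) == 0 else None


# Toy problem: states are integers and |x| plays the role of violations.
repaired = clean_sketch(
    3,
    neighbours=lambda x: [x - 1, x + 1],
    violations=abs,
    score=lambda x: 1.0 / (1 + abs(x)),
)

# A plateau with no improving move exhausts the lateral budget.
stuck = clean_sketch(
    0,
    neighbours=lambda x: [x + 1],
    violations=lambda x: 1,
    score=lambda x: 0.0,
    lateral_patience=2,
)
print(repaired, stuck)
```

Keeping unfiltered candidates is what lets the first toy run walk 3, 2, 1, 0 through intermediates that still have violations; filtering by a pass/fail score at each step would have rejected every one of them.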