lacan.protect

protect.py — Bond protection and SMARTS-based atom exclusion for LACAN.

This module provides two mechanisms:

SMARTS-based atom exclusion (stateless)

Pass a protect_smarts SMARTS string to any operation in lacan.replace, lacan.mutate, or optimize_from_mol(). Atoms matching the pattern are skipped by every mutation and fragment-replacement operation. Because the protected set is re-derived from the molecule on each call via get_protected_atoms(), it survives SMILES round-trips transparently: no atom properties are stored on the molecule.

Bond protection (stateful, stored as _lp bond property)

Protected bonds are excluded from the LACAN score computation. Use this when a molecule contains a required but chemically unusual motif (e.g. a Michael-acceptor covalent warhead) whose LACAN score would unfairly penalise the whole molecule. Bond protection is stored as the _lp boolean property on RDKit bond objects and is carried through all replacement operations via the internal _restore_bond_protection helper in lacan.replace.

Functions that return molecules always return new ones; the originals are never modified in place.

The mol_cleaner() utility uses bond protection internally to iteratively repair LACAN-failing molecules while preserving the parts that already pass.

lacan.protect.PROP = '_lp'

Name of the RDKit bond boolean property used to mark bond protection.

Note: atom protection is no longer stored as a mol property — use protect_smarts parameters instead (see get_protected_atoms()).

lacan.protect.get_protected_atoms(mol, protect_smarts)

Return the set of atom indices matched by protect_smarts in mol.

Call it once at the top of any function that needs to skip protected atoms, then pass the resulting frozenset to the per-atom checks.

Parameters:
  • mol (RDKit Mol)

  • protect_smarts (str or None) – SMARTS pattern identifying the atoms to protect. If None or "", an empty frozenset is returned.

Returns:

Atom indices matched by the SMARTS. Empty frozenset if protect_smarts is None or "".

Return type:

frozenset of int

Raises:

ValueError – If protect_smarts is not None and cannot be parsed.

lacan.protect.reaction_touches_protected(mol, rxn, protect_smarts)

Return True if any match of rxn overlaps an atom matched by protect_smarts.

Called by apply_mutations() before each reaction to skip operations that would modify a protected atom. Returns False immediately when protect_smarts is None or "", adding no overhead for unprotected molecules.

Parameters:
  • mol (RDKit Mol)

  • rxn (RDKit ChemicalReaction)

  • protect_smarts (str or None) – SMARTS identifying atoms to exclude; None disables the check entirely.

Returns:

True if the reaction should be skipped; False if safe.

Return type:

bool
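The overlap test described above can be illustrated without RDKit. The sketch below is a toy model, not the library's implementation: it assumes the reaction's substructure matches and the protected atoms are already available as plain index collections, and the name matches_touch_protected is hypothetical.

```python
from typing import FrozenSet, Iterable, Tuple


def matches_touch_protected(
    reactant_matches: Iterable[Tuple[int, ...]],
    protected: FrozenSet[int],
) -> bool:
    """Return True if any match shares at least one atom index with `protected`."""
    if not protected:
        # Nothing is protected, so no match can overlap: skip the scan entirely.
        return False
    return any(not protected.isdisjoint(match) for match in reactant_matches)


# Hypothetical data: atoms 0-2 are protected; each tuple is one template match.
protected_atoms = frozenset({0, 1, 2})
print(matches_touch_protected([(4, 5), (2, 3)], protected_atoms))  # True
print(matches_touch_protected([(4, 5), (6, 7)], protected_atoms))  # False
print(matches_touch_protected([(0, 1)], frozenset()))              # False
```

The early return mirrors the documented behaviour for protect_smarts being None or "": with no protected atoms there is nothing to compare against.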

lacan.protect.protect_bonds_for_idx(mol, bond_indices)

Mark specific bonds as protected.

Protected bonds are excluded from the LACAN score (see score_mol_ignoring_protected_bonds()). They do not block mutations — use protect_smarts in the operation functions for that.

Parameters:
  • mol (RDKit Mol)

  • bond_indices (iterable of int) – Indices of the bonds to protect.

Returns:

A new RDKit Mol.

lacan.protect.protect_rejected_bonds(mol, profile=None, t=0.05)

Protect all bonds that currently fail the LACAN score threshold.

After calling this, score_mol_ignoring_protected_bonds() will score the molecule as if those bonds do not exist. This is useful when a molecule contains a structural motif that is required (e.g. a reactive warhead) but would otherwise cause the whole molecule to score 0.

Parameters:
  • mol (RDKit Mol)

  • profile (dict or None) – LACAN profile; the default ChEMBL profile is loaded if None.

  • t (float) – Bond score threshold (default 0.05).

Returns:

A new RDKit Mol with the failing bonds protected.

lacan.protect.get_protected_bond_indices(mol)

Return a list of indices of all protected bonds in mol.

lacan.protect.bond_is_protected(bond)

Return True if the RDKit Bond object has the _lp protection mark.

lacan.protect.score_mol_ignoring_protected_bonds(mol, profile=None, mode='score', t=0.05)

Score a molecule while ignoring any bonds marked as protected.

This is a drop-in replacement for lacan.lacan.score_mol() that omits protected bonds from both the minimum-score calculation and the bad_bonds list. If all bonds are protected the function returns 1.0 (trivially passes).

Parameters:
  • mol (RDKit Mol) – Molecule to score; may carry protected bonds.

  • profile (dict or None) – LACAN profile; the ChEMBL default is loaded if None.

  • mode (str) – "score" (continuous, 0–1) or "threshold" (0 or 1).

  • t (float) – Bond score threshold (default 0.05).

Returns:

A (score, info) tuple, where info["bad_bonds"] lists the indices of unprotected failing bonds.
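To make the effect of protection on scoring concrete, here is a small pure-Python sketch. It is a toy model under stated assumptions: per-bond scores are supplied directly as a dict (the real function derives them from the LACAN profile), a bond counts as failing when its score is strictly below t, and "threshold" mode returns 1.0 only when no unprotected bond fails. The function name score_ignoring_protected is hypothetical.

```python
from typing import Dict, List, Set, Tuple


def score_ignoring_protected(
    bond_scores: Dict[int, float],
    protected: Set[int],
    mode: str = "score",
    t: float = 0.05,
) -> Tuple[float, Dict[str, List[int]]]:
    """Toy model: molecule score = minimum score over unprotected bonds."""
    if mode not in ("score", "threshold"):
        raise ValueError(f"unknown mode: {mode!r}")
    # Drop protected bonds before any scoring takes place.
    unprotected = {i: s for i, s in bond_scores.items() if i not in protected}
    if not unprotected:
        # Every bond is protected (or there are none): trivially passes.
        return 1.0, {"bad_bonds": []}
    # Assumption: "failing" means strictly below the threshold t.
    bad_bonds = sorted(i for i, s in unprotected.items() if s < t)
    if mode == "threshold":
        score = 0.0 if bad_bonds else 1.0
    else:
        score = min(unprotected.values())
    return score, {"bad_bonds": bad_bonds}


scores = {0: 0.80, 1: 0.01, 2: 0.40}  # hypothetical per-bond scores
print(score_ignoring_protected(scores, protected=set()))  # (0.01, {'bad_bonds': [1]})
print(score_ignoring_protected(scores, protected={1}))    # (0.4, {'bad_bonds': []})
print(score_ignoring_protected(scores, {0, 1, 2}))        # (1.0, {'bad_bonds': []})
```

Protecting bond 1 removes it from both the minimum and the bad_bonds list, which is the behaviour the description above attributes to the real function.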

lacan.protect.mol_cleaner(mol, profile=None, score_threshold=0.5, t=0.05, max_iter=100, lateral_patience=5)

Iteratively mutate a molecule to eliminate all LACAN bond violations.

This function is designed for molecules that mostly pass the LACAN profile but have one or more bad bond environments that need to be fixed. It freezes the parts that already pass (using bond protection) and mutates only the failing regions.

Strategy

Each iteration the cleaner:

  1. Generates all single-step mutation products via _raw_mutations() — crucially without score-filtering the products. Score-filtering at this stage would reject all partially-fixed intermediates (those that still have some bad bonds), preventing multi-step repair paths.

  2. Evaluates each product by counting its unprotected LACAN violations. The protected-bond mask is not inherited from the parent molecule; it is re-derived from the product’s own bond scores (via _reprotect()), so each candidate is assessed on its own bond landscape.

  3. Picks the best candidate:

    • Improvement step — a candidate with fewer violations than the current molecule. Resets the lateral counter.

    • Lateral move — if no improvement is available and the lateral patience budget allows, accept the highest-scoring candidate with the same violation count.

  4. Re-protects the accepted candidate’s newly-passing bonds so subsequent mutations stay focused on the remaining bad regions.

Parameters:
  • mol (RDKit Mol) – Molecule to clean.

  • profile (dict or None) – LACAN profile; the ChEMBL default is loaded if None.

  • score_threshold (float) – Final LACAN score required to consider the molecule clean (default 0.5).

  • t (float) – Bond score threshold used throughout (default 0.05).

  • max_iter (int) – Hard cap on total iterations (default 100).

  • lateral_patience (int) – Consecutive lateral steps allowed before giving up (default 5).

Returns:

The cleaned RDKit Mol if a clean version is found, else None.
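The improvement/lateral-move control flow in the Strategy section can be sketched on a toy search problem. This is an illustration only, not the mol_cleaner() implementation: states are plain integers rather than molecules, bond re-protection and the final score_threshold check are omitted, and greedy_clean and the three toy_* callbacks are hypothetical names.

```python
from typing import Callable, Iterable, Optional, TypeVar

S = TypeVar("S")


def greedy_clean(
    start: S,
    neighbours: Callable[[S], Iterable[S]],
    violations: Callable[[S], int],
    score: Callable[[S], float],
    max_iter: int = 100,
    lateral_patience: int = 5,
) -> Optional[S]:
    """Greedy repair loop with a budget of consecutive lateral moves."""
    current, lateral = start, 0
    for _ in range(max_iter):
        if violations(current) == 0:
            return current
        candidates = list(neighbours(current))  # unfiltered single-step products
        if not candidates:
            return None
        # Fewest violations first; ties are broken by the highest score.
        best = min(candidates, key=lambda c: (violations(c), -score(c)))
        if violations(best) < violations(current):
            current, lateral = best, 0            # improvement: reset the counter
        elif violations(best) == violations(current) and lateral < lateral_patience:
            current, lateral = best, lateral + 1  # lateral move
        else:
            return None                           # stuck: give up
    return current if violations(current) == 0 else None


# Toy landscape: violations drop in plateaus as x grows; x >= 5 is "clean".
def toy_neighbours(x: int) -> list:
    return [x - 1, x + 1]


def toy_violations(x: int) -> int:
    return 0 if x >= 5 else (1 if x >= 2 else 2)


def toy_score(x: int) -> float:
    return float(x)


print(greedy_clean(0, toy_neighbours, toy_violations, toy_score))  # 5
print(greedy_clean(0, toy_neighbours, toy_violations, toy_score,
                   lateral_patience=1))                            # None
```

With lateral_patience=1 the search stalls on the plateau between x = 2 and x = 4, which needs two consecutive lateral moves to cross. This shows why a patience budget larger than one can matter for multi-step repairs.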