lacan.breed

breed.py — Molecular crossover via fragment-based recombination.

This module implements the crossover operation used by the adaptive GA in gen.py. Two molecules are cut at their non-ring bonds and their fragments are recombined: substituents from one molecule are attached to the core of the other, producing offspring that inherit structural features from both parents.

Algorithm

  1. Fragmentation (fragment_molecule()) — cut the molecule at n non-ring bonds in all combinations. Keep only cuts that yield exactly n single-dummy substituents and one n-dummy core.

  2. Recombination (crossover_fragments()) — randomly pair a core from one molecule with substituents from the other, fill all attachment points, and accept offspring that:

    • pass the LACAN score threshold (score_mol > 0.5; uses the current min_PMI / (1 + min_PMI) formula),

    • have heavy-atom count in [hacmin, hacmax],

    • have a “balance ratio” (fraction of atoms from parent 2) in [min_ratio, max_ratio] — this ensures the offspring genuinely interpolates between the two parents rather than being almost identical to one of them.

  3. Deduplication by InChIKey — no duplicate offspring are returned.
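
The three acceptance tests in step 2 can be sketched in plain Python as below. This is an illustration only, not the module's code: the helper names balance_ratio and accept_offspring are hypothetical, the score is taken as a precomputed number, and treating both interval bounds as inclusive is an assumption the text above does not confirm.

```python
def balance_ratio(atoms_from_parent2, total_atoms):
    """Fraction of the offspring's atoms that came from parent 2."""
    if total_atoms <= 0:
        raise ValueError("total_atoms must be positive")
    return atoms_from_parent2 / total_atoms


def accept_offspring(score, hac, atoms_from_parent2, total_atoms,
                     hacmin=0, hacmax=30, min_ratio=0.25, max_ratio=0.75,
                     score_cutoff=0.5):
    """Hypothetical filter: True only if all three step-2 checks pass."""
    if not score > score_cutoff:        # score threshold is strict (> 0.5)
        return False
    if not hacmin <= hac <= hacmax:     # heavy-atom window (assumed inclusive)
        return False
    ratio = balance_ratio(atoms_from_parent2, total_atoms)
    return min_ratio <= ratio <= max_ratio   # balance window (assumed inclusive)
```

For example, a 20-atom candidate with 10 atoms from parent 2 and a score of 0.8 passes, while the same candidate with only 2 atoms from parent 2 (ratio 0.1) is rejected as too close to parent 1.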

Fallback

If three-cut fragmentation yields no substituents for a molecule (e.g. because it has fewer than three non-ring bonds), breed() falls back to two-cut fragmentation. This behaviour is silent by default; set debug=True to print a message when the fallback fires.
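
A minimal sketch of this fallback, assuming a fragmenter callable with the same (mol, n) -> (substituents, cores) shape as fragment_molecule(). The wrapper name fragment_with_fallback and the message text are hypothetical, and the real breed() may structure the check differently.

```python
def fragment_with_fallback(mol, fragmenter, cuts=3, debug=False):
    """Try `cuts` cuts first; retry with two cuts if no substituents come back."""
    substituents, cores = fragmenter(mol, cuts)
    if not substituents and cuts > 2:
        if debug:
            # Printed only on request, mirroring the silent-by-default behaviour
            print(f"{cuts}-cut fragmentation found nothing; falling back to 2 cuts")
        substituents, cores = fragmenter(mol, 2)
    return substituents, cores
```

Passing the fragmenter in as an argument keeps the sketch runnable without RDKit; in practice it would be fragment_molecule itself.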

lacan.breed.combine_frags = <rdkit.Chem.rdChemReactions.ChemicalReaction object>

Join two dummy attachment points (any bond type).

lacan.breed.fragment_molecule(mol, n=3)[source]

Cut a molecule at n non-ring bonds and return substituents and cores.

All combinations of n non-ring bond indices are tried. A cut is retained only if it produces exactly n single-dummy fragments (substituents) and one n-dummy fragment (the core).

Parameters:
  • mol (RDKit Mol)

  • n (int) – number of cuts to make (default 3)

Returns:

(substituents, cores) – Canonical SMILES strings with * dummies. Either list may be empty if no valid cuts exist at the requested depth.

Return type:

(list of str, list of str)
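
The enumeration and the retention rule can be illustrated without RDKit. In the sketch below the bond indices and fragment SMILES are supplied by hand, the helper names candidate_cuts and is_valid_cut are hypothetical, and counting "*" characters is a stand-in for however the module actually counts dummy atoms.

```python
from itertools import combinations


def candidate_cuts(non_ring_bond_indices, n=3):
    """Every way of choosing n bonds to cut; empty if there are fewer than n bonds."""
    return list(combinations(non_ring_bond_indices, n))


def is_valid_cut(fragments, n=3):
    """True if the pieces are exactly n single-dummy fragments plus one n-dummy core."""
    dummy_counts = sorted(frag.count("*") for frag in fragments)
    return dummy_counts == [1] * n + [n]


# Four non-ring bonds give four possible three-bond cuts.
cuts = candidate_cuts([0, 3, 5, 7], n=3)

# A ring core carrying three substituents is kept ...
star_shaped = ["*C", "*O", "*N", "*c1c(*)c(*)ccc1"]
# ... but a chain cut in three places has two 2-dummy middle pieces and is rejected.
chain = ["*C", "*C*", "*C*", "*O"]
```

The second case shows why either returned list may be empty: a molecule can have enough non-ring bonds and still offer no cut with the required shape.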

lacan.breed.crossover_fragments(s1, s2, c1, c2, profile, nmols=10, randomseed=123, max_steps=1000, hacmin=0, hacmax=30, min_ratio=0.25, max_ratio=0.75)[source]

Recombine substituents and cores from two molecules to produce offspring.

At each step a coin-flip chooses which parent provides the core and which provides the substituents. All attachment points of the core are filled with randomly sampled substituents, and the offspring is accepted if it passes the LACAN filter and size/ratio constraints.

The ratio is atoms_from_parent2 / total_atoms; restricting it to [min_ratio, max_ratio] ensures the offspring genuinely blends both parents rather than being almost identical to one.

Parameters:
  • s1 (list of str) – substituent SMILES from parent 1

  • s2 (list of str) – substituent SMILES from parent 2

  • c1 (list of str) – core SMILES from parent 1

  • c2 (list of str) – core SMILES from parent 2

  • profile (LACAN profile dict)

  • nmols (int) – target number of offspring to produce (default 10)

  • randomseed (int) – random seed for reproducibility (default 123)

  • max_steps (int) – maximum sampling attempts before giving up (default 1000)

  • hacmin (int) – minimum heavy-atom count (default 0)

  • hacmax (int) – maximum heavy-atom count (default 30)

  • min_ratio (float) – minimum parent-2 atom fraction (default 0.25)

  • max_ratio (float) – maximum parent-2 atom fraction (default 0.75)

Returns:

list of RDKit Mol – deduplicated offspring (may be fewer than nmols if max_steps is reached first)
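
The sampling loop described for this function might look roughly like the following. It is a sketch under several assumptions: build, accept, and key are hypothetical callables standing in for fragment joining, the LACAN and size/ratio filters, and InChIKey generation; substituents are drawn with replacement; and the number of attachment points is read off the core by counting "*" characters.

```python
import random


def sample_offspring(s1, s2, c1, c2, build, accept, key,
                     nmols=10, randomseed=123, max_steps=1000):
    """Draw core/substituent combinations until nmols unique offspring are found."""
    rng = random.Random(randomseed)      # local RNG so runs are reproducible
    seen, offspring = set(), []
    for _ in range(max_steps):
        if len(offspring) >= nmols:
            break
        # Coin flip: one parent supplies the core, the other the substituents.
        cores, subs = (c1, s2) if rng.random() < 0.5 else (c2, s1)
        if not cores or not subs:
            continue
        core = rng.choice(cores)
        picks = [rng.choice(subs) for _ in range(core.count("*"))]
        child = build(core, picks)
        if child is None or not accept(child):
            continue
        k = key(child)                   # stand-in for the InChIKey
        if k not in seen:
            seen.add(k)
            offspring.append(child)
    return offspring
```

Because every attempt counts against max_steps whether or not it is accepted, strict filters or a small fragment pool can exhaust the budget and return a short list.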

lacan.breed.breed(m1, m2, profile, nmols=10, cuts=3, hacrange=(0.8, 1.2), interprange=(0.3, 0.7), debug=False)[source]

Cross two molecules and return a list of offspring molecules.

Fragments both parents at cuts non-ring bonds, then calls crossover_fragments() to combine their pieces. The heavy-atom count range for offspring is derived from the parents’ sizes via hacrange: [hacrange[0] * min(n1,n2), hacrange[1] * max(n1,n2)].
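
As a worked example of the window formula above, the sketch below computes the bounds for two parent sizes. The helper name hac_window is hypothetical, and whether breed() rounds the bounds to integers before passing them on as hacmin and hacmax is not stated here, so the values are left as floats.

```python
def hac_window(n1, n2, hacrange=(0.8, 1.2)):
    """Offspring heavy-atom bounds from parent heavy-atom counts n1 and n2."""
    lower = hacrange[0] * min(n1, n2)   # scaled from the smaller parent
    upper = hacrange[1] * max(n1, n2)   # scaled from the larger parent
    return lower, upper


# Parents with 20 and 30 heavy atoms: roughly 16 to 36 heavy atoms allowed.
lower, upper = hac_window(20, 30)
```

Anchoring the lower bound to the smaller parent and the upper bound to the larger one means the window always contains both parent sizes when hacrange brackets 1.0.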

If three-cut fragmentation fails for either parent (too few non-ring bonds), the function silently falls back to two-cut fragmentation. Set debug=True to see a message when this happens.

Parameters:
  • m1 (RDKit Mol) – first parent molecule

  • m2 (RDKit Mol) – second parent molecule

  • profile (LACAN profile dict)

  • nmols (int) – number of offspring to request (default 10)

  • cuts (int) – number of cuts to make in each parent (default 3)

  • hacrange ((float, float)) – (min_frac, max_frac) multiplied by the parent sizes to set the offspring heavy-atom count window (default (0.8, 1.2))

  • interprange ((float, float)) – (min_ratio, max_ratio) bounds on the fraction of atoms from parent 2; controls interpolation balance (default (0.3, 0.7))

  • debug (bool) – if True, print a message when falling back to 2-cut fragmentation (default False)

Return type:

list of RDKit Mol

lacan.breed.cross_breed_mols(mols, p, score_threshold, nmols=1, n_jobs=-1, debug=False)[source]

Apply breed() across an entire population in parallel.

Each molecule in mols is paired with a randomly chosen partner from the same list, and breed() is called for each pair via a multiprocessing.Pool. Workers always run with debug=False regardless of the caller’s debug flag, because worker processes cannot print to a Jupyter notebook.
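
The pairing step alone can be sketched as follows. The helper name random_pairs and its seed argument are hypothetical. Note that this simple version can pair a molecule with itself; whether cross_breed_mols() excludes self-pairing is not stated above.

```python
import random


def random_pairs(mols, seed=None):
    """Pair each population member with a partner drawn at random from the same list."""
    rng = random.Random(seed)
    return [(mol, rng.choice(mols)) for mol in mols]


# One pair per population member, so the pair count equals the population size.
population = ["CCO", "c1ccccc1", "CC(=O)O", "CCN"]
pairs = random_pairs(population, seed=0)
```

Each pair would then be handed to a worker as one breed() call, and the per-pair offspring lists concatenated into the flat result.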

Parameters:
  • mols (list of RDKit Mol) – population to breed

  • p (LACAN profile dict)

  • score_threshold – passed through to breed() (currently unused there; kept for API consistency with other operations)

  • nmols (int) – offspring requested per pair (default 1)

  • n_jobs (int) – parallel workers; -1 uses all CPU cores

  • debug (bool) – not forwarded to workers (see note above)

Return type:

list of RDKit Mol — flat list of all offspring from all pairs