Overview
========

LACAN (Leveraging Adjacent Co-occurrence of Atomic Neighborhoods) is a
statistical molecular filter and generative toolkit. It scores molecules by
asking: *how likely is each bond, given the chemical environments on both sides
of it?*

A model trained on ~27 million bonds from ChEMBL assigns a pointwise mutual
information (PMI) score to every bond environment pair. The molecule-level score
is derived from the minimum per-bond PMI, so a single unusual bond is enough to
flag a molecule.

Scoring model
-------------

For each bond the model computes two ECFP2-like atom environment hashes, one per
endpoint, and looks them up in a pre-built profile. The PMI score is::

    observed = pairs[(e1, e2)] / setsize
    expected = (idx[e1] / setsize / 2) * (idx[e2] / setsize / 2)
    PMI      = observed / expected

The molecule-level score::

    score = min_PMI / (1 + min_PMI)

saturates toward 1.0 as the worst-bond PMI grows large, and approaches 0 when
the worst bond is near zero.

Module overview
---------------

.. list-table::
   :widths: 20 80
   :header-rows: 1

   * - Module
     - Purpose
   * - :mod:`lacan.lacan`
     - Core scoring: ``score_mol``, ``assess_per_bond``, profile I/O
   * - :mod:`lacan.mutate`
     - Atom-level mutations (40+ reaction SMARTS): the EXPLOIT step
   * - :mod:`lacan.replace`
     - Coarse fragment swaps (ring / substituent / linker): the EXPLORE step
   * - :mod:`lacan.breed`
     - Molecular crossover via fragment recombination
   * - :mod:`lacan.gen`
     - Random generation, corpus biasing, adaptive GA
   * - :mod:`lacan.protect`
     - SMARTS-based atom exclusion, bond protection, ``mol_cleaner``
   * - :mod:`lacan.decompose`
     - Molecule fragmentation; corpus building

Quick start
-----------

.. code-block:: python

   from rdkit import Chem
   from lacan import lacan, gen, mutate, replace

   profile = lacan.load_profile("chembl")
   mol = Chem.MolFromSmiles("CCCc1nn(C)c2c(=O)[nH]c(-c3ccccc3)nc12")

   score, info = lacan.score_mol(mol, profile)
   print(f"Score: {score:.3f} bad bonds: {info['bad_bonds']}")

   # Generate drug-like molecules
   mols = gen.generate_filtered_molecules(profile, n_molecules=10, n_jobs=-1)

   # Optimise toward a scoring function
   def my_score(mols):
       return [lacan.score_mol(m, profile)[0] for m in mols]

   winners = gen.generate_optimized_molecules(my_score, profile, startN=20, generations=5)

Genetic algorithm
-----------------

:func:`~lacan.gen.generate_optimized_molecules` runs an adaptive GA that
balances exploration and exploitation each generation using two mechanisms:

**Smooth explore fraction**
    A float ``explore_fraction`` (0–1) controls the budget split between
    exploration arms (ring/substituent/linker replacement, scaffold decoration,
    crossover, random injection) and exploitation arms (atom-level mutation from
    :mod:`lacan.mutate`). It shifts toward mutation when the population
    plateaus, and toward exploration on diversity collapse, decaying back to a
    user-set baseline otherwise.

**Per-operation Thompson Sampling bandit**
    Each operation is treated as an independent arm with a Beta posterior over
    its hit rate. Budget is allocated proportionally to sampled weights each
    generation, so productive arms receive more budget while all arms remain
    explored. Statistics can optionally persist across runs.

Results are collected in a :class:`~lacan.gen.HallOfFame` that retains the
all-time best diverse molecules with a Tanimoto diversity gate.

**Presets**: ``preset="ml"`` / ``"medium"`` / ``"docking"`` / ``"guacamol"``
provide sensible defaults for fast, medium, slow, and unlimited-budget scoring
functions respectively. Individual parameters always override preset values.

See :func:`~lacan.gen.generate_optimized_molecules` for the full parameter
reference.
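
The per-operation bandit described in this section can be illustrated with a
small standalone sketch. It is not LACAN's implementation: the arm names, the
``allocate_budget`` helper, and the uniform Beta(1, 1) prior are assumptions
made for illustration. Each arm's hit rate is sampled from its Beta posterior,
and the generation's budget is split in proportion to the samples.

.. code-block:: python

   import random

   def allocate_budget(stats, budget, rng):
       """Split `budget` across arms in proportion to Thompson samples.

       `stats` maps arm name -> (hits, misses). Hypothetical helper,
       assuming a Beta(1, 1) prior on every arm.
       """
       # One draw per arm from its Beta posterior over the hit rate
       samples = {
           arm: rng.betavariate(1 + hits, 1 + misses)
           for arm, (hits, misses) in stats.items()
       }
       total = sum(samples.values())
       # Floor each proportional share, then give leftover slots
       # to the arms with the largest samples
       alloc = {arm: int(budget * s / total) for arm, s in samples.items()}
       leftover = max(0, budget - sum(alloc.values()))
       for arm in sorted(samples, key=samples.get, reverse=True)[:leftover]:
           alloc[arm] += 1
       return alloc

   # Invented hit/miss counts for three illustrative arms
   stats = {"mutate": (30, 70), "ring_replace": (5, 95), "crossover": (12, 88)}
   alloc = allocate_budget(stats, budget=100, rng=random.Random(0))
   print(alloc)

Because every arm is sampled rather than ranked by its mean, an arm with few
recorded hits can still draw a high sample occasionally, which is what keeps
all arms explored.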
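
The formulas in the Scoring model section can be traced through with a toy
profile. The counts below are invented, and ``bond_pmi`` and ``mol_score`` are
hypothetical helpers rather than part of the LACAN API; only the arithmetic
follows the formulas given there. Treating an unseen environment pair as a
count of zero is also an assumption.

.. code-block:: python

   def bond_pmi(e1, e2, pairs, idx, setsize):
       """PMI ratio for one bond, following the Scoring model formulas."""
       # Assumption: a pair missing from the profile counts as zero
       observed = pairs.get((e1, e2), 0) / setsize
       expected = (idx[e1] / setsize / 2) * (idx[e2] / setsize / 2)
       return observed / expected

   def mol_score(bond_envs, pairs, idx, setsize):
       """Molecule score from the worst (minimum) per-bond PMI."""
       min_pmi = min(bond_pmi(e1, e2, pairs, idx, setsize) for e1, e2 in bond_envs)
       return min_pmi / (1 + min_pmi)

   # Toy profile with invented counts; "A", "B", "C" stand in for
   # environment hashes
   setsize = 1000
   idx = {"A": 400, "B": 200, "C": 100}
   pairs = {("A", "B"): 120, ("B", "C"): 2}

   common = bond_pmi("A", "B", pairs, idx, setsize)  # 0.12 / 0.02
   rare = bond_pmi("B", "C", pairs, idx, setsize)    # 0.002 / 0.005
   score = mol_score([("A", "B"), ("B", "C")], pairs, idx, setsize)
   print(f"{common:.2f} {rare:.2f} {score:.3f}")

The common bond has a ratio well above 1, but the molecule score is set
entirely by the rare bond, which is how a single unusual bond flags the whole
molecule.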
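
A Tanimoto diversity gate of the kind mentioned for the hall of fame can be
sketched in plain Python. This is an assumption about the general idea, not the
behaviour of :class:`~lacan.gen.HallOfFame`: the ``select_diverse`` helper, the
greedy best-first policy, the threshold value, and the use of integer sets as
stand-in fingerprints are all hypothetical.

.. code-block:: python

   def tanimoto(a, b):
       """Tanimoto similarity between two fingerprint bit sets."""
       union = len(a | b)
       return len(a & b) / union if union else 0.0

   def select_diverse(candidates, max_size, sim_threshold):
       """Keep the best-scoring candidates that are mutually dissimilar.

       `candidates` is a list of (score, name, fingerprint) tuples.
       """
       kept = []
       for score, name, fp in sorted(candidates, key=lambda c: c[0], reverse=True):
           # Admit only if below the similarity threshold to every kept member
           if all(tanimoto(fp, other) < sim_threshold for _, _, other in kept):
               kept.append((score, name, fp))
           if len(kept) == max_size:
               break
       return kept

   # Invented scores and fingerprints
   candidates = [
       (0.90, "m1", frozenset({1, 2, 3, 4})),
       (0.85, "m2", frozenset({1, 2, 3, 5})),  # similarity to m1: 3/5
       (0.70, "m3", frozenset({7, 8, 9})),
       (0.60, "m4", frozenset({7, 8, 10})),    # similarity to m3: 2/4
   ]
   kept = select_diverse(candidates, max_size=3, sim_threshold=0.55)
   print([name for _, name, _ in kept])

Here ``m2`` is rejected despite its high score because it is too close to
``m1``, while the lower-scoring ``m4`` is admitted.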