ChemBFN: BFN for Chemistry Tasks

Constant

bayesianflow_for_chem.data.VOCAB_KEYS

    Default SMILES and SAFE vocabulary keys.

bayesianflow_for_chem.data.VOCAB_COUNT

    Default vocabulary size for SMILES and SAFE.

bayesianflow_for_chem.data.FASTA_VOCAB_KEYS

    Default FASTA sequence vocabulary keys.

bayesianflow_for_chem.data.FASTA_VOCAB_COUNT

    Default vocabulary size for FASTA.

bayesianflow_for_chem.train.DEFAULT_MODEL_HPARAM

    Default hyperparameters for training a generative model.

bayesianflow_for_chem.train.DEFAULT_REGRESSOR_HPARAM

    Default hyperparameters for training a regression or classification model.

Tokeniser

bayesianflow_for_chem.data.load_vocab(vocab_file) → dict

    Load vocabulary from source file.

bayesianflow_for_chem.data.smiles2token(smiles) → Tensor

    Tokenise a SMILES or SAFE string.

bayesianflow_for_chem.data.fasta2token(fasta) → Tensor

    Tokenise a FASTA-style sequence.

bayesianflow_for_chem.data.split_selfies(selfies) → list

    Split a SELFIES string into individual elements.
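
To illustrate what splitting a SELFIES string into elements means, the sketch below extracts the bracketed symbols with a regular expression. The helper name `split_selfies_sketch` and the pattern are assumptions made for this example; they are not taken from the library and may differ from how `split_selfies` is implemented.

```python
import re
from typing import List

# Hypothetical stand-in for illustration only; not the library's implementation.
# Assumes every SELFIES symbol is enclosed in square brackets and that "."
# separates disconnected fragments.
_SELFIES_TOKEN = re.compile(r"\[[^\]]*\]|\.")


def split_selfies_sketch(selfies: str) -> List[str]:
    """Return the bracketed symbols (and '.' separators) in their original order."""
    return _SELFIES_TOKEN.findall(selfies)


tokens = split_selfies_sketch("[C][=C][Branch1][C][O]")
print(tokens)
```

Each returned element can then be looked up in a vocabulary to obtain a token index.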

Dataset

class bayesianflow_for_chem.data.CSVData(file)

    Define a dataset stored in a CSV file.

    map(mapping) → None

       Pass a customised mapping function to transform the data entities to tensors.


bayesianflow_for_chem.data.collate(batch) → list

    Pad the data in one batch to the same size.
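
The idea of padding a batch to a common size can be sketched with plain Python lists. The helper `pad_batch` is hypothetical, and both right-padding and the padding value 0 are assumptions of this example; the library's `collate` works on its own batch structure and may behave differently.

```python
from typing import List


# Hypothetical helper for illustration; not part of bayesianflow_for_chem.
# Right-padding with the value 0 is an assumption of this sketch.
def pad_batch(sequences: List[List[int]], pad_value: int = 0) -> List[List[int]]:
    """Right-pad every token sequence to the length of the longest one."""
    max_len = max(len(seq) for seq in sequences)
    return [seq + [pad_value] * (max_len - len(seq)) for seq in sequences]


batch = [[1, 5, 7, 2], [1, 9, 2], [1, 2]]
padded = pad_batch(batch)
print(padded)
```

After padding, all sequences share one length, so they can be stacked into a single rectangular array.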

Model

class bayesianflow_for_chem.ChemBFN(num_vocab, channel=512, num_layer=12, num_head=8, dropout=0.01)

    Bayesian Flow Network for Chemistry model representation.

    enable_lora(r=4, lora_alpha=1, lora_dropout=0.0) → None

       Enable LoRA parameters.

    reconstruction_loss(x, t, y) → Tensor

       Compute reconstruction loss.

    inference(x, mlp, embed_fn=None) → Tensor

       Predict activity or property from molecular tokens.

    classmethod from_checkpoint(ckpt, ckpt_lora=None) → Self

       Load model weights from a checkpoint.


class bayesianflow_for_chem.MLP(size, class_input=False, dropout=0.0)

    forward(x) → Tensor

       Do the forward pass.

    classmethod from_checkpoint(ckpt, strict=True) → Self

       Load model weights from a checkpoint.


class bayesianflow_for_chem.EnsembleChemBFN(base_model_path, lora_paths, cond_heads, adapter_weights, semi_autoregressive_flags)

    Ensemble of ChemBFN models from LoRA checkpoints.

    quantise(quantise_method=None) → None

       Quantise the submodels.

    jit(freeze=False) → None

       JIT compile the submodels.

Scorer

bayesianflow_for_chem.scorer.smiles_valid(smiles) → int

    Return the validity of a SMILES string.

bayesianflow_for_chem.scorer.qed_score(smiles) → float

    Return the quantitative estimate of drug-likeness score of a SMILES string.

bayesianflow_for_chem.scorer.sa_score(smiles) → float

    Return the synthetic accessibility score of a SMILES string.


class bayesianflow_for_chem.scorer.Scorer(scorers, score_criteria, vocab_keys, vocab_separator="", valid_checker=None, eta=0.001, name="scorer")

    Scorer class that defines the scorer behaviour in the online RL.

    calc_score_loss(p) → Tensor

       Calculate the score loss.

Spectrum

bayesianflow_for_chem.spectra.build_uv_vis_spectrum(etoscs, etenergies, lambdas) → NDArray

    Build a UV/Vis spectrum from calculated electron transition energies and oscillator strengths.

bayesianflow_for_chem.spectra.spectra_wasserstein_score(spectrum_u, spectrum_v, x_axis) → NDArray

    Return the scaled Wasserstein distance between two continuous spectra.
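
For intuition, the first-order Wasserstein distance between two spectra sampled on a shared axis can be computed from the difference of their cumulative distributions. The sketch below is a minimal pure-Python version under stated assumptions: the helper name `wasserstein_1d` is hypothetical, the spectra are treated as discrete distributions after normalising their intensities to sum to 1, and no scaling is applied. How the library scales its score is not specified here, so this should not be read as its implementation.

```python
from typing import Sequence


# Hypothetical helper for illustration; the library's scaling is not reproduced.
def wasserstein_1d(u: Sequence[float], v: Sequence[float], x: Sequence[float]) -> float:
    """Unscaled W1 distance between two non-negative spectra on the same grid x."""
    total_u, total_v = sum(u), sum(v)
    distance, cdf_u, cdf_v = 0.0, 0.0, 0.0
    for i in range(len(x) - 1):
        # Accumulate the normalised intensities into running CDFs.
        cdf_u += u[i] / total_u
        cdf_v += v[i] / total_v
        # |CDF difference| weighted by the width of the grid interval.
        distance += abs(cdf_u - cdf_v) * (x[i + 1] - x[i])
    return distance


wavelengths = [200.0, 210.0, 220.0, 230.0]
peak_at_210 = [0.0, 1.0, 0.0, 0.0]
peak_at_230 = [0.0, 0.0, 0.0, 1.0]
print(wasserstein_1d(peak_at_210, peak_at_230, wavelengths))
```

In this toy case the two single-peak spectra differ only by a 20-unit shift along the axis, and the distance equals that shift.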

Tool

bayesianflow_for_chem.tool.test(model, mlp, data, mode, device=None) → dict

    Test the trained regression or classification model.

bayesianflow_for_chem.tool.split_dataset(file, split_ratio=[8, 1, 1], method="random") → None

    Split a dataset stored in CSV file based on random split or scaffold split.
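
A ratio such as `[8, 1, 1]` can be read as relative weights that are normalised by their sum. The sketch below shows one way a random split could turn that ratio into index sets. The helper `random_split_indices`, the train/validation/test ordering, the seed, and the choice to give the remainder to the last part are all assumptions of this example rather than documented behaviour of `split_dataset`, which writes its output instead of returning it.

```python
import random
from typing import List, Sequence, Tuple


# Hypothetical helper for illustration; not part of bayesianflow_for_chem.
def random_split_indices(
    n_rows: int, split_ratio: Sequence[float] = (8, 1, 1), seed: int = 0
) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle row indices and cut them into three parts following split_ratio."""
    total = sum(split_ratio)
    n_train = int(n_rows * split_ratio[0] / total)
    n_val = int(n_rows * split_ratio[1] / total)
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # the remainder goes to the last part
    return train, val, test


train, val, test = random_split_indices(100)
print(len(train), len(val), len(test))
```

A scaffold split would instead group molecules by their core scaffold before assigning whole groups to each part, which this sketch does not attempt.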

bayesianflow_for_chem.tool.sample(model, batch_size, sequence_size, sample_step=100, y=None, guidance_strength=4.0, device=None, vocab_keys=VOCAB_KEYS, separator="", method="BFN", allowed_tokens="all", sort=False) → list

    Generate molecules.

bayesianflow_for_chem.tool.inpaint(model, x, sample_step=100, y=None, guidance_strength=4.0, device=None, vocab_keys=VOCAB_KEYS, separator="", method="BFN", allowed_tokens="all", sort=False) → list

    Inpaint masked molecules.

bayesianflow_for_chem.tool.optimise(model, x, sample_step=100, y=None, guidance_strength=4.0, device=None, vocab_keys=VOCAB_KEYS, separator="", method="BFN", allowed_tokens="all", sort=False) → list

    Optimise template molecules.

bayesianflow_for_chem.tool.quantise_model_(model) → None

    Dynamically quantise the trained model in place.

bayesianflow_for_chem.tool.adjust_lora_(model, lora_scale=0.1) → None

    Adjust the LoRA scaling parameter in place.

bayesianflow_for_chem.tool.merge_lora_(model) → None

    Merge the LoRA parameters into the base model in place.
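
In the standard LoRA formulation, merging folds the low-rank update into the frozen weight as W + scale * (B @ A), after which the adapter matrices are no longer needed at inference time. The sketch below shows that arithmetic on small nested lists. The helpers `matmul` and `merge_lora` are hypothetical, and the scale of 0.1 is chosen only to echo the default `lora_scale` shown above; how the library stores and applies its scaling is an assumption here.

```python
from typing import List

Matrix = List[List[float]]


# Hypothetical helpers for illustration; not part of bayesianflow_for_chem.
def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain nested-list matrix product a @ b."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def merge_lora(weight: Matrix, lora_b: Matrix, lora_a: Matrix, scale: float) -> Matrix:
    """Return W + scale * (B @ A), the standard LoRA merge."""
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


W = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight (2 x 2)
B = [[1.0], [2.0]]            # low-rank factor (2 x r) with r = 1
A = [[0.5, 0.5]]              # low-rank factor (r x 2)
merged = merge_lora(W, B, A, scale=0.1)
print(merged)
```

Changing `scale` before merging changes how strongly the adapter shifts the base weights, which is the role a LoRA scaling parameter plays in this formulation.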


class bayesianflow_for_chem.tool.GeometryConverter

    smiles2cartesian(smiles, num_conformers, rdkit_ff_type="MMFF", refine_with_crest=False, spin=0.0) → tuple

       Guess the 3D geometry of the SMILES via conformer search.

    cartesian2smiles(symbols, coordinates, charge=0, canonical=True) → str

       Transform molecular geometry to SMILES string.

LightningModule Wrapper

class bayesianflow_for_chem.train.Model(model, mlp=None, scorer=None, hparam=DEFAULT_MODEL_HPARAM)

    A lightning.LightningModule wrapper of the ChemBFN generative model, used for training.

    export_model(workdir) → None

       Save the trained model.


class bayesianflow_for_chem.train.Regressor(model, mlp, hparam=DEFAULT_REGRESSOR_HPARAM)

    A lightning.LightningModule wrapper of the ChemBFN regression or classification model, used for training.

    export_model(workdir) → None

       Save the trained model.