ChemBFN: BFN for Chemistry Tasks

Constant

bayesianflow_for_chem.data.VOCAB_KEYS

Default SMILES and SAFE vocabulary keys.

bayesianflow_for_chem.data.VOCAB_COUNT

Default vocabulary size (number of tokens) for SMILES and SAFE.

bayesianflow_for_chem.data.FASTA_VOCAB_KEYS

Default FASTA sequence vocabulary keys.

bayesianflow_for_chem.data.FASTA_VOCAB_COUNT

Default vocabulary size (number of tokens) for FASTA.

bayesianflow_for_chem.train.DEFAULT_MODEL_HPARAM

Default hyperparameters for training a generative model.

bayesianflow_for_chem.train.DEFAULT_REGRESSOR_HPARAM

Default hyperparameters for training a regression or classification model.

Tokeniser

bayesianflow_for_chem.data.load_vocab(vocab_file) → dict

Load vocabulary from source file.

bayesianflow_for_chem.data.smiles2token(smiles) → Tensor

Tokenise a SMILES or SAFE string.
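
To illustrate what tokenising a SMILES string involves, the sketch below splits a string into chemical tokens with a regular expression and looks each token up in a key list. It is not the library's implementation: the regular expression, the helper names, the toy key list, and the "<pad>", "<start>" and "<end>" markers are all assumptions made for this example, and it returns a plain list rather than a Tensor.

```python
import re

# Assumed token pattern: bracket atoms, two-letter halogens, two-digit ring
# closures, then single letters, digits and bond/branch symbols.
_SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]|Br|Cl|%[0-9]{2}|[A-Za-z]|[0-9]|[()=#\-+\\/:~@?>*$.]"
)

# Hypothetical vocabulary keys; the real VOCAB_KEYS may differ.
TOY_VOCAB_KEYS = ["<pad>", "<start>", "<end>", "C", "O", "Cl", "(", ")", "="]


def split_smiles_sketch(smiles: str) -> list[str]:
    """Split a SMILES string into tokens (hypothetical helper)."""
    return _SMILES_TOKEN.findall(smiles)


def smiles_to_ids_sketch(smiles: str, vocab_keys: list[str]) -> list[int]:
    """Map tokens to their positions in vocab_keys, wrapped in start/end markers."""
    lookup = {key: index for index, key in enumerate(vocab_keys)}
    body = [lookup[token] for token in split_smiles_sketch(smiles)]
    return [lookup["<start>"]] + body + [lookup["<end>"]]


token_ids = smiles_to_ids_sketch("CC(=O)Cl", TOY_VOCAB_KEYS)
```

Matching "Cl" and "Br" before single letters keeps two-letter halogens as one token instead of splitting them into "C" and "l".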

bayesianflow_for_chem.data.fasta2token(fasta) → Tensor

Tokenise a FASTA-style sequence.

bayesianflow_for_chem.data.split_selfies(selfies) → list

Split a SELFIES string into individual elements.
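
As a rough picture of what splitting a SELFIES string means, the sketch below extracts each bracketed symbol in order. It assumes that every SELFIES symbol is enclosed in square brackets and that "." separates disconnected fragments; the helper name is hypothetical and the library's own function may treat edge cases differently.

```python
import re

# Assumption: symbols look like "[C]", "[=O]" or "[Branch1]"; "." joins fragments.
_SELFIES_SYMBOL = re.compile(r"\[[^\[\]]*\]|\.")


def split_selfies_sketch(selfies: str) -> list[str]:
    """Return the bracketed symbols of a SELFIES string in order."""
    return _SELFIES_SYMBOL.findall(selfies)


symbols = split_selfies_sketch("[C][C][=O].[O]")
```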

Dataset

class bayesianflow_for_chem.data.CSVData(file)

Define a dataset stored in a CSV file.

map(mapping) → None

Pass a customised mapping function that transforms the data entries into tensors.

bayesianflow_for_chem.data.collate(batch) → list

Pad the data in one batch to the same size.
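
The idea of padding a batch can be sketched without any tensor library. The example below right-pads plain Python lists with a padding index of 0; both choices, and the helper name, are assumptions for illustration, since the padding side and padding value used by collate are not stated here.

```python
def pad_batch_sketch(sequences: list[list[int]], pad_id: int = 0) -> list[list[int]]:
    """Right-pad every token list to the length of the longest one."""
    max_len = max(len(seq) for seq in sequences)
    return [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]


padded = pad_batch_sketch([[1, 5, 7, 2], [1, 9, 2]])
```

After padding, every sequence has the same length, so the batch can be stacked into a single rectangular array.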

Model

class bayesianflow_for_chem.ChemBFN(num_vocab, channel=512, num_layer=12, num_head=8, dropout=0.01)

Bayesian Flow Network for Chemistry model representation.

enable_lora(r=4, lora_alpha=1, lora_dropout=0.0) → None

Enable LoRA parameters.
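
The parameter names r and lora_alpha follow the usual LoRA convention, in which a frozen weight W is augmented by a low-rank update scaled by lora_alpha / r. The sketch below shows that standard formulation on nested lists; whether ChemBFN applies exactly this scaling is an assumption, and the helper names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(weight, lora_a, lora_b, r, lora_alpha):
    """Return W + (lora_alpha / r) * B @ A, the standard LoRA update."""
    scale = lora_alpha / r
    delta = matmul(lora_b, lora_a)  # (out x r) @ (r x in) -> (out x in)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


identity = [[1.0, 0.0], [0.0, 1.0]]
merged = lora_effective_weight(
    identity,
    lora_a=[[2.0, 0.0], [0.0, 4.0]],  # A with rank r = 2
    lora_b=[[1.0, 0.0], [0.0, 1.0]],  # B with rank r = 2
    r=2,
    lora_alpha=1,
)
```

Under the same assumption, merging LoRA parameters into the base model amounts to storing this combined weight once instead of adding the update at every forward pass.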

reconstruction_loss(x, t, y) → Tensor

Compute reconstruction loss.

inference(x, mlp, embed_fn=None) → Tensor

Predict activity or property from molecular tokens.

classmethod from_checkpoint(ckpt, ckpt_lora=None) → Self

Load model weights from a checkpoint.

class bayesianflow_for_chem.MLP(size, class_input=False, dropout=0.0)

forward(x) → Tensor

Do the forward pass.

classmethod from_checkpoint(ckpt, strict=True) → Self

Load model weights from a checkpoint.

class bayesianflow_for_chem.EnsembleChemBFN(base_model_path, lora_paths, cond_heads, adapter_weights, semi_autoregressive_flags)

Ensemble of ChemBFN models from LoRA checkpoints.

quantise(quantise_method=None) → None

Quantise the submodels.

jit(freeze=False) → None

JIT compile the submodels.

Scorer

bayesianflow_for_chem.scorer.smiles_valid(smiles) → int

Return the validity of a SMILES string.

bayesianflow_for_chem.scorer.qed_score(smiles) → float

Return the quantitative estimate of drug-likeness score of a SMILES string.

bayesianflow_for_chem.scorer.sa_score(smiles) → float

Return the synthetic accessibility score of a SMILES string.

class bayesianflow_for_chem.scorer.Scorer(scorers, score_criteria, vocab_keys, vocab_separator="", valid_checker=None, eta=0.001, name="scorer")

Scorer class that defines the scoring behaviour in online RL.

calc_score_loss(p) → Tensor

Calculate the score loss.

Spectrum

bayesianflow_for_chem.spectra.build_uv_vis_spectrum(etoscs, etenergies, lambdas) → NDArray

Build a UV/Vis spectrum from calculated electron transition energies and oscillator strengths.
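
One common way to turn discrete transitions into a continuous spectrum is to place a broadened peak at each transition and sum the peaks on a wavelength grid. The sketch below does this with Gaussian peaks. The line shape, the peak width sigma_nm, the use of eV for energies and nm for wavelengths, and the helper name are all assumptions; the units and normalisation expected by build_uv_vis_spectrum are not stated here.

```python
import math

EV_TO_NM = 1239.841984  # hc in eV * nm, converts photon energy to wavelength


def broaden_spectrum_sketch(osc_strengths, energies_ev, wavelengths_nm, sigma_nm=10.0):
    """Sum one Gaussian per transition, weighted by its oscillator strength."""
    spectrum = []
    for lam in wavelengths_nm:
        intensity = 0.0
        for strength, energy in zip(osc_strengths, energies_ev):
            centre = EV_TO_NM / energy  # transition wavelength in nm
            intensity += strength * math.exp(-0.5 * ((lam - centre) / sigma_nm) ** 2)
        spectrum.append(intensity)
    return spectrum


grid = [200.0 + 5.0 * i for i in range(81)]  # 200 nm to 600 nm in 5 nm steps
spectrum = broaden_spectrum_sketch([0.5, 0.1], [4.0, 3.0], grid)
peak_wavelength = grid[spectrum.index(max(spectrum))]
```

In this toy input the stronger transition at 4.0 eV lies near 310 nm, so the maximum of the broadened spectrum falls on that grid point.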

bayesianflow_for_chem.spectra.spectra_wasserstein_score(spectrum_u, spectrum_v, x_axis) → NDArray

Return the scaled Wasserstein distance between two continuous spectra.
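
For intuition, the first-order Wasserstein distance between two spectra sampled on a shared axis can be computed from the difference of their cumulative distributions. The sketch below normalises each spectrum to unit total intensity and integrates that difference along the axis. It omits whatever scaling the library applies, which is not described here, and the helper name is hypothetical.

```python
from itertools import accumulate


def wasserstein_1d_sketch(spectrum_u, spectrum_v, x_axis):
    """Unscaled 1D Wasserstein-1 distance between two sampled spectra."""
    total_u, total_v = sum(spectrum_u), sum(spectrum_v)
    # Cumulative distributions of the normalised intensities.
    cdf_u = list(accumulate(u / total_u for u in spectrum_u))
    cdf_v = list(accumulate(v / total_v for v in spectrum_v))
    # Integrate |CDF_u - CDF_v| over each interval of the x axis.
    return sum(
        abs(cu - cv) * (x_next - x)
        for cu, cv, x, x_next in zip(cdf_u, cdf_v, x_axis, x_axis[1:])
    )


x = [0.0, 1.0, 2.0, 3.0]
distance = wasserstein_1d_sketch([1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], x)
```

Here all of the intensity moves from x = 0 to x = 2, so the unscaled distance equals the shift of 2 axis units.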

Tool

bayesianflow_for_chem.tool.test(model, mlp, data, mode, device=None) → dict

Test the trained regression or classification model.

bayesianflow_for_chem.tool.split_dataset(file, split_ratio=[8, 1, 1], method="random") → None

Split a dataset stored in a CSV file using either a random split or a scaffold split.
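
A ratio such as [8, 1, 1] is commonly read as relative sizes of the training, validation and test subsets. The sketch below applies that reading to a random split of in-memory rows. The helper name, the fixed seed, and the rounding rule are assumptions, and unlike split_dataset it returns the subsets instead of returning None; how the library stores its result is not described here.

```python
import random


def random_split_sketch(rows, split_ratio=(8, 1, 1), seed=0):
    """Shuffle rows and cut them into train/validation/test by relative ratio."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    total = sum(split_ratio)
    n_train = len(shuffled) * split_ratio[0] // total
    n_val = len(shuffled) * split_ratio[1] // total
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder, absorbs any rounding
    return train, val, test


train, val, test = random_split_sketch(range(20))
```

A scaffold split would instead group molecules by a shared core structure before assigning the groups to subsets, which requires a cheminformatics toolkit and is not shown here.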

bayesianflow_for_chem.tool.sample(model, batch_size, sequence_size, sample_step=100, y=None, guidance_strength=4.0, device=None, vocab_keys=VOCAB_KEYS, separator="", method="BFN", allowed_tokens="all", sort=False) → list

Generate molecules.

bayesianflow_for_chem.tool.inpaint(model, x, sample_step=100, y=None, guidance_strength=4.0, device=None, vocab_keys=VOCAB_KEYS, separator="", method="BFN", allowed_tokens="all", sort=False) → list

Inpaint masked molecules.

bayesianflow_for_chem.tool.optimise(model, x, sample_step=100, y=None, guidance_strength=4.0, device=None, vocab_keys=VOCAB_KEYS, separator="", method="BFN", allowed_tokens="all", sort=False) → list

Optimise template molecules.

bayesianflow_for_chem.tool.quantise_model_(model) → None

Dynamically quantise the trained model in place.

bayesianflow_for_chem.tool.adjust_lora_(model, lora_scale=0.1) → None

Adjust the LoRA scaling parameter in place.

bayesianflow_for_chem.tool.merge_lora_(model) → None

Merge the LoRA parameters into the base model in place.

class bayesianflow_for_chem.tool.GeometryConverter

smiles2cartesian(smiles, num_conformers, rdkit_ff_type="MMFF", refine_with_crest=False, spin=0.0) → tuple

Guess the 3D geometry of a SMILES string via conformer search.

cartesian2smiles(symbols, coordinates, charge=0, canonical=True) → str

Transform a molecular geometry into a SMILES string.

LightningModule Wrapper

class bayesianflow_for_chem.train.Model(model, mlp=None, scorer=None, hparam=DEFAULT_MODEL_HPARAM)

A lightning.LightningModule wrapper of the ChemBFN generative model, used for training.

export_model(workdir) → None

Save the trained model.

class bayesianflow_for_chem.train.Regressor(model, mlp, hparam=DEFAULT_REGRESSOR_HPARAM)

A lightning.LightningModule wrapper of the ChemBFN regression or classification model, used for training.

export_model(workdir) → None

Save the trained model.

See the source code for more details.