bayesianflow_for_chem.data.VOCAB_KEYS
Default SMILES and SAFE vocabulary keys.
bayesianflow_for_chem.data.VOCAB_COUNT
Default vocabulary size for SMILES and SAFE.
bayesianflow_for_chem.data.FASTA_VOCAB_KEYS
Default FASTA sequence vocabulary keys.
bayesianflow_for_chem.data.FASTA_VOCAB_COUNT
Default vocabulary size for FASTA.
bayesianflow_for_chem.train.DEFAULT_MODEL_HPARAM
Default hyperparameters for training a generative model.
bayesianflow_for_chem.train.DEFAULT_REGRESSOR_HPARAM
Default hyperparameters for training a regression or classification model.
bayesianflow_for_chem.data.load_vocab(vocab_file) → dict
Load a vocabulary from a source file.
bayesianflow_for_chem.data.smiles2token(smiles) → Tensor
Tokenise a SMILES or SAFE string.
bayesianflow_for_chem.data.fasta2token(fasta) → Tensor
Tokenise a FASTA-style sequence.
bayesianflow_for_chem.data.split_selfies(selfies) → list
Split a SELFIES string into individual elements.
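A minimal sketch of the tokenisation helpers above; the SMILES, FASTA, and SELFIES inputs are arbitrary example strings, and the printed shapes are illustrative only.

```python
import torch

from bayesianflow_for_chem.data import (
    VOCAB_COUNT,
    smiles2token,
    fasta2token,
    split_selfies,
)

# Tokenise a SMILES string into a tensor of vocabulary indices.
tokens = smiles2token("c1ccccc1O")           # phenol, written as SMILES
assert isinstance(tokens, torch.Tensor)

# FASTA-style sequences use their own vocabulary.
peptide_tokens = fasta2token("MKTAYIAKQR")

# SELFIES strings can be split into individual symbols before custom processing.
symbols = split_selfies("[C][C][O]")         # -> ["[C]", "[C]", "[O]"]

print(tokens.shape, peptide_tokens.shape, len(symbols), VOCAB_COUNT)
```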
class bayesianflow_for_chem.data.CSVData(file)
Define a dataset stored in a CSV file.
map(mapping) → None
Pass a customised mapping function that transforms the data entries into tensors.
bayesianflow_for_chem.data.collate(batch) → list
Pad the data in a batch to the same length.
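A hedged sketch of wiring CSVData, map, and collate into a standard PyTorch DataLoader; the file name train.csv, its smiles column, and the record layout handed to the mapping function (column name mapped to a list of values) are assumptions rather than documented behaviour.

```python
from torch.utils.data import DataLoader

from bayesianflow_for_chem.data import CSVData, collate, smiles2token

# "train.csv" and its "smiles" column are placeholders for your own file layout.
dataset = CSVData("train.csv")

# Map each CSV record to a dict of tensors; the record layout passed to the
# mapping function (column name -> list of values) is assumed here.
dataset.map(lambda record: {"token": smiles2token(record["smiles"][0])})

# collate pads the variable-length token tensors in each batch to the same length.
loader = DataLoader(dataset, batch_size=32, shuffle=True, collate_fn=collate)
```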
class bayesianflow_for_chem.ChemBFN(num_vocab, channel=512, num_layer=12, num_head=8, dropout=0.01)
Bayesian Flow Network for Chemistry model representation.
enable_lora(r=4, lora_alpha=1, lora_dropout=0.0) → None
Enable LoRA parameters.
reconstruction_loss(x, t, y) → Tensor
Compute reconstruction loss.
inference(x, mlp, embed_fn=None) → Tensor
Predict activity or property from molecular tokens.
classmethod from_checkpoint(ckpt, ckpt_lora=None) → Self
Load model weights from a checkpoint.
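A minimal sketch of constructing the model from scratch or restoring it from a checkpoint; pretrained.pt and lora.pt are placeholder file names.

```python
from bayesianflow_for_chem import ChemBFN
from bayesianflow_for_chem.data import VOCAB_COUNT

# Build a fresh model over the default SMILES/SAFE vocabulary.
model = ChemBFN(VOCAB_COUNT, channel=512, num_layer=12, num_head=8, dropout=0.01)

# Or restore a pretrained model; "pretrained.pt" and the optional LoRA
# checkpoint "lora.pt" are placeholder names.
model = ChemBFN.from_checkpoint("pretrained.pt")
model_with_lora = ChemBFN.from_checkpoint("pretrained.pt", ckpt_lora="lora.pt")

# LoRA adapters can also be attached to a loaded model before fine-tuning.
model.enable_lora(r=4, lora_alpha=1, lora_dropout=0.0)
```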
class bayesianflow_for_chem.MLP(size, class_input=False, dropout=0.0)
forward(x) → Tensor
Do the forward pass.
classmethod from_checkpoint(ckpt, strict=True) → Self
Load model weights from a checkpoint.
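A sketch of property prediction with a trained encoder and MLP head, assuming inference accepts a padded (batch, length) token tensor; the checkpoint names and the manual padding are illustrative only.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

from bayesianflow_for_chem import ChemBFN, MLP
from bayesianflow_for_chem.data import smiles2token

# Placeholder checkpoints for a fine-tuned encoder and its prediction head.
model = ChemBFN.from_checkpoint("regression_encoder.pt")
mlp = MLP.from_checkpoint("regression_head.pt")

# Pad two token sequences into a (batch, length) tensor; this layout is an
# assumption about what inference expects.
tokens = [smiles2token(s) for s in ("CCO", "c1ccccc1")]
x = pad_sequence(tokens, batch_first=True)

with torch.no_grad():
    prediction = model.inference(x, mlp)
print(prediction)
```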
class bayesianflow_for_chem.EnsembleChemBFN(base_model_path, lora_paths, cond_heads, adapter_weights, semi_autoregressive_flags)
Ensemble of ChemBFN models from LoRA checkpoints.
quantise(quantise_method=None) → None
Quantise the submodels.
jit(freeze=False) → None
JIT compile the submodels.
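The exact formats of the EnsembleChemBFN constructor arguments are not documented in this listing; the sketch below assumes parallel lists of LoRA checkpoint paths, conditioning heads, adapter weights, and per-adapter semi-autoregressive flags, and every file name is a placeholder.

```python
from bayesianflow_for_chem import MLP, EnsembleChemBFN

# Placeholder paths; the parallel-list layout of the arguments is an assumption.
ensemble = EnsembleChemBFN(
    base_model_path="pretrained.pt",
    lora_paths=["task_a_lora.pt", "task_b_lora.pt"],
    cond_heads=[
        MLP.from_checkpoint("task_a_head.pt"),
        MLP.from_checkpoint("task_b_head.pt"),
    ],
    adapter_weights=[0.5, 0.5],
    semi_autoregressive_flags=[False, False],
)

# Optional post-processing of the submodels.
ensemble.quantise()          # dynamic quantisation with the default method
ensemble.jit(freeze=False)   # TorchScript-compile each submodel
```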
bayesianflow_for_chem.scorer.smiles_valid(smiles) → int
Return the validity of a SMILES string.
bayesianflow_for_chem.scorer.qed_score(smiles) → float
Return the quantitative estimate of drug-likeness score of a SMILES string.
bayesianflow_for_chem.scorer.sa_score(smiles) → float
Return the synthetic accessibility score of a SMILES string.
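The stand-alone scorer helpers take a SMILES string directly; aspirin is used purely as an example input, and the 1/0 encoding of smiles_valid is an assumption inferred from its integer return type.

```python
from bayesianflow_for_chem.scorer import smiles_valid, qed_score, sa_score

smiles = "CC(=O)Oc1ccccc1C(=O)O"   # aspirin, used only as an example input

print(smiles_valid(smiles))  # 1 for a parsable SMILES, 0 otherwise (assumed encoding)
print(qed_score(smiles))     # quantitative estimate of drug-likeness
print(sa_score(smiles))      # synthetic accessibility score
```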
class bayesianflow_for_chem.scorer.Scorer(scorers, score_criteria, vocab_keys, vocab_separator="", valid_checker=None, eta=0.001, name="scorer")
Scorer class that defines the scorer behaviour during online reinforcement learning.
calc_score_loss(p) → Tensor
Calculate the score loss.
bayesianflow_for_chem.spectra.build_uv_vis_spectrum(etoscs, etenergies, lambdas) → NDArray
Build a UV/Vis spectrum from calculated electronic transition energies and oscillator strengths.
bayesianflow_for_chem.spectra.spectra_wasserstein_score(spectrum_u, spectrum_v, x_axis) → NDArray
Return the scaled Wasserstein distance between two continuous spectra.
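A sketch of building and comparing two spectra; the units (dimensionless oscillator strengths, transition energies in wavenumbers, and a wavelength grid in nm) follow common TD-DFT post-processing conventions and are assumptions, as are the numerical values themselves.

```python
import numpy as np

from bayesianflow_for_chem.spectra import (
    build_uv_vis_spectrum,
    spectra_wasserstein_score,
)

# Example TD-DFT output: oscillator strengths and transition energies.
# The assumed units are dimensionless strengths, energies in cm^-1, and
# wavelengths in nm.
etoscs = np.array([0.12, 0.05, 0.30])
etenergies = np.array([35000.0, 40000.0, 45000.0])
lambdas = np.linspace(200.0, 600.0, 401)

spectrum = build_uv_vis_spectrum(etoscs, etenergies, lambdas)

# Compare two spectra defined on the same wavelength grid.
reference = build_uv_vis_spectrum(etoscs * 0.9, etenergies + 500.0, lambdas)
score = spectra_wasserstein_score(spectrum, reference, lambdas)
print(score)
```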
bayesianflow_for_chem.tool.test(model, mlp, data, mode, device=None) → dict
Test the trained regression or classification model.
bayesianflow_for_chem.tool.split_dataset(file, split_ratio=[8, 1, 1], method="random") → None
Split a dataset stored in a CSV file using a random or scaffold split.
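A one-line sketch of a scaffold split; dataset.csv is a placeholder file name, and the naming of the resulting split files is not specified in this listing.

```python
from bayesianflow_for_chem.tool import split_dataset

# "dataset.csv" is a placeholder. A scaffold split keeps molecules that share a
# scaffold in the same subset; "random" is the default method.
split_dataset("dataset.csv", split_ratio=[8, 1, 1], method="scaffold")
```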
bayesianflow_for_chem.tool.sample(model, batch_size, sequence_size, sample_step=100, y=None, guidance_strength=4.0, device=None, vocab_keys=VOCAB_KEYS, separator="", method="BFN", allowed_tokens="all", sort=False) → list
Generate molecules.
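A minimal unconditional generation sketch following the signature above; the checkpoint name and the batch and sequence sizes are illustrative.

```python
from bayesianflow_for_chem import ChemBFN
from bayesianflow_for_chem.tool import sample

# "pretrained.pt" is a placeholder checkpoint.
model = ChemBFN.from_checkpoint("pretrained.pt")

# Unconditionally generate 100 molecules of at most 60 tokens each.
molecules = sample(
    model, batch_size=100, sequence_size=60, sample_step=100, method="BFN"
)
print(molecules[:5])
```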
bayesianflow_for_chem.tool.inpaint(model, x, sample_step=100, y=None, guidance_strength=4.0, device=None, vocab_keys=VOCAB_KEYS, separator="", method="BFN", allowed_tokens="all", sort=False) → list
Inpaint masked molecules.
bayesianflow_for_chem.tool.optimise(model, x, sample_step=100, y=None, guidance_strength=4.0, device=None, vocab_keys=VOCAB_KEYS, separator="", method="BFN", allowed_tokens="all", sort=False) → list
Optimise template molecules.
bayesianflow_for_chem.tool.quantise_model_(model) → None
Dynamically quantise the trained model in place.
bayesianflow_for_chem.tool.adjust_lora_(model, lora_scale=0.1) → None
Adjust the LoRA scaling parameter in place.
bayesianflow_for_chem.tool.merge_lora_(model) → None
Merge the LoRA parameters into the base model in place.
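A hedged sketch of the in-place LoRA and quantisation utilities; the checkpoint names and the chosen lora_scale value are placeholders.

```python
from bayesianflow_for_chem import ChemBFN
from bayesianflow_for_chem.tool import adjust_lora_, merge_lora_, quantise_model_

# Placeholder checkpoints for a pretrained base model and a fine-tuned LoRA adapter.
model = ChemBFN.from_checkpoint("pretrained.pt", ckpt_lora="lora.pt")

adjust_lora_(model, lora_scale=0.5)   # tune the adapter's influence in place
merge_lora_(model)                    # fold the LoRA weights into the base model
quantise_model_(model)                # dynamic quantisation for faster CPU inference
```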
class bayesianflow_for_chem.tool.GeometryConverter
smiles2certesian(smiles, num_conformers, rdkit_ff_type="MMFF", refine_with_crest=False, spin=0.0) → tuple
Guess the 3D geometry of a SMILES string via conformer search.
cartesian2smiles(symbols, coordinates, charge=0, canonical=True) → str
Transform a molecular geometry into a SMILES string.
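A sketch of round-tripping a molecule through the converter; the method names are used exactly as listed above, and the contents of the returned tuple (assumed here to start with element symbols and Cartesian coordinates) are an assumption.

```python
from bayesianflow_for_chem.tool import GeometryConverter

converter = GeometryConverter()

# Guess a 3-D geometry for ethanol via an RDKit MMFF conformer search.
geometry = converter.smiles2certesian("CCO", num_conformers=10, rdkit_ff_type="MMFF")

# The tuple is assumed to hold element symbols and Cartesian coordinates first.
symbols, coordinates = geometry[:2]

# Convert the geometry back to a canonical SMILES string.
print(converter.cartesian2smiles(symbols, coordinates, charge=0, canonical=True))
```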
class bayesianflow_for_chem.train.Model(model, mlp=None, scorer=None, hparam=DEFAULT_MODEL_HPARAM)
A ~lightning.LightningModule wrapper of the ChemBFN generative model used for training.
export_model(workdir) → None
Save the trained model.
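A hedged end-to-end training sketch with a standard lightning Trainer; the CSV file, its column layout, and the chosen Trainer settings are placeholders, and the dataset mapping follows the assumption made in the CSVData example above.

```python
from lightning import Trainer
from torch.utils.data import DataLoader

from bayesianflow_for_chem import ChemBFN
from bayesianflow_for_chem.data import CSVData, VOCAB_COUNT, collate, smiles2token
from bayesianflow_for_chem.train import Model

# Build the dataset as in the CSVData example; "train.csv" is a placeholder.
dataset = CSVData("train.csv")
dataset.map(lambda record: {"token": smiles2token(record["smiles"][0])})
loader = DataLoader(dataset, batch_size=64, shuffle=True, collate_fn=collate)

# Wrap a fresh ChemBFN in the LightningModule; hyperparameters default to
# DEFAULT_MODEL_HPARAM.
wrapper = Model(ChemBFN(VOCAB_COUNT))
trainer = Trainer(max_epochs=10, accelerator="auto")
trainer.fit(wrapper, loader)

# Persist the trained weights; the working directory is a placeholder.
wrapper.export_model("./trained")
```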
class bayesianflow_for_chem.train.Regressor(model, mlp, hparam=DEFAULT_REGRESSOR_HPARAM)
A ~lightning.LightningModule wrapper of the ChemBFN regression or classification model used for training.
export_model(workdir) → None
Save the trained model.
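A brief fine-tuning sketch for the regression wrapper; the checkpoint names, the assumed list-of-layer-widths argument to MLP, and the omitted dataloaders are placeholders rather than documented behaviour.

```python
from lightning import Trainer

from bayesianflow_for_chem import ChemBFN, MLP
from bayesianflow_for_chem.train import Regressor

# Placeholder pretrained encoder plus a prediction head; the MLP size argument
# is assumed to be a list of layer widths ending in one regression target.
encoder = ChemBFN.from_checkpoint("pretrained.pt")
head = MLP([512, 256, 1])
regressor = Regressor(encoder, head)

trainer = Trainer(max_epochs=50, accelerator="auto")
# trainer.fit(regressor, train_loader, val_loader)  # loaders built as shown above

# After training, persist the fine-tuned weights; the path is a placeholder.
regressor.export_model("./finetuned")
```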