API Reference

This section documents the public API of EXCAVATE-HT.

This script runs EXCAVATE-HT, a pipeline that generates guide RNA libraries for a given locus, with optional pairing and off-target detection.

excavate.main.add_common_args(parser)[source]
excavate.main.add_generate_parser(subparsers)[source]

Add the generate subcommand parser

excavate.main.add_pair_parser(subparsers)[source]

Add the pair subcommand parser
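The parser helpers above follow the usual argparse subcommand layout. The following is a minimal sketch of that layout, not the EXCAVATE-HT implementation: the subcommand names generate and pair come from this page, while every option (--outdir, --locus, --method and its choices) and the program name are hypothetical placeholders.

```python
import argparse


def add_common_args(parser):
    # Hypothetical shared option; the real common arguments are not listed on this page.
    parser.add_argument("--outdir", default="results", help="output directory")


def add_generate_parser(subparsers):
    # Register the "generate" subcommand and attach the shared options to it.
    parser = subparsers.add_parser("generate", help="generate a guide library")
    add_common_args(parser)
    # Hypothetical option, for illustration only.
    parser.add_argument("--locus", required=True, help="e.g. chr1:1000-2000")
    return parser


def add_pair_parser(subparsers):
    # Register the "pair" subcommand and attach the shared options to it.
    parser = subparsers.add_parser("pair", help="pair existing guides")
    add_common_args(parser)
    # Hypothetical option and choices, for illustration only.
    parser.add_argument("--method", choices=["random", "tiling", "fixed"], default="random")
    return parser


def build_parser():
    parser = argparse.ArgumentParser(prog="excavate-sketch")
    # dest records which subcommand was chosen; required=True rejects a bare call.
    subparsers = parser.add_subparsers(dest="command", required=True)
    add_generate_parser(subparsers)
    add_pair_parser(subparsers)
    return parser


args = build_parser().parse_args(["generate", "--locus", "chr1:1000-2000"])
print(args.command, args.locus, args.outdir)
```

Keeping the shared options in one helper means each subcommand accepts them without duplicating the definitions.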

excavate.main.apply_pairing(all_guides, method, fixed_points=None)[source]
excavate.main.initialize_cas_obj(args)[source]
excavate.main.load_variant_data(vcf_file, var_type, locus, af_threshold)[source]
excavate.main.main()[source]

Main entry point for the gRNA pipeline

excavate.main.run_generate(args)[source]
excavate.main.run_off_targets(args, all_guides_unique, genome_fa, cas_obj, outdir, chrom_name, chrom_fasta_path)[source]

Run Bowtie1-based off-target analysis.

Parameters:
  • args (argparse.Namespace) – Must include: off_targets, genome_index_prefix, chrom_index_prefix, bowtie_threads. Optional: no_index_build (only if that option has been added)

  • all_guides_unique (pd.DataFrame, or the guide table type in use)

  • genome_fa (str | Path) – Path to genome FASTA (used for index auto-build if enabled)

  • cas_obj (object) – Cas parameters object, in whatever form ot.add_bowtie_offtargets expects

  • outdir (str | Path) – Output directory for caching indexes/tmp files

  • chrom_name (str) – e.g. “chr1”

  • chrom_fasta_path (str | Path) – Path to chromosome FASTA (used for index auto-build if enabled)
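Because args must carry several attributes, a caller may want to check the namespace before invoking run_off_targets. The sketch below is one way to do that, under stated assumptions: the attribute names are taken from the parameter list above, but the helper missing_off_target_attrs and all example values (index prefixes, thread count) are hypothetical and not part of EXCAVATE-HT.

```python
from argparse import Namespace

# Attribute names listed as required in the parameter description above.
REQUIRED_OFF_TARGET_ATTRS = (
    "off_targets",
    "genome_index_prefix",
    "chrom_index_prefix",
    "bowtie_threads",
)


def missing_off_target_attrs(args):
    """Return the required attribute names that are absent from args."""
    return [name for name in REQUIRED_OFF_TARGET_ATTRS if not hasattr(args, name)]


# Hypothetical values, for illustration only.
args = Namespace(
    off_targets=True,
    genome_index_prefix="indexes/genome",
    chrom_index_prefix="indexes/chr1",
    bowtie_threads=4,
)

missing = missing_off_target_attrs(args)
if missing:
    raise ValueError(f"args is missing: {', '.join(missing)}")
```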

excavate.main.run_pairing(args)[source]
class excavate.ap.Cas(name, pam_three_prime, pam_five_prime, pam_length, exclusion_positions, is_five_prime, is_three_prime, is_multi_pam)[source]

Bases: object

excavate.ap.SaCas9 = <excavate.ap.Cas object>

Function to output a regex pattern from IUPAC PAM code.

Parameters:
  • custom_pam_list – list of PAM sequences in IUPAC code

Returns:

list of PAM sequences in regex format

Return type:

custom_pam_list_regex
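The IUPAC-to-regex conversion described above can be sketched as a per-base lookup. This is an illustrative sketch only, not the library's code: the mapping follows the standard IUPAC nucleotide codes, and the function name iupac_pams_to_regex is hypothetical.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC_TO_REGEX = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def iupac_pams_to_regex(custom_pam_list):
    """Translate each IUPAC PAM into a regex string, base by base."""
    return [
        "".join(IUPAC_TO_REGEX[base] for base in pam.upper())
        for pam in custom_pam_list
    ]


patterns = iupac_pams_to_regex(["NGG", "NNGRRT"])
print(patterns)

# The second pattern accepts any concrete sequence consistent with NNGRRT.
print(bool(re.fullmatch(patterns[1], "CTGAGT")))
```

An unknown letter raises KeyError here, which surfaces a malformed PAM immediately instead of producing a pattern that silently never matches.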

excavate.ap.all_guides_var_info(gensdict, guidesdf)[source]
excavate.ap.cas_obj = None

Parameters:
  • genome_path – path to the genome fasta file

  • cas_params – Cas object

Returns:

None

excavate.ap.count_exact_matches(df, genome_fasta_path, cas_parameters, num_processes=None)[source]

Counts exact matches in the genome for each guide in a guides dataframe. Uses multiprocessing, which is intended to speed up execution.

Parameters:
  • df – dataframe of guides

  • genome_fasta_path – path to the genome fasta file

  • cas_parameters – Cas object

  • num_processes – number of processes to use

Returns:

dataframe of guides with exact matches in the genome annotated

Return type:

df
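The counting idea can be sketched with a worker initializer that loads the genome once per process. This is a simplified sketch, not the library's implementation. It makes several assumptions: it takes a list of guide strings and an in-memory genome string instead of a dataframe and a FASTA path, it ignores the Cas object and PAM, and it counts matches on both strands. All names in it are hypothetical.

```python
import re
from multiprocessing import Pool

_GENOME = None  # per-process genome string, set by _init_worker
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def _init_worker(genome_seq):
    # Runs once in each worker so the genome is not re-sent with every task.
    global _GENOME
    _GENOME = genome_seq


def _count_guide(guide):
    # A lookahead pattern counts overlapping occurrences as well.
    forward = len(re.findall(f"(?={re.escape(guide)})", _GENOME))
    rc = reverse_complement(guide)
    # A palindromic guide matches the same loci on both strands; count them once.
    reverse = 0 if rc == guide else len(re.findall(f"(?={re.escape(rc)})", _GENOME))
    return forward + reverse


def count_exact_matches_sketch(guides, genome_seq, num_processes=1):
    if num_processes == 1:
        # Sequential path: reuse the same worker functions in this process.
        _init_worker(genome_seq)
        return [_count_guide(guide) for guide in guides]
    with Pool(num_processes, initializer=_init_worker, initargs=(genome_seq,)) as pool:
        return pool.map(_count_guide, guides)


counts = count_exact_matches_sketch(["ACGT", "GTT", "AAC"], "ACGTACGTTTGCA")
print(counts)
```

When this sketch is run as a script with num_processes greater than 1, the call should sit under an `if __name__ == "__main__":` guard on platforms that spawn worker processes.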

excavate.ap.create_custom_cas_obj(custom_pam, orient)[source]
excavate.ap.create_exclusion_pos_set(custom_pam_list, orient)[source]
excavate.ap.create_gens(vcf_file, locus, variants_db, af_threshold=0.1)[source]
excavate.ap.create_regex_pam(custom_pam_list)[source]
excavate.ap.find_guides(snplist, sequence, cas_obj, max_snp_pos_in_protospacer, guide_len)[source]
excavate.ap.fixed_point_pair(guidesdf, points_list)[source]
excavate.ap.getaltseq(gens_df, refseq, snpform='allele1')[source]
excavate.ap.init_worker(genome_path, cas_params)[source]
excavate.ap.makeseq(fa_filename)[source]
excavate.ap.one_mismatch(df, chseq)[source]
excavate.ap.output_bed_format(guidesdf)[source]
excavate.ap.random_pair(guidesdf)[source]
excavate.ap.search_pattern(guide)[source]
excavate.ap.split_phased(clean_guidesdf, phased_vcf, locus)[source]
excavate.ap.targetable_vars(guidesdf, snplist)[source]
excavate.ap.tiling_pair(guidesdf)[source]