API Reference

This section documents the public API of EXCAVATE-HT.

This script runs EXCAVATE-HT, a pipeline that generates guide RNA libraries for a given locus, with optional guide pairing and off-target detection.

excavate.main.add_common_args(parser)

excavate.main.add_generate_parser(subparsers): Add the generate subcommand parser

excavate.main.add_pair_parser(subparsers): Add the pair subcommand parser
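
The three parser helpers above suggest a standard argparse subcommand layout. The following is a minimal sketch of that pattern, not the EXCAVATE-HT implementation: the program name `excavate`, the `--outdir` option, and the `build_parser` helper are assumptions made for illustration. Only the subcommand names `generate` and `pair` come from the descriptions above.

```python
import argparse


def add_common_args(parser):
    # Hypothetical shared option; the real common arguments are not listed here.
    parser.add_argument("--outdir", default="results", help="output directory")


def add_generate_parser(subparsers):
    # Register the "generate" subcommand and attach the shared options.
    parser = subparsers.add_parser("generate", help="generate a guide library")
    add_common_args(parser)
    return parser


def add_pair_parser(subparsers):
    # Register the "pair" subcommand and attach the shared options.
    parser = subparsers.add_parser("pair", help="pair existing guides")
    add_common_args(parser)
    return parser


def build_parser():
    # Hypothetical helper that wires both subcommands into one top-level parser.
    parser = argparse.ArgumentParser(prog="excavate")
    subparsers = parser.add_subparsers(dest="command", required=True)
    add_generate_parser(subparsers)
    add_pair_parser(subparsers)
    return parser


args = build_parser().parse_args(["pair", "--outdir", "out"])
print(args.command, args.outdir)  # prints "pair out"
```

An entry point can then dispatch on `args.command`, for example to run_generate or run_pairing.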

excavate.main.apply_pairing(all_guides, method, fixed_points=None)

excavate.main.initialize_cas_obj(args)

excavate.main.load_variant_data(vcf_file, var_type, locus, af_threshold)

excavate.main.main(): Main entry point for the gRNA pipeline

excavate.main.run_generate(args)

excavate.main.run_off_targets(args, all_guides_unique, genome_fa, cas_obj, outdir, chrom_name, chrom_fasta_path)

Run Bowtie1-based off-target analysis.

Parameters:

args (argparse.Namespace) – Must include: off_targets, genome_index_prefix, chrom_index_prefix, bowtie_threads. Optional: no_index_build, if that flag is defined
all_guides_unique (pd.DataFrame) – Table of unique guides (or whichever guide table type the pipeline uses)
genome_fa (str | Path) – Path to genome FASTA (used for index auto-build if enabled)
cas_obj (object) – Cas parameters object, in the form that ot.add_bowtie_offtargets expects
outdir (str | Path) – Output directory for caching indexes/tmp files
chrom_name (str) – e.g. “chr1”
chrom_fasta_path (str | Path) – Path to chromosome FASTA (used for index auto-build if enabled)
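
As an illustration of the `args` requirement above, the sketch below builds a namespace with the four required attributes and checks that none is missing before off-target analysis would be called. The attribute values and their types, and the `missing_fields` helper, are assumptions for illustration. Only the attribute names come from the parameter list.

```python
import argparse

# Attribute names taken from the parameter list above.
REQUIRED_FIELDS = (
    "off_targets",
    "genome_index_prefix",
    "chrom_index_prefix",
    "bowtie_threads",
)

# Example values are hypothetical; the real types may differ.
args = argparse.Namespace(
    off_targets=True,
    genome_index_prefix="indexes/genome",
    chrom_index_prefix="indexes/chr1",
    bowtie_threads=4,
)


def missing_fields(namespace):
    """Return the required attribute names that the namespace lacks."""
    return [name for name in REQUIRED_FIELDS if not hasattr(namespace, name)]


print(missing_fields(args))  # prints []
```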

excavate.main.run_pairing(args)

class excavate.ap.Cas(name, pam_three_prime, pam_five_prime, pam_length, exclusion_positions, is_five_prime, is_three_prime, is_multi_pam): Bases: object

excavate.ap.SaCas9 = <excavate.ap.Cas object>

Converts PAM sequences written in IUPAC code into regex patterns. This description matches excavate.ap.create_regex_pam(custom_pam_list).

Parameters:

custom_pam_list – list of PAM sequences in IUPAC code

Returns:

custom_pam_list_regex – list of PAM sequences in regex format
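
The IUPAC-to-regex conversion described above can be sketched as a lookup table from each nucleotide code to a character class. This is a minimal illustration using the standard IUPAC nucleotide codes. The function names `iupac_to_regex` and `pams_to_regex` are hypothetical, and the library's own output format may differ.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC_TO_REGEX = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def iupac_to_regex(pam):
    """Translate one IUPAC PAM string into a regex pattern string."""
    return "".join(IUPAC_TO_REGEX[base] for base in pam.upper())


def pams_to_regex(custom_pam_list):
    """Translate a list of IUPAC PAM strings into regex pattern strings."""
    return [iupac_to_regex(pam) for pam in custom_pam_list]


patterns = pams_to_regex(["NGG", "NNGRRT"])
print(patterns)  # prints ['[ACGT]GG', '[ACGT][ACGT]G[AG][AG]T']
print(bool(re.fullmatch(patterns[0], "TGG")))  # prints True
```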

excavate.ap.all_guides_var_info(gensdict, guidesdf)

excavate.ap.cas_obj = None

Parameters (these match the signature of excavate.ap.init_worker(genome_path, cas_params)):

genome_path – path to the genome FASTA file
cas_params – Cas object

Returns:

None

excavate.ap.count_exact_matches(df, genome_fasta_path, cas_parameters, num_processes=None)

Counts exact matches in the genome for each guide in a guide dataframe. The search is distributed across multiple worker processes to reduce run time.

Parameters:

df – dataframe of guides
genome_fasta_path – path to the genome FASTA file
cas_parameters – Cas object
num_processes – number of processes to use

Returns:

df – dataframe of guides with exact matches in the genome annotated
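
The multi-process counting described above can be sketched with a worker initializer that loads the sequence once per process, followed by a per-guide search. This is an illustration under stated assumptions, not the library's implementation: it takes an in-memory sequence string instead of a FASTA path, ignores the Cas parameters and PAM, counts both strands, and uses the hypothetical names `_init_worker`, `_count_one`, and `count_exact`.

```python
import re
from multiprocessing import Pool

_GENOME = None  # per-process copy of the sequence, set by the initializer


def _init_worker(genome_seq):
    # Runs once in each worker so the sequence is not re-sent with every task.
    global _GENOME
    _GENOME = genome_seq.upper()


def _revcomp(seq):
    # Reverse complement of an A/C/G/T string.
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def _count_one(guide):
    guide = guide.upper()
    # A set avoids double counting guides that equal their reverse complement.
    targets = {guide, _revcomp(guide)}
    # The lookahead pattern counts overlapping occurrences as well.
    return sum(len(re.findall(f"(?={re.escape(t)})", _GENOME)) for t in targets)


def count_exact(guides, genome_seq, num_processes=None):
    """Return exact-match counts (both strands) for each guide, in order."""
    if num_processes == 1:
        # Serial path: skip the pool overhead entirely.
        _init_worker(genome_seq)
        return [_count_one(g) for g in guides]
    with Pool(num_processes, initializer=_init_worker, initargs=(genome_seq,)) as pool:
        return pool.map(_count_one, guides)


if __name__ == "__main__":
    print(count_exact(["ACGT", "AAAC"], "AAACGTTT", num_processes=2))  # prints [1, 2]
```

Passing `num_processes=None` lets the pool default to the number of available CPUs. For short guide lists the serial path is likely faster, because starting worker processes has a fixed cost.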

excavate.ap.create_custom_cas_obj(custom_pam, orient)

excavate.ap.create_exclusion_pos_set(custom_pam_list, orient)

excavate.ap.create_gens(vcf_file, locus, variants_db, af_threshold=0.1)

excavate.ap.create_regex_pam(custom_pam_list)

excavate.ap.find_guides(snplist, sequence, cas_obj, max_snp_pos_in_protospacer, guide_len)

excavate.ap.fixed_point_pair(guidesdf, points_list)

excavate.ap.getaltseq(gens_df, refseq, snpform='allele1')

excavate.ap.init_worker(genome_path, cas_params)

excavate.ap.makeseq(fa_filename)

excavate.ap.one_mismatch(df, chseq)

excavate.ap.output_bed_format(guidesdf)

excavate.ap.random_pair(guidesdf)

excavate.ap.search_pattern(guide)

excavate.ap.split_phased(clean_guidesdf, phased_vcf, locus)

excavate.ap.targetable_vars(guidesdf, snplist)

excavate.ap.tiling_pair(guidesdf)