API

File Formats

Demo data is available for loading. Additional details can be found in the pipeline.

demo_data = SLKB.load_demo_data()

Params:

None.

Returns:

demo_data: A list of three items: the sequence file, the counts file, and the score file.

Functions for Data Insertion into the KB

create_SLKB

Creates a SQLite3 or MySQL database using the SLKB schema.

SLKB.create_SLKB(engine = 'sqlite:///SLKB_sqlite3', db_type = 'sqlite3')

Params:

engine: SQLAlchemy database URL. (Default: 'sqlite:///SLKB_sqlite3')
db_type: Database type whose schema is created; currently 'mysql' and 'sqlite3' are supported. (Default: 'sqlite3')
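The engine argument takes a SQLAlchemy database URL. As a minimal sketch, the snippet below assembles such URLs for the two supported back ends using only the standard library. The helper name make_slkb_url and the pymysql driver in the MySQL URL are illustrative assumptions, not part of SLKB.

```python
from urllib.parse import quote_plus


def make_slkb_url(db_type, database, user=None, password=None,
                  host="localhost", port=3306):
    """Build a SQLAlchemy-style URL string (hypothetical helper, not part of SLKB)."""
    if db_type == "sqlite3":
        # Three slashes mean a relative file path, as in the default 'sqlite:///SLKB_sqlite3'.
        return f"sqlite:///{database}"
    if db_type == "mysql":
        # Percent-encode the password so characters such as '@' or '/' do not break the URL.
        # 'pymysql' is an assumed driver; substitute whichever MySQL DBAPI driver is installed.
        return (f"mysql+pymysql://{user}:{quote_plus(password)}"
                f"@{host}:{port}/{database}")
    raise ValueError(f"unsupported db_type: {db_type!r} (expected 'sqlite3' or 'mysql')")


print(make_slkb_url("sqlite3", "SLKB_sqlite3"))
print(make_slkb_url("mysql", "SLKB", user="slkb", password="p@ss/word"))
```

The resulting string could then be passed as engine, e.g. SLKB.create_SLKB(engine = make_slkb_url('mysql', 'SLKB', user = 'slkb', password = '...'), db_type = 'mysql').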

Returns:

None.

extract_SLKB_webapp

Extracts the SLKB webapp to the specified location.

SLKB.extract_SLKB_webapp(location = os.getcwd())

Params:

location: Location to extract SLKB files. (Default: Current working directory)

Returns:

None.

prepare_study_for_export

Prepares the counts, scores, and sequences files for insertion into the DB.

db_inserts = SLKB.prepare_study_for_export(score_ref, sequence_ref = None, counts_ref = None, study_controls = None, study_conditions = None, can_control_be_substring = True, remove_unrelated_counts = False)

Params:

score_ref: A pandas table that adheres to the scores table template.
sequence_ref: A pandas table that adheres to the sequence table template (default: None).
counts_ref: A pandas table that adheres to the counts table template (default: None).
study_controls: A list of control targets of the sgRNAs (default: None).
study_conditions: A list of two lists; the first contains the replicate names of the initial time point, and the second contains those of the final time point (default: None).
can_control_be_substring: Whether control names may appear as substrings of gene target names, to accommodate naming conventions (default: True).
remove_unrelated_counts: Whether to remove dual counts whose targets fall outside the targets in the supplied scores (default: False).
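To make the shapes of study_controls and study_conditions concrete, here is a small sketch. The replicate names (T0_rep1, ...) and control names are invented placeholders, and is_control only illustrates one plausible reading of can_control_be_substring (substring versus exact matching); SLKB's own matching logic may differ.

```python
# Placeholder names for illustration only; use the names from your own counts table.
study_controls = ["AAVS1", "NonTargeting"]
study_conditions = [
    ["T0_rep1", "T0_rep2"],    # replicates at the initial time point
    ["T14_rep1", "T14_rep2"],  # replicates at the final time point
]


def check_conditions(conditions):
    """Verify the 'list of two non-empty lists' shape expected for study_conditions."""
    if len(conditions) != 2 or not all(isinstance(c, list) and c for c in conditions):
        raise ValueError("study_conditions must be [initial_replicates, final_replicates]")
    return conditions


def is_control(target, controls, can_control_be_substring=True):
    """Hypothetical helper: decide whether a target name denotes a control."""
    if can_control_be_substring:
        # e.g. 'NonTargeting_12' is a control because it contains 'NonTargeting'
        return any(c in target for c in controls)
    # Otherwise require an exact name match.
    return target in controls


check_conditions(study_conditions)
print(is_control("NonTargeting_12", study_controls))         # True
print(is_control("NonTargeting_12", study_controls, False))  # False
```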

Returns:

A dictionary of three items:
- scores_ref: Contains the processed scores table (if supplied)
- sequences_ref: Contains the processed sequences table (if supplied)
- counts_ref: Contains the processed counts table (if supplied)

insert_study_to_db

Inserts the processed study data into the designated DB.

SLKB.insert_study_to_db(SLKB_engine, db_inserts)

Params:

SLKB_engine: SQLAlchemy engine link
db_inserts: Processed data, obtained via prepare_study_for_export

Returns:

None

Scoring Functions

Median-B/NB Score

Calculates Median B/NB Scores.

median_res = SLKB.run_median_scores(curr_counts, curr_study, curr_cl, full_normalization = False, re_run = False, store_loc = os.getcwd(), save_dir = 'MEDIAN_Files')

Params:

curr_counts: Counts for which the scores are calculated.
curr_study: String, name of study to analyze data for.
curr_cl: String, name of cell line to analyze data for.
full_normalization: Whether to normalize counts across the whole sample or according to target type (Default: False)
re_run: Boolean. Recreate and rerun the results instead of loading for subsequent analyses (Default: False)
store_loc: String: Directory to store the Median files to. (Default: current working directory)
save_dir: String: Folder name to store the Median files to. (Default: 'MEDIAN_Files')
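The full_normalization flag chooses the scope of count normalization. The sketch below shows one common approach, reads-per-million scaling, applied either over the whole sample or separately within each target type. The helper name, the per-million scaling, and the target-type labels are assumptions for illustration and are not necessarily what SLKB computes.

```python
from collections import defaultdict


def normalize_counts(counts, target_types, full_normalization=False):
    """Scale raw counts to reads per million (hypothetical sketch).

    counts: raw read counts, one per sgRNA pair.
    target_types: one label per sgRNA pair, e.g. 'gene-gene' or 'gene-control'.
    """
    if full_normalization:
        # A single library size for the whole sample.
        total = sum(counts)
        return [c * 1e6 / total for c in counts]
    # Otherwise, compute a separate library size within each target type.
    totals = defaultdict(int)
    for c, t in zip(counts, target_types):
        totals[t] += c
    return [c * 1e6 / totals[t] for c, t in zip(counts, target_types)]


counts = [100, 300, 50, 50]
labels = ["gene-gene", "gene-gene", "gene-control", "gene-control"]
print(normalize_counts(counts, labels, full_normalization=True))  # [200000.0, 600000.0, 100000.0, 100000.0]
print(normalize_counts(counts, labels))                           # [250000.0, 750000.0, 500000.0, 500000.0]
```

Per-type normalization keeps a large gene-gene subset from dominating the scaling of the smaller control subset, which is one reason such an option might exist.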

Returns:

median_res: A dictionary of two pandas dataframes: Median-B and Median-NB.

sgRNA-Derived-B/NB Score

Calculates sgRNA-Derived B/NB scores.

sgRNA_res = SLKB.run_sgrna_scores(curr_counts, curr_study, curr_cl, full_normalization = False, re_run = False, store_loc = os.getcwd(), save_dir = 'sgRNA-DERIVED_Files')

Params:

curr_counts: Counts for which the scores are calculated.
curr_study: String, name of study to analyze data for.
curr_cl: String, name of cell line to analyze data for.
full_normalization: Whether to normalize counts across the whole sample or according to target type (Default: False)
re_run: Boolean. Recreate and rerun the results instead of loading for subsequent analyses (Default: False)
store_loc: String: Directory to store the sgRNA-Derived files to. (Default: current working directory)
save_dir: String: Folder name to store the sgRNA-Derived files to. (Default: 'sgRNA-DERIVED_Files')
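The store_loc, save_dir, and re_run parameters describe a compute-once, reuse-later pattern shared by the scoring functions. The sketch below shows that pattern in generic form with a JSON cache file. The file name, the JSON format, cached_scores, and compute_fn are hypothetical stand-ins, since the exact files SLKB writes are not documented here.

```python
import json
import os
import tempfile


def cached_scores(compute_fn, store_loc, save_dir, name, re_run=False):
    """Load previously saved results unless re_run is True (hypothetical sketch)."""
    out_dir = os.path.join(store_loc, save_dir)
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, f"{name}.json")
    if not re_run and os.path.exists(path):
        # Reuse results stored by an earlier run.
        with open(path) as fh:
            return json.load(fh)
    # Recompute and overwrite any stored copy.
    result = compute_fn()
    with open(path, "w") as fh:
        json.dump(result, fh)
    return result


calls = []


def fake_scores():
    # Stand-in for an expensive scoring run; records each invocation.
    calls.append(1)
    return {"GENE1|GENE2": -1.5}


with tempfile.TemporaryDirectory() as tmp:
    first = cached_scores(fake_scores, tmp, "sgRNA-DERIVED_Files", "study_cl")
    second = cached_scores(fake_scores, tmp, "sgRNA-DERIVED_Files", "study_cl")
    print(first == second, len(calls))  # True 1
```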

Returns:

sgRNA_res: A dictionary of two pandas dataframes: sgRNA_derived_B and sgRNA_derived_NB.

MAGeCK Score

Calculates MAGeCK Score. Score files will be created at the designated store location and save directory.

mageck_res = SLKB.run_mageck_score(curr_counts.copy(), curr_study, curr_cl, store_loc = os.getcwd(), save_dir = 'MAGECK_Files', command_line_params = [], re_run = False)

Params:

curr_counts: Counts for which the scores are calculated.
curr_study: String, name of study to analyze data for.
curr_cl: String, name of cell line to analyze data for.
store_loc: String: Directory to store the MAGeCK files to. (Default: current working directory)
save_dir: String: Folder name to store the MAGeCK files to. (Default: 'MAGECK_Files')
command_line_params: Optional list of commands that set up the environment needed to run the MAGeCK tool (e.g., setting the path, activating a Python environment).
re_run: Boolean. Recreate and rerun the results instead of loading for subsequent analyses (Default: False)
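command_line_params is described as a list of environment-setup steps needed before the MAGeCK tool can run. One plausible way to use such a list is to chain it in front of the tool command in a single shell invocation; the sketch below builds that string. The build_shell_command helper, joining with &&, the conda activate step, and the mageck -h command are assumptions about usage, not SLKB's documented behavior.

```python
def build_shell_command(command_line_params, tool_command):
    """Chain setup steps before the tool command (hypothetical sketch)."""
    # '&&' stops the chain as soon as a setup step fails.
    return " && ".join(list(command_line_params) + [tool_command])


cmd = build_shell_command(["source ~/.bashrc", "conda activate mageck_env"], "mageck -h")
print(cmd)  # source ~/.bashrc && conda activate mageck_env && mageck -h
# To execute it, one could call subprocess.run(cmd, shell=True, check=True).
```

Running all steps in one shell matters because an environment activated in one subprocess call does not carry over to the next. The same idea applies to the R setup steps for GEMINI.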

Returns:

mageck_res: A dict that contains a pandas dataframe for MAGeCK Score.

Horlbeck Score

Calculates Horlbeck score. Score files will be created at the designated store location and save directory.

horlbeck_res = SLKB.run_horlbeck_score(curr_counts.copy(), curr_study = curr_study, curr_cl = curr_cl, store_loc = os.getcwd(), save_dir = 'HORLBECK_Files', do_preprocessing = True, re_run = False)

Params:

curr_counts: Counts for which the scores are calculated.
curr_study: String, name of study to analyze data for.
curr_cl: String, name of cell line to analyze data for.
store_loc: String: Directory to store the Horlbeck files to. (Default: current working directory)
save_dir: String: Folder name to store the Horlbeck files to. (Default: 'HORLBECK_Files')
do_preprocessing: Boolean. Run Horlbeck preprocessing (Default: True)
re_run: Boolean. Recreate and rerun the results instead of loading for subsequent analyses (Default: False)

Returns:

horlbeck_res: A dict that contains a pandas dataframe for Horlbeck Score.

GEMINI Score

Calculates GEMINI Score. Score files will be created at the designated store location and save directory.

gemini_res = SLKB.run_gemini_score(curr_counts.copy(), curr_study = curr_study, curr_cl = curr_cl, store_loc = os.getcwd(), save_dir = 'GEMINI_Files', command_line_params = cmd_params, re_run = False)

Params:

curr_counts: Counts for which the scores are calculated.
curr_study: String, name of study to analyze data for.
curr_cl: String, name of cell line to analyze data for.
store_loc: String: Directory to store the GEMINI files to. (Default: current working directory)
save_dir: String: Folder name to store the GEMINI files to. (Default: 'GEMINI_Files')
command_line_params: Optional list of commands that set up the environment needed to run GEMINI through R (e.g., setting the path, activating an R environment).
re_run: Boolean. Recreate and rerun the results instead of loading for subsequent analyses (Default: False)

Returns:

gemini_res: A dict that contains a pandas dataframe for GEMINI Score.

check_if_added_to_table

When the scoring methods are run multiple times, this function can be used to skip the computation if gene pair records already exist in the database.

SLKB.check_if_added_to_table(curr_counts, score_name, SLKB_engine)

Params:

curr_counts: Counts for which the scores are calculated.
score_name: Scoring table to check. Must be one of the 7 scoring table names:
- horlbeck_score
- median_b_score
- median_nb_score
- sgrna_derived_b_score
- sgrna_derived_nb_score
- gemini_score
- mageck_score
SLKB_engine: SQLAlchemy engine link for the database.

Returns:

Boolean. True if records have already been inserted into the DB, False otherwise.
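The skip-if-present idea can be sketched with Python's built-in sqlite3 module. The table layout (gene_pair, study, cell_line, score columns) and the helper already_scored are hypothetical and do not reflect the actual SLKB schema or the curr_counts input of the real function; only the seven table names come from the list above. Table names cannot be bound as SQL parameters, so checking them against a fixed allow-list before formatting them into the query guards against injection.

```python
import sqlite3

SCORE_TABLES = {
    "horlbeck_score", "median_b_score", "median_nb_score",
    "sgrna_derived_b_score", "sgrna_derived_nb_score",
    "gemini_score", "mageck_score",
}


def already_scored(conn, score_name, gene_pairs):
    """Return True if any gene pair already has a row in score_name (hypothetical sketch)."""
    if score_name not in SCORE_TABLES:
        raise ValueError(f"unknown scoring table: {score_name!r}")
    if not gene_pairs:
        return False
    placeholders = ", ".join("?" for _ in gene_pairs)
    query = f"SELECT 1 FROM {score_name} WHERE gene_pair IN ({placeholders}) LIMIT 1"
    return conn.execute(query, list(gene_pairs)).fetchone() is not None


# Toy in-memory database with an invented layout.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE median_b_score (gene_pair TEXT, study TEXT, cell_line TEXT, score REAL)")
conn.execute("INSERT INTO median_b_score VALUES ('GENE1|GENE2', 'StudyA', 'CL1', -0.8)")

pairs = ["GENE1|GENE2", "GENE3|GENE4"]
if already_scored(conn, "median_b_score", pairs):
    print("Scores found; skipping computation.")
```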

query_result_table

Obtains SL scores from the specified scoring table.

result = SLKB.query_result_table(curr_counts, table_name, curr_study, curr_cl, engine_link)

Params:

curr_counts: Counts to obtain the scores from.
table_name: Scoring table to query. Must be one of the 7 scoring table names:
- horlbeck_score
- median_b_score
- median_nb_score
- sgrna_derived_b_score
- sgrna_derived_nb_score
- gemini_score
- mageck_score
curr_study: String, name of the study to obtain the results for.
curr_cl: String, name of the cell line to obtain the results for.
engine_link: SQLAlchemy connection for the database.

Returns:

result: A pandas dataframe of the inserted results. Includes annotations for gene pair, study origin, and cell line origin.
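Filtering a scoring table by study and cell line can be sketched with the standard sqlite3 module and a parameterized query. The column names, the toy rows, and query_scores are hypothetical and do not reflect the real SLKB schema; rows are returned as dictionaries so that each score stays annotated with its gene pair, study, and cell line.

```python
import sqlite3

SCORE_TABLES = {
    "horlbeck_score", "median_b_score", "median_nb_score",
    "sgrna_derived_b_score", "sgrna_derived_nb_score",
    "gemini_score", "mageck_score",
}


def query_scores(conn, table_name, study, cell_line):
    """Fetch scores for one study and cell line as a list of dicts (hypothetical sketch)."""
    if table_name not in SCORE_TABLES:
        raise ValueError(f"unknown scoring table: {table_name!r}")
    # The table name is allow-listed above; study and cell line are bound as parameters.
    cur = conn.execute(
        f"SELECT gene_pair, study, cell_line, score FROM {table_name} "
        "WHERE study = ? AND cell_line = ?",
        (study, cell_line),
    )
    columns = [d[0] for d in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gemini_score (gene_pair TEXT, study TEXT, cell_line TEXT, score REAL)")
conn.executemany(
    "INSERT INTO gemini_score VALUES (?, ?, ?, ?)",
    [("GENE1|GENE2", "StudyA", "CL1", -0.8),
     ("GENE1|GENE2", "StudyA", "CL2", 0.1),
     ("GENE3|GENE4", "StudyB", "CL1", -1.2)],
)
print(query_scores(conn, "gemini_score", "StudyA", "CL1"))
```

If pandas is installed, pandas.read_sql_query(query, conn, params=(study, cell_line)) returns the same rows as a DataFrame.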