API

File Formats

A demo data is available for loading. Additional details can be found in the pipeline.

demo_data = SLKB.load_demo_data()

Params:

  • None.

Returns:

  • demo_data. A list of 3 items: sequence file, counts, fle, and score file.

Functions For Data Insertion to KB

generate_database

Creates a sqlite3 or mysql database, using SLKB schema.

SLKB.create_SLKB(engine = 'sqlite:///SLKB_sqlite3', db_type = 'sqlite3')

Params:

  • engine: sqlalchemy url object. (Default: sqlite:///SLKB_sqlite3)
  • db_type: Type of database to use schema for, currently available in mysql and sqlite3. (Default: sqlite3)

Returns:

  • None.

extract_SLKB_webapp

Extracts the SLKB webapp to the specified location.

SLKB.extract_SLKB_webapp(location = os.getcwd())

Params:

  • location: Location to extract SLKB files. (Default: Current working directory)

Returns:

  • None.

prepare_study_for_export

Prepares the counts, scores, and sequences files for insertion into the DB.

db_inserts = SLKB.prepare_study_for_export(score_ref, sequence_ref = None, counts_ref = None, study_controls = None, study_conditions = None, can_control_be_substring = True, remove_unrelated_counts = False)

Params:

  • score_ref: A pandas table that adheres to the scores table template.
  • sequence_ref: A pandas table that adheres to the sequence table template (default: None).
  • counts_ref: A pandas table that adheres to the counts table template (default: None).
  • study_controls: A list of control targets of the sgRNAs (default: None).
  • study_conditions: A list of two lists; first list contains the replicate names of initial time point, and second list contains the same for final time point (default: None).
  • can_control_be_substring: Can the controls be a substring of gene targets (in case of possible name conventions: default: True)
  • remove_unrelated_counts = Remove dual counts with targets that are outside of supplied scores targets? (default: False)

Returns:

  • A dictionary of three items:
    • scores_ref: Contains the procesed scores table (if supplied)
    • sequences_ref: Contains the procesed sequences table (if supplied)
    • counts_ref: Contains the procesed counts table (if supplied)

insert_study_to_db

Inserts the counts to the designated DB.

SLKB.insert_study_to_db(SLKB_engine, db_inserts)

Params:

  • SLKB_engine: SQLAlchemy engine link
  • db_inserts: Processed data, obtained via prepare_study_for_export

Returns:

  • None

Scoring Functions

Median-B/NB Score

Calculates Median B/NB Scores.

median_res = SLKB.run_median_scores(curr_counts, curr_study, curr_cl, full_normalization = False, re_run = False, store_loc = os.getcwd(), save_dir = 'MEDIAN_Files')

Params:

  • curr_counts: Counts to calculate scores to.)
  • curr_study: String, name of study to analyze data for.
  • curr_cl: String, name of cell line to analyze data for.
  • full_normalization: Whether to normalize counts across the whole sample or according to target type (Default: False)
  • re_run: Boolean. Recreate and rerun the results instead of loading for subsequent analyses (Default: False)
  • store_loc: String: Directory to store the Median files to. (Default: current working directory)
  • save_dir: String: Folder name to store the Median files to. (Default: 'MEDIAN_Files')

Returns:

  • median_res: A dictionary of two pandas dataframes: Median-B and Median-NB.

sgRNA-Derived-B/NB Score

Calculates sgRNA Derived N/NB scores.

sgRNA_res = SLKB.run_sgrna_scores(curr_counts, curr_study, curr_cl, full_normalization = False, re_run = False, store_loc = os.getcwd(), save_dir = 'sgRNA-DERIVED_Files')

Params:

  • curr_counts: Counts to calculate scores to.)
  • curr_study: String, name of study to analyze data for.
  • curr_cl: String, name of cell line to analyze data for.
  • full_normalization: Whether to normalize counts across the whole sample or according to target type (Default: False)
  • re_run: Boolean. Recreate and rerun the results instead of loading for subsequent analyses (Default: False)
  • store_loc: String: Directory to store the sgRNA-Derived files to. (Default: current working directory)
  • save_dir: String: Folder name to store the sgRNA-Derived files to. (Default: 'sgRNA-DERIVED_Files')

Returns:

  • sgRNA_res: A dictionary of two pandas dataframes: sgRNA_derived_B and sgRNA_derived_NB.

MAGeCK Score

Calculates MAGeCK Score. Score files will created at the designated store location and save directory.

mageck_res = SLKB.run_mageck_score(curr_counts.copy(), curr_study, curr_cl, store_loc = os.getcwd(), save_dir = 'MAGECK_Files', command_line_params = [],re_run = False)   

Params:

  • curr_counts: Counts to calculate scores to.)
  • curr_study: String, name of study to analyze data for.
  • curr_cl: String, name of cell line to analyze data for.
  • store_loc: String: Directory to store the MAGeCK files to. (Default: current working directory)
  • save_dir: String: Folder name to store the MAGeCK files to. (Default: 'MAGECK_Files')
  • command_line_params: Optional list to load programming environment(s) to be able to run mageck tool (i.e. loading path, activating python environment).
  • re_run: Boolean. Recreate and rerun the results instead of loading for subsequent analyses (Default: False)

Returns:

  • mageck_res: A dict that contains a pandas dataframe for MAGeCK Score.

Horlbeck Score

Calculates Horlbeck score. Score files will created at the designated store location and save directory.

horlbeck_res = SLKB.run_horlbeck_score(curr_counts.copy(), curr_study = curr_study, curr_cl = curr_cl, store_loc = os.getcwd(), save_dir = 'HORLBECK_Files', do_preprocessing = True, re_run = False)

Params:

  • curr_counts: Counts to calculate scores to.)
  • curr_study: String, name of study to analyze data for.
  • curr_cl: String, name of cell line to analyze data for.
  • store_loc: String: Directory to store the Horlbeck files to. (Default: current working directory)
  • save_dir: String: Folder name to store the Horlbeck files to. (Default: 'Horlbeck_Files')
  • do_preprocessing: Boolean. Run Horlbeck preprocessing (Default: True)
  • re_run: Boolean. Recreate and rerun the results instead of loading for subsequent analyses (Default: False)

Returns:

  • horlbeck_res: A dict that contains a pandas dataframe for Horlbeck Score.

GEMINI Score

Calculates GEMINI Score. Score files will created at the designated store location and save directory.

gemini_res = run_gemini_score(curr_counts.copy(), curr_study = curr_study, curr_cl = curr_cl, store_loc = os.getcwd(), save_dir = 'GEMINI_Files', command_line_params = cmd_params, re_run = False)

Params:

  • curr_counts: Counts to calculate scores to.)
  • curr_study: String, name of study to analyze data for.
  • curr_cl: String, name of cell line to analyze data for.
  • store_loc: String: Directory to store the GEMINI files to. (Default: current working directory)
  • save_dir: String: Folder name to store the GEMINI files to. (Default: 'GEMINI_Files')
  • command_line_params: Optional list to load programming environment(s) to be able to run GEMINI through R (i.e. loading path, activating R environment).
  • re_run: Boolean. Recreate and rerun the results instead of loading for subsequent analyses (Default: False)

Returns:

  • gemini_res: A dict that contains a pandas dataframe for GEMINI Score.

check_if_added_to_table

If running the scoring methods multiple times, the method may be useful in skipping over the computation if there are gene pair records already in the database.

SLKB.check_if_added_to_table(curr_counts, score_name, SLKB_engine)

Params:

  • curr_counts: Counts to calculate the scores to.
  • score_name: Table to insert the scores to. Must be any of the 7 scoring table names:
    • horlbeck_score
    • median_b_score
    • median_nb_score
    • sgrna_derived_b_score
    • sgrna_derived_nb_score
    • gemini_score
    • mageck_score

Returns:

  • Boolean. True if records are inserted into the DB, False otherwise.

query_results_table

Obtain SL Scores from the specified scoring table.

result = SLKB.query_result_table(curr_counts, table_name, curr_study, curr_cl, engine_link)

Params:

  • curr_counts: Counts to obtain the scores from.
  • table_name: Must be any of the 7 scoring table names:
    • horlbeck_score
    • median_b_score
    • median_nb_score
    • sgrna_derived_b_score
    • sgrna_derived_nb_score
    • gemini_score
    • mageck_score
  • curr_study: String, name of the study to obtain the results for.
  • curr_cl: String, name of the cell line to obtain the results for.
  • engine_link: SQLAlchemy connection for the database.

Returns:

  • result: A pandas dataframe of the inserted results. Includes annotations for gene pair, study origin, and cell line origin.