kg_covid_19.utils package

Submodules

kg_covid_19.utils.download_utils module

kg_covid_19.utils.download_utils.download_from_api(yaml_item, outfile) → None
Args:

yaml_item: item to be downloaded, parsed from yaml
outfile: where to write out file

Returns:

None.

kg_covid_19.utils.download_utils.download_from_yaml(yaml_file: str, output_dir: str, ignore_cache: bool = False) → None

Given download information from a download.yaml file, download all files.

Args:

yaml_file: A string pointing to the download.yaml file, to be parsed for things to download.
output_dir: A string pointing to where to write out downloaded files.
ignore_cache: Ignore the cache and download files even if they already exist (default: False).

Returns:

None.
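The effect of ignore_cache can be pictured with a minimal sketch. The helper name needs_download and the file names are hypothetical, and this is not the package's own code; it only illustrates the documented rule that an existing file is skipped unless ignore_cache is set.

```python
import os
import tempfile


def needs_download(outfile: str, ignore_cache: bool = False) -> bool:
    """Hypothetical helper: decide whether a file should be fetched."""
    # Fetch unconditionally when the cache is ignored;
    # otherwise fetch only files that are not on disk yet.
    return ignore_cache or not os.path.exists(outfile)


with tempfile.TemporaryDirectory() as output_dir:
    # A file that is already present, standing in for an earlier download.
    cached = os.path.join(output_dir, "already_here.tsv")
    with open(cached, "w") as fh:
        fh.write("id\tname\n")
    missing = os.path.join(output_dir, "not_yet.tsv")

    cached_default = needs_download(cached)
    cached_forced = needs_download(cached, ignore_cache=True)
    missing_default = needs_download(missing)
```

With the default setting only the missing file would be fetched; passing ignore_cache=True selects the existing file as well.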

kg_covid_19.utils.download_utils.elastic_search_query(es_connection, index, query, scroll: str = '1m', request_timeout: int = 60, preserve_order: bool = True)

Fetch records from the given URL and query parameters.

Args:

es_connection: Elasticsearch connection
index: the Elasticsearch index to query
query: the query
scroll: scroll parameter passed to Elasticsearch
request_timeout: timeout parameter passed to Elasticsearch
preserve_order: preserve-order parameter passed to Elasticsearch

Returns:

All records for query

kg_covid_19.utils.transform_utils module

exception kg_covid_19.utils.transform_utils.ItemInDictNotFound

Bases: kg_covid_19.utils.transform_utils.TransformError

Raised when an expected item is not found in a dict

exception kg_covid_19.utils.transform_utils.TransformError

Bases: Exception

Base class for other exceptions

kg_covid_19.utils.transform_utils.collapse_uniprot_curie(uniprot_curie: str) → str

Given a UniProtKB CURIE for an isoform, such as UniprotKB:P63151-1 or UniprotKB:P63151-2, collapse it to the parent protein (UniprotKB:P63151 in both cases)

Parameters

uniprot_curie

Returns

collapsed UniProtKB ID
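A minimal sketch of the collapsing step, assuming an isoform is marked by a hyphen followed by digits at the end of the CURIE. The function name collapse_isoform_curie is hypothetical and the regular expression is an illustration, not necessarily how the package implements it.

```python
import re


def collapse_isoform_curie(uniprot_curie: str) -> str:
    """Hypothetical stand-in: strip a trailing isoform suffix such as '-1'."""
    # Assumption: the isoform suffix is a hyphen plus digits at the very end.
    return re.sub(r"-\d+$", "", uniprot_curie)


print(collapse_isoform_curie("UniprotKB:P63151-1"))
print(collapse_isoform_curie("UniprotKB:P63151-2"))
print(collapse_isoform_curie("UniprotKB:P63151"))
```

A CURIE without an isoform suffix passes through unchanged, so the sketch is safe to apply to every identifier in a column.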

kg_covid_19.utils.transform_utils.data_to_dict(these_keys, these_values) → dict

Zip up two lists to make a dict

Parameters
  • these_keys – keys for new dict

  • these_values – values for new dict

Returns

dictionary
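Zipping two lists into a dict can be sketched as follows. The name zip_to_dict and the sample values are hypothetical, and how the real function treats lists of unequal length is not documented here; the sketch simply inherits the behaviour of zip().

```python
def zip_to_dict(these_keys, these_values) -> dict:
    """Hypothetical stand-in for the documented behaviour."""
    # zip() pairs items by position and stops at the shorter input.
    return dict(zip(these_keys, these_values))


row = zip_to_dict(["id", "name"], ["X:1", "node one"])
short = zip_to_dict(["id", "name"], ["X:1"])
```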

kg_covid_19.utils.transform_utils.get_header_items(table_data: Any) → List

Utility function to get the header from (the first page of) a table.

Args:

table_data: Data, as list of dicts from tabula.io.read_pdf().

Returns:

header_items: An array of header items.

kg_covid_19.utils.transform_utils.get_item_by_priority(items_dict: dict, keys_by_priority: list) → str

Retrieve item from a dict using a list of keys, in descending order of priority

Parameters
  • items_dict

  • keys_by_priority – list of keys to use to find values

Returns

str: the value in the dict for the first key in keys_by_priority whose value isn’t blank, or None
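The priority lookup can be sketched as below. The name first_non_blank and the sample record are hypothetical, and treating a missing key, None, and the empty string as "blank" is an assumption; the sketch follows the return value documented above rather than the package's source.

```python
from typing import Optional


def first_non_blank(items_dict: dict, keys_by_priority: list) -> Optional[str]:
    """Hypothetical stand-in following the documented return value."""
    for key in keys_by_priority:
        value = items_dict.get(key)
        # Assumption: "blank" means a missing key, None, or an empty string.
        if value not in (None, ""):
            return value
    return None


record = {"symbol": "", "name": "protein A", "id": "X:1"}
best = first_non_blank(record, ["symbol", "name", "id"])
nothing = first_non_blank(record, ["alias"])
```

Here "symbol" is skipped because it is blank, so the value for "name" is returned even though "id" is also present.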

kg_covid_19.utils.transform_utils.guess_bl_category(identifier: str) → str

Guess category for a given identifier.

Note: This is a temporary solution and should not be used long term.

Args:

identifier: A CURIE

Returns:

The category for the given CURIE

kg_covid_19.utils.transform_utils.multi_page_table_to_list(multi_page_table: Any) → List[Dict]

Method to turn table data returned from tabula.io.read_pdf(), possibly broken over several pages, into a list of dicts, one dict for each row.

Args:

multi_page_table: Table data returned from tabula.io.read_pdf(), possibly spanning several pages.

Returns:

table_data: A list of dicts, where each dict holds the items from one row.

kg_covid_19.utils.transform_utils.parse_header(header_string: str, sep: str = '\t') → List

Parses header data.

Args:

header_string: A string containing header items.
sep: A string containing a delimiter.

Returns:

A list of header items.
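A minimal sketch of header parsing, assuming the header is a single delimited line whose trailing newline should be dropped. The name split_header is hypothetical, and any further clean-up the real function may perform is not shown.

```python
from typing import List


def split_header(header_string: str, sep: str = "\t") -> List[str]:
    """Hypothetical stand-in: split one header line on a delimiter."""
    # Assumption: only the line ending is removed before splitting,
    # so empty leading or trailing columns are preserved.
    return header_string.rstrip("\r\n").split(sep)


tab_items = split_header("id\tname\tcategory\n")
comma_items = split_header("id,name,category\n", sep=",")
```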

kg_covid_19.utils.transform_utils.ungzip_to_tempdir(gzipped_file: str, tempdir: str) → str

kg_covid_19.utils.transform_utils.uniprot_make_name_to_id_mapping(dat_gz_file: str) → dict

Given a UniProt dat.gz file, such as ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/idmapping/by_organism/HUMAN_9606_idmapping.dat.gz, make a dict with the name-to-ID mapping

Parameters

dat_gz_file

Returns

dict with mapping
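The following sketch shows one way such a mapping could be built. It assumes the idmapping file holds three tab-separated columns (accession, ID type, and the mapped identifier) and that the third column serves as the "name" and the first as the "id"; which ID types the real function keeps is not stated here. The function name and the toy rows are hypothetical.

```python
import gzip
import os
import tempfile


def name_to_id_from_dat_gz(dat_gz_file: str) -> dict:
    """Hypothetical stand-in: map the third column to the first."""
    mapping = {}
    with gzip.open(dat_gz_file, "rt") as fh:
        for line in fh:
            fields = line.rstrip("\n").split("\t")
            if len(fields) != 3:
                continue  # skip malformed lines
            accession, _id_type, name = fields
            # Assumption: the first accession seen for a name wins.
            mapping.setdefault(name, accession)
    return mapping


with tempfile.TemporaryDirectory() as tmp:
    # Toy file with made-up rows, not real UniProt data.
    path = os.path.join(tmp, "toy_idmapping.dat.gz")
    with gzip.open(path, "wt") as fh:
        fh.write("ACC1\tGene_Name\tGENEA\n")
        fh.write("ACC1\tUniProtKB-ID\tGENEA_TOY\n")
        fh.write("ACC2\tGene_Name\tGENEB\n")
    toy_map = name_to_id_from_dat_gz(path)
```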

kg_covid_19.utils.transform_utils.uniprot_name_to_id(name_to_id_map: dict, name: str) → Optional[str]

Uniprot name to ID mapping

Parameters
  • name_to_id_map – mapping dict[name] -> id

  • name – name

Returns

id string, or None

kg_covid_19.utils.transform_utils.unzip_to_tempdir(zip_file_name: str, tempdir: str) → None

kg_covid_19.utils.transform_utils.write_node_edge_item(fh: Any, header: List, data: List, sep: str = '\t')

Write out a single line for a node or an edge in *.tsv

Parameters
  • fh – file handle of node or edge file

  • header – list of header items

  • data – data for line to write out

  • sep – separator [tab]
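Writing one delimited line per node or edge can be sketched as below. The name write_row is hypothetical, and both the length check against the header and the choice of ValueError are assumptions made for illustration, not confirmed behaviour of the package.

```python
import io
from typing import Any, List


def write_row(fh: Any, header: List, data: List, sep: str = "\t") -> None:
    """Hypothetical stand-in: write one delimited line to an open file."""
    # Assumption: a row must have exactly one value per header column.
    if len(header) != len(data):
        raise ValueError("header and data differ in length")
    fh.write(sep.join(str(item) for item in data) + "\n")


buffer = io.StringIO()  # stands in for an open nodes.tsv file handle
header = ["id", "name", "category"]
write_row(buffer, header, header)  # header line first
write_row(buffer, header, ["X:1", "node one", "thing"])
written = buffer.getvalue()
```

Passing the header as the data, as in the first call, writes the column names using the same code path as ordinary rows.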

Module contents

kg_covid_19.utils.download_from_yaml(yaml_file: str, output_dir: str, ignore_cache: bool = False) → None

Given download information from a download.yaml file, download all files.

Args:

yaml_file: A string pointing to the download.yaml file, to be parsed for things to download.
output_dir: A string pointing to where to write out downloaded files.
ignore_cache: Ignore the cache and download files even if they already exist (default: False).

Returns:

None.

kg_covid_19.utils.multi_page_table_to_list(multi_page_table: Any) → List[Dict]

Method to turn table data returned from tabula.io.read_pdf(), possibly broken over several pages, into a list of dicts, one dict for each row.

Args:

multi_page_table: Table data returned from tabula.io.read_pdf(), possibly spanning several pages.

Returns:

table_data: A list of dicts, where each dict holds the items from one row.

kg_covid_19.utils.write_node_edge_item(fh: Any, header: List, data: List, sep: str = '\t')

Write out a single line for a node or an edge in *.tsv

Parameters
  • fh – file handle of node or edge file

  • header – list of header items

  • data – data for line to write out

  • sep – separator [tab]