graph_data |
Configuration for graph data. |
graph |
Graph configuration for the pipeline. |
evaluation_data |
Input data for the pipeline. |
source_data |
Compressed input data for the pipeline (tar.gz archive). |
files |
List of files, each described by a path and a description. |
path |
File path or URL. |
desc |
File description. |
train_data |
Positive and negative graph data for ML training. |
valid_data |
Positive and negative graph data for ML validation. |
pos_edge_filepath |
Positive edges file path. |
neg_edge_filepath |
Negative edges file path. |
filename |
Embeddings file name. |
history_filename |
Embeddings history file name. |
node_embeddings_params |
Node embeddings parameters. |
tsne_filename |
File name for the t-SNE plot. |
method_name |
Name of the node embedding method. |
walk_length |
Maximum length of the random walks. |
batch_size |
Number of nodes to include in a single batch. |
window_size |
Size of the context window around each target node in node2vec. |
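A minimal sketch of how a context window turns one random walk into (target, context) training pairs, assuming `window_size` counts the nodes taken on each side of the target (the description above does not say whether the window is one-sided or total). `context_pairs` is a hypothetical helper, not part of the pipeline.

```python
from typing import List, Sequence, Tuple


def context_pairs(walk: Sequence[str], window_size: int) -> List[Tuple[str, str]]:
    """Pair every node in a walk with each node at most `window_size` steps away."""
    pairs = []
    for i, target in enumerate(walk):
        # Clip the window at both ends of the walk.
        lo = max(0, i - window_size)
        hi = min(len(walk), i + window_size + 1)
        for j in range(lo, hi):
            if j != i:  # a node is not its own context
                pairs.append((target, walk[j]))
    return pairs
```

A larger window links each node to more distant nodes on the same walk, at the cost of more training pairs per walk.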
return_weight |
Weight on the probability of returning to the node the walk just came from. A higher value makes the walks behave more like a Breadth-First Search; a very high value (> 2) makes the search very local. Equal to the inverse of p in the Node2Vec paper. |
explore_weight |
Weight on the probability of moving to a node that is farther from the node the walk just came from (i.e. not adjacent to it). A higher value makes the walks behave more like a Depth-First Search and pushes the search outward; a very low value keeps the search very local. Equal to the inverse of q in the Node2Vec paper. |
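To make the effect of `return_weight` (1/p) and `explore_weight` (1/q) concrete, here is a minimal sketch of the second-order transition bias from the node2vec paper, assuming an unweighted, undirected graph stored as a dictionary of neighbor sets. `node2vec_transition_probs` is a hypothetical helper; the pipeline's own walk implementation is not shown here and may differ, for example by also multiplying in edge weights.

```python
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]


def node2vec_transition_probs(adj: Graph, prev: Hashable, cur: Hashable,
                              return_weight: float = 1.0,
                              explore_weight: float = 1.0) -> Dict[Hashable, float]:
    """Probability of each next node after the walk moved prev -> cur (unweighted graph)."""
    weights: Dict[Hashable, float] = {}
    for nxt in adj[cur]:
        if nxt == prev:
            weights[nxt] = return_weight   # 1/p: step back to where we came from
        elif nxt in adj[prev]:
            weights[nxt] = 1.0             # stays adjacent to prev: BFS-like move
        else:
            weights[nxt] = explore_weight  # 1/q: moves away from prev: DFS-like move
    total = sum(weights.values())
    return {node: w / total for node, w in weights.items()}
```

For example, with `return_weight=2.0` and `explore_weight=0.5`, a walk that just moved from `a` to `b` is four times as likely to step back to `a` as to jump to a neighbor of `b` that is not adjacent to `a`.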
iterations |
Number of iterations. |
classifiers |
Classifier details. |
classifier_id |
Key to identify the classifier and associated parameters. |
classifier_name |
Name of the classifier. |
classifier_type |
Type of classifier. |
edge_method |
Method used to combine the embeddings of an edge's two endpoint nodes into a single edge representation. |
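The element-wise operators below are the ones commonly used for this purpose (they come from the node2vec paper's link prediction experiments). The operator names are assumptions for illustration; check which values `edge_method` actually accepts in this pipeline. `edge_embedding` is a hypothetical helper.

```python
from typing import Callable, Dict, List

Vector = List[float]

# Element-wise binary operators applied to each pair of coordinates.
EDGE_OPERATORS: Dict[str, Callable[[float, float], float]] = {
    "hadamard": lambda a, b: a * b,            # element-wise product
    "average": lambda a, b: (a + b) / 2.0,     # element-wise mean
    "weighted_l1": lambda a, b: abs(a - b),    # absolute difference
    "weighted_l2": lambda a, b: (a - b) ** 2,  # squared difference
}


def edge_embedding(src: Vector, dst: Vector, method: str = "hadamard") -> Vector:
    """Combine two node embeddings of equal length into one edge embedding."""
    if len(src) != len(dst):
        raise ValueError("embeddings must have the same dimension")
    op = EDGE_OPERATORS[method]
    return [op(a, b) for a, b in zip(src, dst)]
```

The resulting edge vector has the same dimension as the node embeddings, so the classifier input size does not depend on the chosen operator.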
outfile |
File path for saving the output. |
parameters |
Parameters passed when building the classifier. |
sklearn_params |
Parameters specific to sklearn. |
tf_keras_params |
Parameters specific to TensorFlow/Keras. |
random_state |
Random seed. |
max_iter |
Maximum iterations. |
layers_config |
Configuration for instantiating layers for neural networks. |
loss |
Loss function. |
metrics_config |
Metrics to be calculated after classifier training. |
optimizer |
Optimizer function to be used during classifier training. |
fit_config |
Configuration for model fitting. |
layers |
Container of layers to be used to build the neural network. |
type |
Type of layer. |
units |
Number of units (output dimensionality) of the layer. |
activation |
Activation function of the layer. |
rate |
Dropout rate, i.e. the fraction of input units to drop (for dropout layers). |
metrics |
Metrics to track while training the classifier. |
name |
Name of the metric. |
curve |
Curve for which the area under the curve (AUC) is calculated, e.g. ROC or PR. |
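For intuition, the area under the ROC curve equals the probability that a randomly chosen positive edge is scored higher than a randomly chosen negative edge, with ties counting one half. The quadratic sketch below computes that quantity directly. It is not how Keras or scikit-learn compute AUC internally (Keras, for instance, approximates it over a fixed set of thresholds), and it does not cover the precision-recall curve. `roc_auc` is a hypothetical helper.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Exact ROC AUC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect separation of positive and negative edges.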
epochs |
Number of epochs to run for training. |
callbacks_list |
Container of callbacks. |
callbacks |
Callbacks to apply during classifier training. |
monitor |
Quantity to be monitored. |
patience |
Number of epochs with no improvement after which training will be stopped. |
min_delta |
Minimum change in the monitored quantity to qualify as an improvement, i.e. an absolute change of less than min_delta counts as no improvement. |
verbose |
Verbosity mode, 0 or 1. Mode 0 is silent, and mode 1 displays messages when the callback takes an action. |
mode |
One of {"auto", "min", "max"}. In "min" mode, training stops when the monitored quantity has stopped decreasing; in "max" mode, it stops when the monitored quantity has stopped increasing; in "auto" mode, the direction is inferred automatically from the name of the monitored quantity. |
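The interaction of `monitor`, `patience`, `min_delta` and `mode` can be simulated on a precomputed sequence of monitored values. This is a simplified sketch of the behavior described above, not the Keras implementation; in particular, the rule used for "auto" mode (treat names containing "acc" or "auc" as quantities to maximize) is an assumed heuristic, and `early_stopping_epoch` is a hypothetical helper.

```python
import math
from typing import Optional, Sequence


def infer_mode(monitor: str) -> str:
    # Assumed heuristic for "auto": accuracy/AUC-like quantities should increase.
    name = monitor.lower()
    return "max" if ("acc" in name or "auc" in name) else "min"


def early_stopping_epoch(values: Sequence[float], patience: int,
                         min_delta: float = 0.0, mode: str = "auto",
                         monitor: str = "val_loss") -> Optional[int]:
    """Return the 0-based epoch at which training stops, or None if it never does."""
    if mode == "auto":
        mode = infer_mode(monitor)
    if mode not in ("min", "max"):
        raise ValueError(f"unknown mode: {mode!r}")
    best = -math.inf if mode == "max" else math.inf
    wait = 0
    for epoch, value in enumerate(values):
        if mode == "max":
            improved = value > best + min_delta
        else:
            improved = value < best - min_delta
        if improved:
            best, wait = value, 0  # new best: reset the patience counter
        else:
            wait += 1              # a change smaller than min_delta is no improvement
            if wait >= patience:
                return epoch
    return None
```

With `values=[1.0, 0.8, 0.79, 0.795, 0.81]`, `patience=2`, `min_delta=0.05` and `mode="min"`, the drop from 0.8 to 0.79 is too small to count, so training stops at epoch 3, two epochs after the last real improvement.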
models |
Models to be used for link prediction. |
model_id |
Key of the model to be used. |
node_types |
Type of nodes. |
cutoff |
Cutoff filter. |
source |
Source node(s). |
destination |
Destination node(s). |
s3_bucket |
S3 bucket name. |
s3_bucket_dir |
Directory path within the S3 bucket. |
extra_args |
Extra keyword arguments (**kwargs). |
target_path |
File path for saving results. |
node_type_path |
The path to the file with the unique node type names. |
node_type_list_separator |
The separator to use for the node types file. |
node_types_column_number |
Node type column number. |
node_types_column |
Node type column name. |
node_types_ids_column_number |
Node type ID column number. |
node_types_ids_column |
Node type ID column name. |
node_types_number |
The number of unique node types. This is used to allocate the correct size for the data structure. |
numeric_node_type_ids |
Whether the node type names should be loaded as numeric values, i.e. cast from string to a numeric representation. |
minimum_node_type_id |
The minimum node type ID to be used when using numeric node type IDs. |
node_type_list_header |
Whether the node type file has a header. |
node_type_list_support_balanced_quotes |
Whether to support balanced quotes while reading the node type list. |
node_type_list_rows_to_skip |
The number of lines to skip in the node types file. If the file is declared to have a header, the header is already skipped. |
node_type_list_is_correct |
Whether the node types file can be assumed to be correct, i.e. free of errors. If this parameter is set to true on a malformed file, the constructor will crash. |
node_type_list_max_rows_number |
The maximum number of lines to be loaded from the node types file. |
node_type_list_comment_symbol |
The comment symbol to skip lines in the node types file. Lines starting with this symbol will be skipped. |
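The header, rows-to-skip, comment-symbol and maximum-rows options interact when a list file is read. The sketch below shows one plausible order of operations, which is an assumption for illustration: the header is consumed first, then the skipped rows, then comment lines are filtered, and the maximum counts only data rows. `read_rows` is a hypothetical helper, not the loader's code.

```python
from typing import Iterable, List, Optional


def read_rows(lines: Iterable[str], separator: str, header: bool = False,
              rows_to_skip: int = 0, comment_symbol: Optional[str] = None,
              max_rows_number: Optional[int] = None) -> List[List[str]]:
    """Split lines into fields, honouring header, skip, comment and row-limit options."""
    it = iter(lines)
    if header:
        next(it, None)       # the header is consumed separately from rows_to_skip
    for _ in range(rows_to_skip):
        next(it, None)       # skipped rows are dropped whatever they contain
    rows: List[List[str]] = []
    for line in it:
        line = line.rstrip("\n")
        if comment_symbol and line.startswith(comment_symbol):
            continue         # comment lines are ignored and not counted
        if max_rows_number is not None and len(rows) >= max_rows_number:
            break
        rows.append(line.split(separator))
    return rows
```

The same four options exist for the node list, edge type list and edge list files, so the same reasoning applies to each of them.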
load_node_type_list_in_parallel |
Whether to load the node type list in parallel. Note that when loading in parallel, the internal order of the node type IDs may change across different runs. We are working to make this stable. |
node_path |
The path to the file with the unique node names. |
node_list_separator |
The separator to use for the nodes file. If it is not provided, one is detected automatically among comma, semicolon, tab and space. |
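The description does not say how the automatic detection works. One simple approach, shown here as an assumption rather than the loader's actual rule, is to count each candidate in the first line and keep the most frequent one. `detect_separator` is a hypothetical helper.

```python
from typing import Optional

# Candidates in priority order; on a tie, the earlier one wins.
CANDIDATES = [",", ";", "\t", " "]


def detect_separator(first_line: str) -> Optional[str]:
    """Pick the candidate separator that occurs most often in the first line."""
    counts = {sep: first_line.count(sep) for sep in CANDIDATES}
    best = max(CANDIDATES, key=lambda sep: counts[sep])
    return best if counts[best] > 0 else None  # None: no candidate present
```

A heuristic like this can be fooled by, for example, node names that contain spaces, which is a good reason to set the separator explicitly when it is known.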
node_list_header |
Whether the nodes file has a header. |
node_list_support_balanced_quotes |
Whether to support balanced quotes while reading the node list. |
node_list_rows_to_skip |
Number of rows to skip in the node list file. |
node_list_is_correct |
Whether the nodes file can be assumed to be correct, i.e. free of errors. If this parameter is set to true on a malformed file, the constructor will crash. |
node_list_max_rows_number |
The maximum number of lines to be loaded from the nodes file. |
node_list_comment_symbol |
The comment symbol to skip lines in the nodes file. Lines starting with this symbol will be skipped. |
default_node_type |
The node type to be used when the node type for a given node in the node file is None. |
nodes_column_number |
The number of the column of the node file from where to load the node names. |
nodes_column |
The name of the column of the node file from where to load the node names. |
node_types_separator |
The node types separator. |
node_list_node_types_column_number |
The number of the column of the node file from where to load the node types. |
node_list_node_types_column |
The name of the column of the node file from where to load the node types. |
node_ids_column |
The name of the column of the node file from where to load the node IDs. |
node_ids_column_number |
The number of the column of the node file from where to load the node IDs. |
nodes_number |
The expected number of nodes. Note that this must be the EXACT number of nodes in the graph. |
minimum_node_id |
The minimum node ID to be used, when loading the node IDs as numerical. |
numeric_node_ids |
Whether to load the node IDs as numeric values. |
node_list_numeric_node_type_ids |
Whether to load the node type IDs in the node file as numeric values. |
skip_node_types_if_unavailable |
Whether to skip the node types without raising an error if these are unavailable. |
load_node_list_in_parallel |
Whether to load the node list in parallel. When loading in parallel, without node IDs, the nodes may not be loaded in a deterministic order. |
edge_type_path |
The path to the file with the unique edge type names. |
edge_types_column_number |
The number of the column of the edge types file from where to load the edge types. |
edge_types_column |
The name of the column of the edge types file from where to load the edge types. |
edge_types_number |
The number of unique edge types. This is used to allocate the correct size for the data structure. |
numeric_edge_type_ids |
Whether the edge type names should be loaded as numeric values, i.e. cast from string to a numeric representation. |
minimum_edge_type_id |
The minimum edge type ID to be used when using numeric edge type IDs. |
edge_type_list_separator |
The separator to use for the edge type list. If None is provided, the separator is detected automatically among ';', ',', tab and space. |
edge_type_list_header |
Whether the edge type file has a header. |
edge_type_list_support_balanced_quotes |
Whether to support balanced quotes while reading the edge type list. |
edge_type_list_rows_to_skip |
Number of rows to skip in the edge type list file. |
edge_type_list_is_correct |
Whether the edge types file can be assumed to be correct, i.e. free of errors. If this parameter is set to true on a malformed file, the constructor will crash. |
edge_type_list_max_rows_number |
The maximum number of lines to be loaded from the edge types file. |
edge_type_list_comment_symbol |
The comment symbol to skip lines in the edge types file. Lines starting with this symbol will be skipped. |
load_edge_type_list_in_parallel |
Whether to load the edge type list in parallel. When loading in parallel, without edge type IDs, the edge types may not be loaded in a deterministic order. |
edge_path |
The path to the file with the edge list. |
edge_list_separator |
The separator to use for the edge list. If None is provided, the separator is detected automatically among ';', ',', tab and space. |
edge_list_header |
Whether the edges file has a header. |
edge_list_support_balanced_quotes |
Whether to support balanced quotes while reading the edge list. |
edge_list_rows_to_skip |
Number of rows to skip in the edge list file. |
sources_column_number |
The number of the column of the edges file from where to load the source nodes. |
sources_column |
The name of the column of the edges file from where to load the source nodes. |
destinations_column_number |
The number of the column of the edges file from where to load the destination nodes. |
destinations_column |
The name of the column of the edges file from where to load the destination nodes. |
edge_list_edge_types_column_number |
The number of the column of the edges file from where to load the edge types. |
edge_list_edge_types_column |
The name of the column of the edges file from where to load the edge types. |
default_edge_type |
The edge type to be used when the edge type for a given edge in the edge file is None. |
weights_column_number |
The number of the column of the edges file from where to load the edge weights. |
weights_column |
The name of the column of the edges file from where to load the edge weights. |
default_weight |
The edge weight to be used when the edge weight for a given edge in the edge file is None. |
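The column-number and default-value options combine roughly as follows when a single row of the edge list is parsed. This sketch assumes 0-based column numbers and treats an empty or absent cell as missing (None); `parse_edge_row` is a hypothetical helper, not the loader's API.

```python
from typing import List, Optional, Tuple


def parse_edge_row(fields: List[str], sources_column_number: int,
                   destinations_column_number: int,
                   edge_types_column_number: Optional[int] = None,
                   weights_column_number: Optional[int] = None,
                   default_edge_type: Optional[str] = None,
                   default_weight: Optional[float] = None
                   ) -> Tuple[str, str, Optional[str], Optional[float]]:
    """Extract (source, destination, edge type, weight) from one already-split row."""
    def cell(col: Optional[int]) -> Optional[str]:
        # An unset column, a short row, or an empty cell is treated as missing.
        if col is None or col >= len(fields) or fields[col] == "":
            return None
        return fields[col]

    src = cell(sources_column_number)
    dst = cell(destinations_column_number)
    if src is None or dst is None:
        raise ValueError(f"row is missing a source or destination: {fields!r}")
    edge_type = cell(edge_types_column_number)
    if edge_type is None:
        edge_type = default_edge_type   # fall back to default_edge_type
    raw_weight = cell(weights_column_number)
    weight = float(raw_weight) if raw_weight is not None else default_weight
    return src, dst, edge_type, weight
```

Sources and destinations have no default: a row without them cannot describe an edge, so the sketch rejects it instead of filling in a value.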
edge_ids_column |
The name of the column of the edges file from where to load the edge IDs. |
edge_ids_column_number |
The number of the column of the edges file from where to load the edge IDs. |
edge_list_numeric_edge_type_ids |
Whether to load the edge type IDs as numeric from the edge list. |
edge_list_numeric_node_ids |
Whether to load the node IDs in the edge list as numeric values. |
skip_weights_if_unavailable |
Whether to skip the weights without raising an error if these are unavailable. |
skip_edge_types_if_unavailable |
Whether to skip the edge types without raising an error if these are unavailable. |
edge_list_is_complete |
Whether to consider the edge list as complete, i.e. the edges are presented in both directions when loading an undirected graph. |
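When an undirected edge list is not complete, each edge appears in only one direction and the reverse direction has to be added during loading. A minimal sketch of that completion step, using a hypothetical helper rather than the loader's code:

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def complete_undirected(edges: Iterable[Edge]) -> List[Edge]:
    """Return each undirected edge in both directions, without duplicates, sorted."""
    both: Set[Edge] = set()
    for src, dst in edges:
        both.add((src, dst))
        both.add((dst, src))  # a self-loop (a, a) maps onto itself
    return sorted(both)
```

Declaring the list as complete when it already contains both directions lets the loader skip this extra pass and the memory it needs.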
edge_list_may_contain_duplicates |
Whether the edge list may contain duplicates. If the edge list surely DOES NOT contain duplicates, a validation step may be skipped. By default, it is assumed that the edge list may contain duplicates. |
edge_list_is_sorted |
Whether the edge list is sorted. A sorted edge list has the lowest memory peak, but requires both the number of nodes and the number of edges. |
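One reason sortedness helps: in a sorted list, duplicate edges are adjacent, so both checks take a single linear pass with no auxiliary hash set. The sketch assumes edges are sorted lexicographically by (source, destination) node ID; the exact ordering the loader expects is not stated here, and both helpers are hypothetical.

```python
from typing import Sequence, Tuple

Edge = Tuple[int, int]


def is_sorted(edges: Sequence[Edge]) -> bool:
    """True if edges are in non-decreasing (source, destination) order."""
    return all(a <= b for a, b in zip(edges, edges[1:]))


def has_duplicates_sorted(edges: Sequence[Edge]) -> bool:
    """In a sorted list, duplicates are adjacent, so comparing neighbors suffices."""
    return any(a == b for a, b in zip(edges, edges[1:]))
```

On an unsorted list, `has_duplicates_sorted` can miss duplicates, which is why it should only be trusted once `is_sorted` holds.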
edge_list_is_correct |
Whether the edges file can be assumed to be correct, i.e. free of errors. If this parameter is set to true on a malformed file, the constructor will crash. |
edge_list_max_rows_number |
The maximum number of lines to be loaded from the edges file. |
edge_list_comment_symbol |
The comment symbol to skip lines in the edges file. Lines starting with this symbol will be skipped. |
edges_number |
The expected number of edges. Note that this must be the EXACT number of edges in the graph. |
load_edge_list_in_parallel |
Whether to load the edge list in parallel. Note that if the edge IDs are not given, it is NOT possible to load a sorted edge list. Similarly, when loading in parallel without edge IDs, the edges may not be loaded in a deterministic order. |
may_have_singletons |
Whether the graph may be expected to have singleton nodes. If the graph is known NOT to have any, some speedups and lower memory peaks are possible. |
may_have_singleton_with_selfloops |
Whether the graph may be expected to have singleton nodes with self-loops. If the graph is known NOT to have any, some speedups and lower memory peaks are possible. |
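To see what these two flags refer to, the sketch below classifies nodes under the definitions assumed here: a singleton has no edges at all, and a singleton with self-loops has only edges from itself to itself. `classify_singletons` is a hypothetical helper; the loader's internal bookkeeping is not described in this document.

```python
from typing import Iterable, Set, Tuple


def classify_singletons(nodes: Iterable[str],
                        edges: Iterable[Tuple[str, str]]) -> Tuple[Set[str], Set[str]]:
    """Return (singletons, singletons_with_selfloops) for the given node list."""
    touched_by_other: Set[str] = set()  # nodes with at least one non-self-loop edge
    with_selfloop: Set[str] = set()     # nodes with at least one self-loop
    for src, dst in edges:
        if src == dst:
            with_selfloop.add(src)
        else:
            touched_by_other.update((src, dst))
    node_set = set(nodes)
    singletons = node_set - touched_by_other - with_selfloop
    singletons_with_selfloops = (with_selfloop - touched_by_other) & node_set
    return singletons, singletons_with_selfloops
```

Finding either kind requires tracking every node's edges, which is the extra work these flags let the loader skip when the answer is known in advance.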
directed |
Whether to load the graph as directed or undirected. |