neat_ml_schema

A schema generated by LinkML for specifying graph ML pipelines in NEAT.

URI: https://w3id.org/neat-ml

Classes

Class Description
NeatConfiguration Root class for the schema.
GraphDataConfiguration Configuration for the graph training and validation data for ML pipeline.
FileResourceContainer A container of paths and descriptions.
FileResource Path (or URL) and description of source files (tar.gz).
TrainValidData Positive and negative training and validation graph data filepaths.
PosNegData File paths for positive and negative edge data.
EmbeddingsConfig Embeddings configuration.
NodeEmbeddingsParams Node embeddings parameters.
ClassifierContainer A container with multiple classifiers in it.
Classifier ML classifiers to be trained-tested-validated and applied for predictions.
ClassifierParams Parameters needed to define a classifier.
SkLearnParams Parameters specific to the SKLearn library.
TFKerasParams Parameters specific to the Tensorflow Keras library.
LayerContainer A container of neural network layers.
Layer Layers for a neural network.
LayerParams Parameters for each layer of a neural network.
MetricContainer Container of metrics for a trained classifier.
Metric Metrics of a trained classifier (model).
ClassifierFitParams Parameters for fitting a classifier.
ClassifierCallbackContainer A container of classifier callbacks.
ClassifierCallback Utilities called at certain points during model training.
ApplyTrainedModelsContainer A container with multiple classifiers in it.
ApplyTrainedModel Trained model used for prediction.
NodeType Source node or Destination node.
Upload Configuration for uploading to Amazon S3 bucket.
Target Path for output to be saved.
EnsmallenRunConfig All parameters used by ensmallen's csv_reader.

Slots

Slot Description
graph_data Configuration for graph data.
graph Graph configuration for pipeline.
evaluation_data Input data for pipeline.
source_data Compressed input data for pipeline (tar.gz).
files None
path File path or URL.
desc File description.
train_data Positive and negative graph data for ML training.
valid_data Positive and negative graph data for ML validation.
pos_edge_filepath Positive edges file path.
neg_edge_filepath Negative edges file path.
filename Embeddings file name.
history_filename Embeddings history file name.
node_embeddings_params Node embeddings parameters.
tsne_filename File name for the TSNE plot.
method_name Name of the node embedding method.
walk_length Maximal length of the walks.
batch_size Number of nodes to include in a single batch.
window_size Size of the context and target nodes for node2vec.
return_weight Weight on the probability of returning to the same node the walk just came from. A higher value makes the walks behave more like a breadth-first search. A very high value (> 2) makes the search very local. Equal to the inverse of p in the Node2Vec paper.
explore_weight Weight on the probability of visiting a neighbor node to the one we're coming from in the random walk. A higher value makes the walks behave more like a depth-first search. A very high value makes the search more outward; a very low value makes the search very local. Equal to the inverse of q in the Node2Vec paper.
iterations Number of iterations.
classifiers Classifier details.
classifier_id Key to identify the classifier and associated parameters.
classifier_name Name of the classifier.
classifier_type Type of classifier.
edge_method Edge method.
outfile File path for saving output.
parameters Parameters to be passed for building classifier.
sklearn_params Parameters specific to sklearn.
tf_keras_params Parameters specific to Tensorflow/Keras.
random_state Random seed.
max_iter Maximum iterations.
layers_config Configuration for instantiating layers for neural networks.
loss Loss function.
metrics_config Metrics to be calculated after classifier training.
optimizer Optimizer function to be used during classifier training.
fit_config Configuration for model fitting.
layers Container of layers to be used to build the neural network.
type Type of layer.
units None
activation Activation layer type.
rate None
metrics Metrics needed to train a classifier.
name None
curve Area under curve (AUC) to be calculated.
epochs Number of epochs to run for training.
callbacks_list Container of callbacks.
callbacks Callbacks.
monitor Quantity to be monitored.
patience Number of epochs with no improvement after which training will be stopped.
min_delta Minimum change in the monitored quantity to qualify as an improvement, i.e. an absolute change of less than min_delta will count as no improvement.
verbose Verbosity mode, 0 or 1. Mode 0 is silent, and mode 1 displays messages when the callback takes an action.
mode One of {"auto", "min", "max"}. In "min" mode, training will stop when the monitored quantity has stopped decreasing; in "max" mode it will stop when the monitored quantity has stopped increasing; in "auto" mode, the direction is automatically inferred from the name of the monitored quantity.
models Models that need to be used for link prediction.
model_id Key of the model to be used.
node_types Type of nodes.
cutoff Cutoff filter.
source Source node(s).
destination Destination node(s).
s3_bucket Bucket name.
s3_bucket_dir Bucket path.
extra_args Extra keyword arguments (**kwargs).
target_path File path for saving results.
node_type_path The path to the file with the unique node type names.
node_type_list_separator The separator to use for the node types file.
node_types_column_number Node type column number.
node_types_column Node type column name.
node_types_ids_column_number Node type ID column number.
node_types_ids_column Node type ID column name.
node_types_number The number of the unique node types. This will be used in order to allocate the correct size for the data structure.
numeric_node_type_ids Whether the node type names should be loaded as numeric values, i.e. cast from string to a numeric representation.
minimum_node_type_id The minimum node type ID to be used when using numeric node type IDs.
node_type_list_header Whether the node type file has a header.
node_type_list_support_balanced_quotes Whether to support balanced quotes.
node_type_list_rows_to_skip The number of lines to skip in the node types file. The header is already skipped if it has been specified that the file has a header.
node_type_list_is_correct Whether the node types file can be assumed to be correct, i.e. it is not malformed. If this parameter is passed as true on a malformed file, the constructor will crash.
node_type_list_max_rows_number The maximum number of lines to be loaded from the node types file.
node_type_list_comment_symbol The comment symbol to skip lines in the node types file. Lines starting with this symbol will be skipped.
load_node_type_list_in_parallel Whether to load the node type list in parallel. Note that when loading in parallel, the internal order of the node type IDs may change across different runs. We are working to get this to be stable.
node_path The path to the file with the unique node names.
node_list_separator The separator to use for the nodes file. Note that if this is not provided, one will be automatically detected among the following: comma, semicolon, tab and space.
node_list_header Whether the nodes file has a header.
node_list_support_balanced_quotes Whether to support balanced quotes.
node_list_rows_to_skip Number of rows to skip in the node list file.
node_list_is_correct Whether the nodes file can be assumed to be correct, i.e. it is not malformed. If this parameter is passed as true on a malformed file, the constructor will crash.
node_list_max_rows_number The maximum number of lines to be loaded from the nodes file.
node_list_comment_symbol The comment symbol to skip lines in the nodes file. Lines starting with this symbol will be skipped.
default_node_type The node type to be used when the node type for a given node in the node file is None.
nodes_column_number The number of the column of the node file from where to load the node names.
nodes_column The name of the column of the node file from where to load the node names.
node_types_separator The node types separator.
node_list_node_types_column_number The number of the column of the node file from where to load the node types.
node_list_node_types_column The name of the column of the node file from where to load the node types.
node_ids_column The name of the column of the node file from where to load the node IDs.
node_ids_column_number The number of the column of the node file from where to load the node IDs.
nodes_number The expected number of nodes. Note that this must be the EXACT number of nodes in the graph.
minimum_node_id The minimum node ID to be used, when loading the node IDs as numerical.
numeric_node_ids Whether to load the node IDs as numeric.
node_list_numeric_node_type_ids Whether to load the node type IDs in the node file as numeric.
skip_node_types_if_unavailable Whether to skip the node types without raising an error if these are unavailable.
load_node_list_in_parallel Whether to load the node list in parallel. When loading in parallel, without node IDs, the nodes may not be loaded in a deterministic order.
edge_type_path The path to the file with the unique edge type names.
edge_types_column_number The number of the column of the edge types file from where to load the edge types.
edge_types_column The name of the column of the edge types file from where to load the edge types.
edge_types_number The number of the unique edge types. This will be used in order to allocate the correct size for the data structure.
numeric_edge_type_ids Whether the edge type names should be loaded as numeric values, i.e. cast from string to a numeric representation.
minimum_edge_type_id The minimum edge type ID to be used when using numeric edge type IDs.
edge_type_list_separator The separator to use for the edge type list. Note that, if None is provided, an attempt is made to detect one automatically among ';', ',', tab or space.
edge_type_list_header Whether the edge type file has a header.
edge_type_list_support_balanced_quotes Whether to support balanced quotes while reading the edge type list.
edge_type_list_rows_to_skip Number of rows to skip in the edge type list file.
edge_type_list_is_correct Whether the edge types file can be assumed to be correct, i.e. it is not malformed. If this parameter is passed as true on a malformed file, the constructor will crash.
edge_type_list_max_rows_number The maximum number of lines to be loaded from the edge types file.
edge_type_list_comment_symbol The comment symbol to skip lines in the edge types file. Lines starting with this symbol will be skipped.
load_edge_type_list_in_parallel Whether to load the edge type list in parallel. When loading in parallel, without edge type IDs, the edge types may not be loaded in a deterministic order.
edge_path The path to the file with the edge list.
edge_list_separator The separator to use for the edge list. Note that, if None is provided, an attempt is made to detect one automatically among ';', ',', tab or space.
edge_list_header Whether the edges file has a header.
edge_list_support_balanced_quotes Whether to support balanced quotes while reading the edge list.
edge_list_rows_to_skip Number of rows to skip in the edge list file.
sources_column_number The number of the column of the edges file from where to load the source nodes.
sources_column The name of the column of the edges file from where to load the source nodes.
destinations_column_number The number of the column of the edges file from where to load the destination nodes.
destinations_column The name of the column of the edges file from where to load the destination nodes.
edge_list_edge_types_column_number The number of the column of the edges file from where to load the edge types.
edge_list_edge_types_column The name of the column of the edges file from where to load the edge types.
default_edge_type The edge type to be used when the edge type for a given edge in the edge file is None.
weights_column_number The number of the column of the edges file from where to load the edge weights.
weights_column The name of the column of the edges file from where to load the edge weights.
default_weight The edge weight to be used when the edge weight for a given edge in the edge file is None.
edge_ids_column The name of the column of the edges file from where to load the edge IDs.
edge_ids_column_number The number of the column of the edges file from where to load the edge IDs.
edge_list_numeric_edge_type_ids Whether to load the edge type IDs as numeric from the edge list.
edge_list_numeric_node_ids Whether to load the edge node IDs as numeric from the edge list.
skip_weights_if_unavailable Whether to skip the weights without raising an error if these are unavailable.
skip_edge_types_if_unavailable Whether to skip the edge types without raising an error if these are unavailable.
edge_list_is_complete Whether to consider the edge list as complete, i.e. the edges are presented in both directions when loading an undirected graph.
edge_list_may_contain_duplicates Whether the edge list may contain duplicates. If the edge list surely DOES NOT contain duplicates, a validation step may be skipped. By default, it is assumed that the edge list may contain duplicates.
edge_list_is_sorted Whether the edge list is sorted. Note that a sorted edge list has the minimal memory peak, but requires the nodes number and the edges number.
edge_list_is_correct Whether the edges file can be assumed to be correct, i.e. it is not malformed. If this parameter is passed as true on a malformed file, the constructor will crash.
edge_list_max_rows_number The maximum number of lines to be loaded from the edges file.
edge_list_comment_symbol The comment symbol to skip lines in the edges file. Lines starting with this symbol will be skipped.
edges_number The expected number of edges. Note that this must be the EXACT number of edges in the graph.
load_edge_list_in_parallel Whether to load the edge list in parallel. Note that, if the edge IDs indices are not given, it is NOT possible to load a sorted edge list. Similarly, when loading in parallel, without edge IDs, the edges may not be loaded in a deterministic order.
may_have_singletons Whether the graph may be expected to have singleton nodes. If it is stated that it surely DOES NOT have any, this allows for some speedups and lower memory peaks.
may_have_singleton_with_selfloops Whether the graph may be expected to have singleton nodes with selfloops. If it is stated that it surely DOES NOT have any, this allows for some speedups and lower memory peaks.
directed Whether to load the graph as directed or undirected.
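
The return_weight and explore_weight slots above are described as the inverses of p and q from the Node2Vec paper. The following is a minimal sketch of that conversion, useful when hyperparameters are quoted as p and q but need to be written as these two slots. The helper name node2vec_weights is hypothetical and is not part of NEAT or ensmallen; only the slot names come from the table above.

```python
def node2vec_weights(p: float, q: float) -> dict:
    """Convert Node2Vec p and q into return_weight and explore_weight.

    Hypothetical helper: it only applies the "inverse of p / inverse of q"
    relationship stated in the slot descriptions.
    """
    if p <= 0 or q <= 0:
        raise ValueError("p and q must be positive")
    return {
        "return_weight": 1.0 / p,   # inverse of the return parameter p
        "explore_weight": 1.0 / q,  # inverse of the in-out parameter q
    }


# p = 0.5 and q = 2.0 give a high return weight and a low explore weight.
params = node2vec_weights(p=0.5, q=2.0)
print(params)  # {'return_weight': 2.0, 'explore_weight': 0.5}
```

Under the descriptions above, this example setting (return_weight of 2.0, explore_weight of 0.5) would bias the walks toward staying close to the starting node.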

Enums

Enums Description
NodeEmbedMethodEnum Enums containing possible values for node embedding methods.
EdgeMethodEnum Enums containing possible values for node edge methods.
ActivationEnum Enums containing possible values for activation functions.
OptimizerEnum Optimizers that can be implemented in the neural network.
ClassifierCallbackModeEnum Callback modes while fitting a classifier.
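
The callback slots monitor, patience, min_delta and mode interact in a way that is easier to see in code. The sketch below is a plain-Python illustration of the stopping rule as the slot descriptions state it; it is not the Keras implementation and not NEAT code. The function names resolve_mode and stopping_epoch are hypothetical, and the rule used for "auto" mode (treat metric names containing "acc" or "auc" as quantities that should increase) is an assumption made for the example.

```python
from typing import Optional, Sequence


def resolve_mode(mode: str, monitor: str) -> str:
    """Return "min" or "max" for the given mode and monitored quantity."""
    if mode in ("min", "max"):
        return mode
    if mode != "auto":
        raise ValueError(f"unknown mode: {mode!r}")
    # Assumption for this sketch: accuracy-like and AUC-like names should
    # increase; everything else (for example a loss) should decrease.
    return "max" if ("acc" in monitor or "auc" in monitor) else "min"


def stopping_epoch(
    history: Sequence[float],
    monitor: str,
    patience: int,
    min_delta: float = 0.0,
    mode: str = "auto",
) -> Optional[int]:
    """Return the 1-based epoch at which training would stop, or None."""
    direction = resolve_mode(mode, monitor)
    best = None
    wait = 0  # epochs seen since the last improvement
    for epoch, value in enumerate(history, start=1):
        if best is None:
            improved = True
        elif direction == "min":
            # A change smaller than min_delta counts as no improvement.
            improved = value < best - min_delta
        else:
            improved = value > best + min_delta
        if improved:
            best = value
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


val_loss = [0.90, 0.70, 0.69, 0.695, 0.70]
print(stopping_epoch(val_loss, monitor="val_loss", patience=2,
                     min_delta=0.05, mode="min"))  # 4
```

In the usage example, epoch 3 improves on epoch 2 by only 0.01, which is below min_delta, so it counts toward patience; training stops at epoch 4 after two epochs without a qualifying improvement.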