neat_ml_schema

A schema generated by LinkML for specifying graph ML pipelines in NEAT.

URI: https://w3id.org/neat-ml

Classes

Class	Description
NeatConfiguration	Root class for the schema.
GraphDataConfiguration	Configuration for the graph training and validation data for ML pipeline.
FileResourceContainer	A container of paths and descriptions.
FileResource	Path (or URL) and description of source files (tar.gz).
TrainValidData	Positive and negative training and validation graph data file paths.
PosNegData	File paths for positive and negative edge data.
EmbeddingsConfig	Embeddings configuration.
NodeEmbeddingsParams	Node embeddings parameters.
ClassifierContainer	A container with multiple classifiers in it.
Classifier	ML classifiers to be trained-tested-validated and applied for predictions.
ClassifierParams	Parameters needed to define a classifier.
SkLearnParams	Parameters specific to the SKLearn library.
TFKerasParams	Parameters specific to the Tensorflow Keras library.
LayerContainer	A container of neural network layers.
Layer	Layers for a neural network.
LayerParams	Parameters for each layer of a neural network.
MetricContainer	Container of metrics for a trained classifier.
Metric	Metrics of a trained classifier (model).
ClassifierFitParams	Parameters for fitting a classifier.
ClassifierCallbackContainer	A container of classifier callbacks.
ClassifierCallback	Utilities called at certain points during model training.
ApplyTrainedModelsContainer	A container with multiple trained models in it.
ApplyTrainedModel	Trained model used for prediction.
NodeType	Source node or Destination node.
Upload	Configuration for uploading to Amazon S3 bucket.
Target	Path for output to be saved.
EnsmallenRunConfig	All parameters used by ensmallen's csv_reader.

Slots

Slot	Description
graph_data	Configuration for graph data.
graph	Graph configuration for pipeline.
evaluation_data	Input data for pipeline.
source_data	Compressed input data for pipeline (tar.gz).
files	None
path	File path or URL.
desc	File description.
train_data	Positive and negative graph data for ML training.
valid_data	Positive and negative graph data for ML validation.
pos_edge_filepath	Positive edges file path.
neg_edge_filepath	Negative edges file path.
filename	Embeddings file name.
history_filename	Embeddings history file name.
node_embeddings_params	Node embeddings parameters.
tsne_filename	File name for the TSNE plot.
method_name	Name of the node embedding method.
walk_length	Maximal length of the walks.
batch_size	Number of nodes to include in a single batch.
window_size	Size of the context and target nodes for node2vec.
return_weight	Weight on the probability of returning to the node the walk just came from. Higher values make the walks behave more like a breadth-first search; very high values (> 2) make the search very local. Equal to the inverse of p in the Node2Vec paper.
explore_weight	Weight on the probability of visiting a neighbor node to the one the walk is coming from. Higher values make the walks behave more like a depth-first search: very high values push the search outward, while very low values keep it very local. Equal to the inverse of q in the Node2Vec paper.
iterations	Number of iterations.
classifiers	Classifier details.
classifier_id	Key to identify the classifier and associated parameters.
classifier_name	Name of the classifier.
classifier_type	Type of classifier.
edge_method	Edge method.
outfile	File path for saving output.
parameters	Parameters to be passed for building classifier.
sklearn_params	Parameters specific to sklearn.
tf_keras_params	Parameters specific to Tensorflow/Keras.
random_state	Random seed.
max_iter	Maximum iterations.
layers_config	Configuration for instantiating layers for neural networks.
loss	Loss function.
metrics_config	Metrics to be calculated after classifier training.
optimizer	Optimizer function to be used during classifier training.
fit_config	Configuration for model fitting.
layers	Container of layers to be used to build the neural network.
type	Type of layer.
units	None
activation	Activation layer type.
rate	None
metrics	Metrics needed to train a classifier.
name	None
curve	Area under curve (AUC) to be calculated.
epochs	Number of epochs to run for training.
callbacks_list	Container of callbacks.
callbacks	Callbacks.
monitor	Quantity to be monitored.
patience	Number of epochs with no improvement after which training will be stopped.
min_delta	Minimum change in the monitored quantity to qualify as an improvement, i.e. an absolute change of less than min_delta will count as no improvement.
verbose	Verbosity mode, 0 or 1. Mode 0 is silent, and mode 1 displays messages when the callback takes an action.
mode	One of {"auto", "min", "max"}. In "min" mode, training will stop when the monitored quantity has stopped decreasing; in "max" mode, it will stop when the monitored quantity has stopped increasing; in "auto" mode, the direction is automatically inferred from the name of the monitored quantity.
models	Models that need to be used for link prediction.
model_id	Key of the model to be used.
node_types	Type of nodes.
cutoff	Cutoff filter.
source	Source node(s).
destination	Destination node(s).
s3_bucket	Bucket name.
s3_bucket_dir	Bucket path.
extra_args	Extra keyword arguments (**kwargs).
target_path	File path for saving results.
node_type_path	The path to the file with the unique node type names.
node_type_list_separator	The separator to use for the node types file.
node_types_column_number	Node type column number.
node_types_column	Node type column name.
node_types_ids_column_number	Node type ID column number.
node_types_ids_column	Node type ID column name.
node_types_number	The number of the unique node types. This will be used in order to allocate the correct size for the data structure.
numeric_node_type_ids	Whether the node type names should be loaded as numeric values, i.e. cast from string to a numeric representation.
minimum_node_type_id	The minimum node type ID to be used when using numeric node type IDs.
node_type_list_header	Whether the node type file has a header.
node_type_list_support_balanced_quotes	Whether to support balanced quotes.
node_type_list_rows_to_skip	The number of lines to skip in the node types file. The header is already skipped if it has been specified that the file has a header.
node_type_list_is_correct	Whether the node types file can be assumed to be correct, i.e. free of malformed content. If this parameter is passed as true on a malformed file, the constructor will crash.
node_type_list_max_rows_number	The maximum number of lines to be loaded from the node types file.
node_type_list_comment_symbol	The comment symbol to skip lines in the node types file. Lines starting with this symbol will be skipped.
load_node_type_list_in_parallel	Whether to load the node type list in parallel. Note that when loading in parallel, the internal order of the node type IDs may change across different iterations. We are working to get this to be stable.
node_path	The path to the file with the unique node names.
node_list_separator	The separator to use for the nodes file. Note that if this is not provided, one will be automatically detected among the following: comma, semicolon, tab and space.
node_list_header	Whether the nodes file has a header.
node_list_support_balanced_quotes	Whether to support balanced quotes.
node_list_rows_to_skip	Number of rows to skip in the node list file.
node_list_is_correct	Whether the nodes file can be assumed to be correct, i.e. free of malformed content. If this parameter is passed as true on a malformed file, the constructor will crash.
node_list_max_rows_number	The maximum number of lines to be loaded from the nodes file.
node_list_comment_symbol	The comment symbol to skip lines in the nodes file. Lines starting with this symbol will be skipped.
default_node_type	The node type to be used when the node type for a given node in the node file is None.
nodes_column_number	The number of the column of the node file from where to load the node names.
nodes_column	The name of the column of the node file from where to load the node names.
node_types_separator	The node types separator.
node_list_node_types_column_number	The number of the column of the node file from where to load the node types.
node_list_node_types_column	The name of the column of the node file from where to load the node types.
node_ids_column	The name of the column of the node file from where to load the node IDs.
node_ids_column_number	The number of the column of the node file from where to load the node IDs.
nodes_number	The expected number of nodes. Note that this must be the EXACT number of nodes in the graph.
minimum_node_id	The minimum node ID to be used, when loading the node IDs as numerical.
numeric_node_ids	Whether to load the node IDs as numeric.
node_list_numeric_node_type_ids	Whether to load the node type IDs in the node file as numeric.
skip_node_types_if_unavailable	Whether to skip the node types without raising an error if these are unavailable.
load_node_list_in_parallel	Whether to load the node list in parallel. When loading in parallel, without node IDs, the nodes may not be loaded in a deterministic order.
edge_type_path	The path to the file with the unique edge type names.
edge_types_column_number	The number of the column of the edge types file from where to load the edge types.
edge_types_column	The name of the column of the edge types file from where to load the edge types.
edge_types_number	The number of the unique edge types. This will be used in order to allocate the correct size for the data structure.
numeric_edge_type_ids	Whether the edge type names should be loaded as numeric values, i.e. cast from string to a numeric representation.
minimum_edge_type_id	The minimum edge type ID to be used when using numeric edge type IDs.
edge_type_list_separator	The separator to use for the edge type list. Note that, if None is provided, automatic detection will be attempted among ';', ',', tab and space.
edge_type_list_header	Whether the edge type file has a header.
edge_type_list_support_balanced_quotes	Whether to support balanced quotes while reading the edge type list.
edge_type_list_rows_to_skip	Number of rows to skip in the edge type list file.
edge_type_list_is_correct	Whether the edge types file can be assumed to be correct, i.e. free of malformed content. If this parameter is passed as true on a malformed file, the constructor will crash.
edge_type_list_max_rows_number	The maximum number of lines to be loaded from the edge types file.
edge_type_list_comment_symbol	The comment symbol to skip lines in the edge types file. Lines starting with this symbol will be skipped.
load_edge_type_list_in_parallel	Whether to load the edge type list in parallel. When loading in parallel, without edge type IDs, the edge types may not be loaded in a deterministic order.
edge_path	The path to the file with the edge list.
edge_list_separator	The separator to use for the edge list. Note that, if None is provided, automatic detection will be attempted among ';', ',', tab and space.
edge_list_header	Whether the edges file has a header.
edge_list_support_balanced_quotes	Whether to support balanced quotes while reading the edge list.
edge_list_rows_to_skip	Number of rows to skip in the edge list file.
sources_column_number	The number of the column of the edges file from where to load the source nodes.
sources_column	The name of the column of the edges file from where to load the source nodes.
destinations_column_number	The number of the column of the edges file from where to load the destination nodes.
destinations_column	The name of the column of the edges file from where to load the destination nodes.
edge_list_edge_types_column_number	The number of the column of the edges file from where to load the edge types.
edge_list_edge_types_column	The name of the column of the edges file from where to load the edge types.
default_edge_type	The edge type to be used when the edge type for a given edge in the edge file is None.
weights_column_number	The number of the column of the edges file from where to load the edge weights.
weights_column	The name of the column of the edges file from where to load the edge weights.
default_weight	The edge weight to be used when the edge weight for a given edge in the edge file is None.
edge_ids_column	The name of the column of the edges file from where to load the edge IDs.
edge_ids_column_number	The number of the column of the edges file from where to load the edge IDs.
edge_list_numeric_edge_type_ids	Whether to load the edge type IDs as numeric from the edge list.
edge_list_numeric_node_ids	Whether to load the edge node IDs as numeric from the edge list.
skip_weights_if_unavailable	Whether to skip the weights without raising an error if these are unavailable.
skip_edge_types_if_unavailable	Whether to skip the edge types without raising an error if these are unavailable.
edge_list_is_complete	Whether to consider the edge list as complete, i.e. the edges are presented in both directions when loading an undirected graph.
edge_list_may_contain_duplicates	Whether the edge list may contain duplicates. If the edge list surely DOES NOT contain duplicates, a validation step may be skipped. By default, it is assumed that the edge list may contain duplicates.
edge_list_is_sorted	Whether the edge list is sorted. Note that a sorted edge list has the minimal memory peak, but requires the nodes number and the edges number.
edge_list_is_correct	Whether the edges file can be assumed to be correct, i.e. free of malformed content. If this parameter is passed as true on a malformed file, the constructor will crash.
edge_list_max_rows_number	The maximum number of lines to be loaded from the edges file.
edge_list_comment_symbol	The comment symbol to skip lines in the edges file. Lines starting with this symbol will be skipped.
edges_number	The expected number of edges. Note that this must be the EXACT number of edges in the graph.
load_edge_list_in_parallel	Whether to load the edge list in parallel. Note that, if the edge IDs indices are not given, it is NOT possible to load a sorted edge list. Similarly, when loading in parallel, without edge IDs, the edges may not be loaded in a deterministic order.
may_have_singletons	Whether the graph may be expected to have singleton nodes. If it is declared that it surely DOES NOT have any, this allows for some speedups and lower memory peaks.
may_have_singleton_with_selfloops	Whether the graph may be expected to have singleton nodes with selfloops. If it is declared that it surely DOES NOT have any, this allows for some speedups and lower memory peaks.
directed	Whether to load the graph as directed or undirected.
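
The return_weight and explore_weight slots are described in the table above as the inverses of p and q from the Node2Vec paper. A minimal Python sketch of that conversion is given here; the helper name node2vec_weights is hypothetical and is not part of NEAT or ensmallen.

```python
def node2vec_weights(p: float, q: float) -> dict:
    """Convert Node2Vec's p and q into the two walk-weight slots.

    Hypothetical helper for illustration only: it applies the
    relationship stated in the slot descriptions
    (return_weight = 1 / p, explore_weight = 1 / q).
    """
    if p <= 0 or q <= 0:
        raise ValueError("p and q must be positive")
    return {"return_weight": 1.0 / p, "explore_weight": 1.0 / q}


# A small p gives a large return_weight, so walks tend to revisit
# the node they just came from.
params = node2vec_weights(p=0.5, q=2.0)
print(params)
```

The resulting dictionary could be merged into a set of node embedding parameters alongside slots such as walk_length and window_size.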

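The callback slots monitor, patience, min_delta and mode together describe an early-stopping rule. The following is a rough pure-Python sketch of how those slots interact, not the implementation used by NEAT or Keras. The function name stopping_epoch is hypothetical, "auto" mode is left out, and treating a change exactly equal to min_delta as no improvement is an assumption.

```python
from typing import Optional, Sequence


def stopping_epoch(
    history: Sequence[float],
    patience: int = 1,
    min_delta: float = 0.0,
    mode: str = "min",
) -> Optional[int]:
    """Return the 0-based epoch at which training would stop, or None.

    history holds the monitored quantity, one value per epoch.
    Hypothetical helper: only "min" and "max" modes are handled here.
    """
    if mode not in ("min", "max"):
        raise ValueError('mode must be "min" or "max" in this sketch')
    # In "min" mode an improvement is a decrease; in "max" mode an increase.
    sign = 1.0 if mode == "min" else -1.0
    best = None
    wait = 0
    for epoch, value in enumerate(history):
        # Assumption: the change must exceed min_delta to count as improvement.
        if best is None or sign * (best - value) > min_delta:
            best = value
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Validation loss stops improving after epoch 1; with patience=2 the
# rule fires two non-improving epochs later.
print(stopping_epoch([1.0, 0.9, 0.95, 0.93, 0.8], patience=2))
```
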
Enums

Enums	Description
NodeEmbedMethodEnum	Enums containing possible values for node embedding methods.
EdgeMethodEnum	Enums containing possible values for node edge methods.
ActivationEnum	Enums containing possible values for activation functions.
OptimizerEnum	Optimizers that can be implemented in the neural network.
ClassifierCallbackModeEnum	Callback modes while fitting a classifier.