Modular Configuration System (MCS)
MAGIST uses a custom configuration system called MCS. MCS is a modular method for managing data flows and MAGIST subcomponents. For example, the "images" data flow is handled by "MAGIST Vision", which contains the "DetectionDataManager", "FullySupervisedModels", and "SemiSupervisedModels". Although not every subcomponent requires configuration, this categorization provides the modularity that makes MAGIST community-friendly.
If community members are interested in making custom modules using the base MAGIST API, they can readily do that through this configuration and a consistent module setup that will be discussed in later wiki pages.
Line-By-Line Sample Config Explanation
- This is a large container to manage all API related authentication and configuration. This can contain subsets by authentication method or API provider.
- This is primarily meant for system-level commands with elevated privileges. 3rd party modules will rarely use this subcomponent.
- This subcomponent manages all the tasks, threads, cores, and other system resources for faster and more distributed operation.
- These are global paths for the MAGIST core system. 3rd parties will NOT use this subcomponent and will instead define paths in their own module, as the "tf_lite_detector" module did.
- This component manages verbose and debug output. 3rd parties will likely use this as a global verbose setting for their subcomponents.
- This is our first actual subcomponent. This is what 3rd parties will be creating and implementing into the existing MAGIST system. This can contain other elements related to their module like paths, modes, parameters, etc.
- This is another example of a subcomponent of MAGIST that is implemented into the configuration.
- This comment was specifically added since this functionality is only available for MAGIST with MongoDB. Newer versions deprecated MongoDB and instead use ElasticSearch.
Note
Please note that this config.json is a sample and the exact parameters will vary based on the environment.
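As a rough illustration of how a third-party module might read its own section from this file, here is a minimal sketch. Apart from "tf_lite_detector", which this page names, every key and value in the sample is a hypothetical placeholder rather than the real contents of config.json, and the helper functions are not part of the MAGIST API.

```python
import json

# Hypothetical excerpt standing in for config.json. Only "tf_lite_detector"
# is named on this page; all other keys and all values are assumptions.
SAMPLE_CONFIG = """
{
  "api_authentication": {},
  "system_administration": {},
  "process_scheduler": {"max_threads": 4},
  "logging": {"verbose": true},
  "tf_lite_detector": {"model_path": "models/detect.tflite"}
}
"""


def get_module_config(config: dict, module: str) -> dict:
    """Return one module's section, or an empty dict if it has none."""
    return config.get(module, {})


def is_verbose(config: dict) -> bool:
    """Read a global verbose flag that third-party modules could reuse."""
    return bool(config.get("logging", {}).get("verbose", False))


config = json.loads(SAMPLE_CONFIG)
detector_cfg = get_module_config(config, "tf_lite_detector")
print(detector_cfg["model_path"], is_verbose(config))
```

Keeping module-specific paths inside the module's own section, as sketched here, mirrors the split described above between core paths and third-party paths.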
Other Important Configuration Files
The file explored in the previous section is the config.json file under the src/config folder in our repository. There are other files that pertain to this project that must also be explored:
Note
schema_nested.json is an experimental file that used ElasticSearch nested databases. This file should be ignored. This wiki will only cover schema.json and queries.json since they are relevant to the project.
Queries Configuration
Since deprecating MongoDB, MAGIST uses ElasticSearch for data storage, management, and querying. This is a drastic change from the first stable release (v0.1.0), which used a completely custom nested querying method. MAGIST made the transition because of the high-performance search functions the ElasticSearch API exposes. ElasticSearch queries are written entirely in JSON, which is why we created separate configuration files for them.
- Query to check if an object exists given a keyword. This does a narrower search than the full database search that the following query does.
- Sets the fields to search through to "name" which is just the object name.
- Runs a keyword search which is meant for names.
- Query to do a complete database search for objects by any datapoint: name, description, location, usage, etc.
- Sets the field to wildcard.
- Uses a custom analyzer to effectively search the database for information.
- Functions identically to object_exists, but for words.
- Functions identically to object_full, but for words.
These queries can be modified to suit user preferences.
Warning
MAGIST will simply use the JSON dictionary contained in the categories (object_exists, object_full, word_exists, word_full) EXACTLY as they are written here. This means that these dictionaries are fed DIRECTLY into ElasticSearch. Please use ElasticSearch configuration style, keywords, and arguments when modifying.
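To make the two query styles described above more concrete, here is a hedged sketch written as Python dictionaries in ElasticSearch query DSL. The category names come from this page; the query bodies, the choice of `multi_match`, the analyzer name `magist_analyzer`, and the `build_query` helper are all assumptions for illustration and may differ from the actual queries.json and from how MAGIST inserts the search text.

```python
import copy

# Hypothetical stand-in for two of the queries.json categories.
QUERIES = {
    # Narrow search: only the "name" field, matched as a single keyword.
    "object_exists": {
        "query": {
            "multi_match": {"query": "", "fields": ["name"], "analyzer": "keyword"}
        }
    },
    # Full search: every field ("*"), using a custom analyzer (name assumed).
    "object_full": {
        "query": {
            "multi_match": {
                "query": "",
                "fields": ["*"],
                "analyzer": "magist_analyzer",
            }
        }
    },
}


def build_query(category: str, text: str) -> dict:
    """Copy a stored query body and fill in the search text."""
    body = copy.deepcopy(QUERIES[category])  # leave the template untouched
    body["query"]["multi_match"]["query"] = text
    return body


exists_query = build_query("object_exists", "chair")
full_query = build_query("object_full", "wooden chair in the kitchen")
```

Because the stored dictionaries are passed through unchanged, any edit to them has to remain valid ElasticSearch query DSL on its own.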
Data Schema Configuration
The schema.json file is responsible for defining search parameters, ElasticSearch settings, and the database schema. The schema structures the data in a logical, easy-to-search format that can easily be expanded. Adding new modules or datatypes will require extensive modification of this schema.
- The container for the object database schema.
- These are the ElasticSearch database settings that define the analyzer and search settings.
- The analyzer is a set of algorithms that define the exact semantics on how a word/sentence/paragraph is to be interpreted, searched, and processed.
- This defines the custom analyzer that MAGIST uses.
- The analyzer uses the fingerprint algorithm, which will "lowercase, normalize to remove extended characters, sort, deduplicate and concatenate into a single token. If a stop-word list is configured, stop words will also be removed" [1].
- Defines the dictionary to use when deleting stop-words like "is", "the", "an", etc.
- This defines the similarity algorithm ElasticSearch will use when evaluating the similarity of the query to the data sample.
- This algorithm will "attempt to capture important patterns in the text, while leaving out noise" [2].
- The lambda parameter is one of the requirements for the algorithm to function. More about the LM Jelinek Mercer lambda parameter can be found in the ElasticSearch documentation.
- This defines the database schema (a.k.a. mapping) that the database will use to store each document (this refers to a single datapoint with several attributes).
- In this case, properties refers to the attributes of the object itself, not to ElasticSearch settings.
- Name of the object.
- Description of the object scraped from online sources.
- People who have been recognized to be using that object.
- Other objects that relate to this object.
- Where the object was found.
- Same database schema container as the object_db_schema, but for words.
- The word that is being stored.
- The definition of the word.
- People who were found to use that word.
- Other words that relate to this one.
- Objects that relate to the word.
- Where the word was used.
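The fingerprint step in the list above can be approximated in plain Python to show what the analyzer does to a piece of text. This is only a rough sketch of the behavior quoted from the ElasticSearch documentation, not ElasticSearch's actual implementation; the tokenization rule and the tiny stop-word set are simplifying assumptions.

```python
import re
import unicodedata

# Illustrative subset of stop words; the real list is configured in ElasticSearch.
STOP_WORDS = frozenset({"is", "the", "an", "a"})


def fingerprint(text: str, stop_words: frozenset = frozenset()) -> str:
    """Lowercase, fold accents, drop stop words, sort, deduplicate, join."""
    # Lowercase, then strip accents by decomposing and discarding non-ASCII marks.
    folded = unicodedata.normalize("NFKD", text.lower())
    folded = folded.encode("ascii", "ignore").decode("ascii")
    # Treat runs of letters and digits as tokens (a simplification).
    tokens = re.findall(r"[a-z0-9]+", folded)
    # A set removes duplicates; sorting makes the result order-independent.
    unique = sorted(set(tokens) - stop_words)
    return " ".join(unique)


print(fingerprint("The Café is an open café", STOP_WORDS))  # → cafe open
```

Two inputs that differ only in word order, casing, accents, or repeated words collapse to the same token, which is what makes the analyzer useful for matching loosely phrased descriptions.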
Warning
MAGIST will simply use the JSON dictionary contained in the categories (object_db_schema, word_db_schema) EXACTLY as they are written here. This means that these dictionaries are fed DIRECTLY into ElasticSearch. Please use ElasticSearch configuration style, keywords, and arguments when modifying.
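Since the schema dictionaries are passed to ElasticSearch as written, each one has to follow ElasticSearch's own index-body layout of settings plus mappings. The sketch below shows one plausible shape for the object schema as a Python dictionary. The names `magist_analyzer` and `magist_similarity`, the field names, and the lambda value are hypothetical; only the `fingerprint` analyzer type, the `_english_` stop-word list, and the `LMJelinekMercer` similarity type are standard ElasticSearch options.

```python
def text_field() -> dict:
    """A text field wired to the custom analyzer and similarity (names assumed)."""
    return {
        "type": "text",
        "analyzer": "magist_analyzer",
        "similarity": "magist_similarity",
    }


# Hypothetical stand-in for the object_db_schema category of schema.json.
SCHEMA = {
    "object_db_schema": {
        "settings": {
            "index": {
                "analysis": {
                    "analyzer": {
                        # Fingerprint analyzer with a built-in English stop-word list.
                        "magist_analyzer": {
                            "type": "fingerprint",
                            "stopwords": "_english_",
                        }
                    }
                },
                "similarity": {
                    # LM Jelinek Mercer needs a lambda value; 0.1 is illustrative.
                    "magist_similarity": {"type": "LMJelinekMercer", "lambda": 0.1}
                },
            }
        },
        "mappings": {
            "properties": {
                "name": text_field(),
                "description": text_field(),
                "users": text_field(),
                "related_objects": text_field(),
                "location": text_field(),
            }
        },
    }
}

object_fields = sorted(SCHEMA["object_db_schema"]["mappings"]["properties"])
```

A word schema would follow the same layout with its own attribute list, so adding a new datatype mostly means adding another container of this shape.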
[1] This quote is taken directly from the ElasticSearch documentation regarding the fingerprint algorithm.
[2] This quote is taken directly from the ElasticSearch documentation regarding the LM Jelinek Mercer algorithm.