Hello Ragita, please find the answer below.
Sparkflows provides a wide range of ML algorithms for training a model. A typical training workflow has these steps:
* Read your dataset using the read node that matches its format (CSV, TXT, TSV, or others).
* Perform any necessary data cleaning on the dataset. If you want to use a categorical string column as a feature in your model, convert it with a String Indexer node first.
* If you are using Spark MLlib algorithms, use a Vector Assembler Node to assemble features. This step is not required for H2O algorithm nodes.
* Perform a train-test split on the dataset/assembled features.
* Connect the training split to the ML algorithm node, and connect the test split to the Predict/Score node.
* Ensure the Algorithm node is connected to the Predict/Score node to pass the fitted model for transforming the test dataset.
* Add a save node to persist your model: use the Spark ML Model save node for Spark ML algorithms and the H2O ML Model save node for H2O algorithm nodes. You can then load the saved model later and apply it to new, unseen datasets.
* Consider connecting an evaluator node to the output of the Predict/Score node to compute metrics on the test dataset.
* Execute the workflow to train your model.
* To view the model metrics, go to the Models tab, click on the model UUID, and access results such as Train metrics, Test Metrics, Feature importance, and more.
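
If it helps to see what the nodes in these steps do conceptually, here is a minimal pure-Python sketch of the same flow: index a string column, assemble features, split, fit, predict, and evaluate. It is not Sparkflows or Spark code. All function names, the column names (`plan`, `usage`, `churned`), and the sample rows are made up for illustration, and the nearest-centroid classifier is only a stand-in for whichever algorithm node you pick. The frequency-first ordering in `string_index` mirrors the default behaviour of Spark's StringIndexer; the alphabetical tie-break is an assumption of this sketch.

```python
import random
from collections import Counter


def string_index(values):
    """String Indexer step: most frequent category gets index 0.
    Ties are broken alphabetically (an assumption of this sketch)."""
    counts = Counter(values)
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    mapping = {v: i for i, v in enumerate(ordered)}
    return [mapping[v] for v in values], mapping


def assemble(rows, feature_cols):
    """Vector Assembler step: combine the chosen columns into one vector per row."""
    return [[float(row[c]) for c in feature_cols] for row in rows]


def train_test_split(items, train_fraction=0.8, seed=42):
    """Split step: shuffle with a fixed seed, then cut at train_fraction."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def fit_nearest_centroid(train):
    """Algorithm step (stand-in model): average the feature vectors per label."""
    sums, counts = {}, Counter()
    for features, label in train:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] += 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(model, features):
    """Predict/Score step: return the label whose centroid is closest."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: sq_dist(model[label]))


def accuracy(model, test):
    """Evaluator step: fraction of test rows predicted correctly."""
    correct = sum(predict(model, f) == y for f, y in test)
    return correct / len(test)


if __name__ == "__main__":
    # Hypothetical toy dataset; real data would come from a read node.
    rows = [
        {"plan": "basic", "usage": 1.0, "churned": 1},
        {"plan": "basic", "usage": 2.0, "churned": 1},
        {"plan": "basic", "usage": 1.5, "churned": 1},
        {"plan": "basic", "usage": 2.5, "churned": 1},
        {"plan": "pro", "usage": 9.0, "churned": 0},
        {"plan": "pro", "usage": 8.0, "churned": 0},
        {"plan": "pro", "usage": 9.5, "churned": 0},
        {"plan": "team", "usage": 8.5, "churned": 0},
    ]
    plan_idx, _ = string_index([r["plan"] for r in rows])
    for row, idx in zip(rows, plan_idx):
        row["plan_idx"] = idx
    features = assemble(rows, ["plan_idx", "usage"])
    labelled = list(zip(features, [r["churned"] for r in rows]))
    train, test = train_test_split(labelled, train_fraction=0.75)
    model = fit_nearest_centroid(train)
    print("test accuracy:", accuracy(model, test))
```

In the workflow itself you do none of this by hand: each function corresponds to a node you drag in and connect, and the fitted model travels along the edge from the algorithm node to the Predict/Score node.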