Hey Chris,
In Sparkflows, the ‘Sample’ processor extracts a sample from an incoming dataset. The number of rows in the sample is a fraction of the rows in the incoming dataset. A sample can be used for ML training or analysis.
To use the ‘Sample’ processor:
Select ‘True’ or ‘False’ for ‘With Replacement’. With ‘True’, a selected record is returned to the population and can be picked again, so the sample may contain duplicates. With ‘False’, a record is removed from the population once it is selected, so it appears at most once.
Enter a value for ‘Fraction’. It should be less than 1 (for example, 0.2 for about 20% of the rows) and determines the size of the sample.
Set a ‘Seed’ value. Reusing the same seed on the same input reproduces the same sample.
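If it helps to see what the three settings mean, here is a minimal plain-Python sketch. It is not how Sparkflows implements the processor; `sample_rows` is a hypothetical helper written only to illustrate ‘With Replacement’, ‘Fraction’, and ‘Seed’, and it assumes an exact sample size, which the real processor may only approximate.

```python
import random

def sample_rows(rows, fraction, with_replacement=False, seed=None):
    """Hypothetical helper mirroring the three Sample settings."""
    rng = random.Random(seed)          # same seed + same input -> same sample
    k = round(len(rows) * fraction)    # sample size as a share of the input
    if with_replacement:
        # A picked row goes back into the pool, so it can appear more than once.
        return rng.choices(rows, k=k)
    # A picked row leaves the pool, so each row appears at most once.
    return rng.sample(rows, k)

rows = list(range(100))
first = sample_rows(rows, fraction=0.2, seed=42)
second = sample_rows(rows, fraction=0.2, seed=42)
with_dupes = sample_rows(rows, fraction=0.2, with_replacement=True, seed=42)

print(len(first), first == second)
```

Running it twice with the same seed gives the same 20 rows, which is the point of setting a ‘Seed’ when you need to repeat an experiment.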
For more information, read the Sparkflows Documentation here:
https://docs.sparkflows.io/en/latest/processors/03-Prepare/13-Others/sample.html?highlight=sample#sample