Forum

Ignite Discussions : Ask Questions, Find Answers, Share Expertise about Sparkflows

Jun 14, 2023

How can I obtain the probability of a prediction in a classification problem?

Suppose I have a retail dataset and want to predict the likelihood of customer churn. To do this, I used a random forest model with Spark prediction. However, the output stores the probability column as a VectorUDT. How can I efficiently split the probability column into separate columns?

Comments (1)

Lakshay

Jun 14, 2023

Hey Chris, for this you can use the "Split Probability" node. Place it after the "Spark Predict" node; it takes the DataFrame containing the probability column as input and splits the VectorUDT into two separate columns. The first column, labelled "prob0", holds the probability of a customer not churning (class 0), while the second column, labelled "prob1", holds the probability of a customer churning (class 1). You can then use these columns directly for further analysis or decision-making.
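If it helps to see what such a split looks like outside the workflow editor, here is a rough PySpark sketch. It is not the implementation of the "Split Probability" node, only an approximation of the behaviour described above. It assumes Spark 3.0 or later (for `vector_to_array`) and a prediction DataFrame whose vector column is named `probability`, which is the Spark ML default; the helper names `prob_column_names` and `split_probability` are made up for illustration.

```python
def prob_column_names(num_classes, prefix="prob"):
    """Return output column names, e.g. ['prob0', 'prob1'] for binary churn."""
    return [f"{prefix}{i}" for i in range(num_classes)]


def split_probability(df, num_classes=2, input_col="probability"):
    """Add one double column per class from a VectorUDT probability column.

    Hypothetical helper: assumes `df` is the output of a Spark ML
    classifier and that `input_col` holds the class probabilities.
    """
    # Imported lazily so the naming helper above works without Spark installed.
    from pyspark.ml.functions import vector_to_array  # available in Spark 3.0+
    from pyspark.sql import functions as F

    # Convert the VectorUDT into a plain array<double> column expression.
    as_array = vector_to_array(F.col(input_col))

    # Element i of the array is the probability of class i.
    for i, name in enumerate(prob_column_names(num_classes)):
        df = df.withColumn(name, as_array[i])
    return df
```

With a scored DataFrame called, say, `predictions`, something like `split_probability(predictions).select("prediction", "prob0", "prob1")` should give the two probabilities as ordinary numeric columns.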

© 2023 Sparkflows, Inc. All rights reserved.