Suppose I have a retail dataset and want to predict the likelihood of customer churn. I trained a random forest model and ran Spark prediction on the data, but the output stores the probability column as a VectorUDT. How can I efficiently split the probability column into separate columns?
Hey Chris, you can use the "Split Probability" node for this. Place it after the "Spark Predict" node: it takes the DataFrame containing the probability vector column as input and splits the VectorUDT into two separate columns. The first column, labelled "prob0", holds the probability that a customer does not churn (class 0), and the second, labelled "prob1", holds the probability that the customer churns (class 1). You can then use these columns directly for further analysis or decision-making.
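If you ever need the same split in plain PySpark code rather than through the node, something like the following should work. This is only a minimal sketch and not necessarily how the node is implemented. It assumes Spark 3.0 or later (for `vector_to_array`) and that the prediction output uses Spark ML's default column name `probability`. The helper names `prob_column_names` and `split_probability` are made up for illustration.

```python
from typing import List


def prob_column_names(num_classes: int, prefix: str = "prob") -> List[str]:
    """Build the output column names, e.g. ["prob0", "prob1"] for two classes."""
    return [f"{prefix}{i}" for i in range(num_classes)]


def split_probability(df, prob_col: str = "probability",
                      num_classes: int = 2, prefix: str = "prob"):
    """Return df with one extra double column per class probability.

    Assumes prob_col is a VectorUDT column such as the one produced by a
    Spark ML classifier's transform().
    """
    # Imported lazily so prob_column_names works without a Spark install.
    from pyspark.ml.functions import vector_to_array  # available in Spark 3.0+

    # Convert the ML vector into a plain array<double> column.
    prob_array = vector_to_array(df[prob_col])

    # Pull each array element out into its own column.
    for i, name in enumerate(prob_column_names(num_classes, prefix)):
        df = df.withColumn(name, prob_array[i])
    return df
```

For a binary churn model you would call it on the prediction output, for example `split_probability(predictions).select("prob0", "prob1")`, where `predictions` stands for whatever DataFrame your model's `transform()` returned.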