Hey Chris,
In Sparkflows, we can use the ‘Drop Null Rows for Selected Columns’ processor to drop rows that have null values in the selected columns.
To use the ‘Drop Null Rows for Selected Columns’ processor:
Select one or more columns in the ‘Columns To Check’ list. The data in these columns is checked for null values. If any of the selected columns is null in a row, that row is dropped from the outgoing DataFrame.
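To make the behaviour concrete, here is a minimal plain-Python sketch of the rule described above. It is only an illustration, not how Sparkflows implements the processor: the function name `drop_null_rows`, the sample rows, and the choice to treat only `None` as null are all assumptions of mine. In plain PySpark, a roughly comparable call would be `df.na.drop(how="any", subset=[...])`.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def drop_null_rows(rows: Iterable[Row], columns_to_check: List[str]) -> List[Row]:
    """Keep only the rows where every checked column is non-null.

    Hypothetical helper: a row is dropped as soon as ANY of the
    checked columns is None (or missing from the row).
    """
    return [
        row
        for row in rows
        if all(row.get(col) is not None for col in columns_to_check)
    ]


# Made-up sample data, for illustration only.
rows = [
    {"id": 1, "name": "Ann", "city": "Pune"},
    {"id": 2, "name": None, "city": "Oslo"},
    {"id": 3, "name": "Raj", "city": None},
]

# Only 'name' is checked, so the row with a null city is kept.
result = drop_null_rows(rows, ["name"])
```

Columns that are not in the list are ignored, so a null in an unchecked column does not cause the row to be dropped. Checking both `name` and `city` in this sample would leave only the first row.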
For more information, see the Sparkflows documentation:
https://docs.sparkflows.io/en/latest/tutorials/data-engineering/drop-rows-with-null.html?highlight=Drop%20Null%20Rows%20for%20Selected%20Columns