Hello Ragita,
In Sparkflows, you can use the ‘Drop Duplicate Rows’ processor to drop rows that have the same values in the selected columns.
To use the ‘Drop Duplicate Rows’ processor:
- Select one or more columns in the ‘Columns’ list. The values in these columns are compared to identify duplicates. After deduplication, only one row from each group of duplicates is passed to the outgoing DataFrame.
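
To make the behaviour concrete, here is a small plain-Python sketch of the idea. It is not Sparkflows code: the function name `drop_duplicate_rows` and the sample data are made up for illustration, and keeping the first row of each group is an assumption, since the description above only says that one row is kept.

```python
def drop_duplicate_rows(rows, columns):
    """Keep one row per distinct combination of values in `columns`.

    Illustrative only: keeps the first row seen for each combination,
    which is an assumption and not documented Sparkflows behaviour.
    """
    seen = set()
    result = []
    for row in rows:
        # Only the selected columns decide whether two rows are duplicates.
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


rows = [
    {"id": 1, "name": "Ann", "city": "Pune"},
    {"id": 2, "name": "Ann", "city": "Pune"},
    {"id": 3, "name": "Raj", "city": "Delhi"},
]

# Selecting 'name' and 'city' treats the first two rows as duplicates.
deduped = drop_duplicate_rows(rows, ["name", "city"])
print(deduped)
```

With `name` and `city` selected, the rows with `id` 1 and 2 count as duplicates even though their `id` values differ, so only one of them reaches the output. Selecting `id` instead would keep all three rows.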
For more information, see the Sparkflows documentation:
https://docs.sparkflows.io/en/latest/user-guide/data-preparation/data-cleaning.html?highlight=drop%20duplicate%20rows#drop-duplicate-rows