Optimizer parallelism, also known as the zero redundancy optimizer [37], partitions optimizer states, gradients, and parameters across devices to reduce memory consumption while keeping communication costs as low as possible. Section V highlights the configuration and parameters that play an important par