
OpenAI Introduces MRC (Multipath Reliable Connection): A New Open Networking Protocol for Large-Scale AI Supercomputer Training Clusters

MarkTechPost | 1 min read

MRC (Multipath Reliable Connection) is a new open networking protocol developed by OpenAI in partnership with AMD, Broadcom, Intel, Microsoft, and NVIDIA. It improves GPU networking performance and resilience in large-scale AI training clusters in three ways: it spreads packets across hundreds of paths simultaneously, it recovers from network failures in microseconds, and it enables supercomputers with over 100,000 GPUs to be built using only two tiers of Ethernet switches.
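
To give a feel for the scale behind the two-tier claim, the sketch below works through the standard capacity arithmetic for a non-blocking two-tier (leaf-spine) fabric built from identical switches. It is only an illustration under stated assumptions: the function names and the switch port counts are hypothetical and do not come from the article or from the MRC specification, and the model ignores oversubscription, multiple network planes, and multi-port NICs.

```python
def two_tier_max_hosts(radix: int) -> int:
    """Upper bound on hosts in a non-blocking two-tier leaf-spine fabric.

    Assumption (not from the article): every switch has `radix` ports,
    and each leaf splits them evenly between hosts and spines.
    """
    if radix < 2 or radix % 2:
        raise ValueError("radix must be a positive even number")
    hosts_per_leaf = radix // 2  # leaf ports facing hosts
    max_leaves = radix           # each spine port connects one leaf
    return hosts_per_leaf * max_leaves


def paths_between_leaves(radix: int) -> int:
    """Distinct leaf-spine-leaf paths between two leaves: one per spine."""
    return radix // 2


# Hypothetical port counts, chosen only to show how capacity scales.
for radix in (64, 128, 256, 512):
    print(radix, two_tier_max_hosts(radix), paths_between_leaves(radix))
```

Under these assumptions, capacity grows with the square of the port count, so only very high-radix switches reach six-figure endpoint counts in two tiers, and the number of equal-length paths between two leaves grows into the hundreds at the same time. How MRC itself arrives at its figures is described in the original article, not here.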


This is a summary. For the full story, read the original article at MarkTechPost.

Original source: MarkTechPost
