The external merge sort algorithm is used to efficiently sort massive amounts of data when the data being sorted cannot be fit into the main memory (usually RAM) and resides in the slower external memory (usually a HDD).

External merge sort uses a hybrid sort-merge technique. The chunks of data small enough to fit in the RAM are read, sorted, and written out to a temporary file during the sorting phase. In the merge phase, the sorted sub-files are combined into a single larger file.

 
In other words, external merge sort sorts the chunks of data that fits in the main memory, then merges the sorted chunks, i.e.,

  1. First, divide the file into runs such that the size of a run is small enough to fit into the main memory.
  2. Next, sort each run in main memory using the standard merge sort sorting algorithm.
  3. Finally, merge the resulting runs into successively bigger runs until the file is sorted.

Following is the C++ code to demonstrate the external merge sort algorithm:

 
Please note that this code doesn’t work on online compilers as it requires file creation permissions. When run locally, it will produce a sample input file “input.txt” with 10000 random numbers. It sorts the numbers and puts the sorted numbers in a file “output.txt.” It also generates files with names 1, 2, … to store sorted runs.

 
Author: Aditya Goel

 
References: https://en.wikipedia.org/wiki/External_sorting