Given a huge set of words with duplicates present and a positive integer k, find the first k–maximum occurring words in it.

For example,

Input:
 
keys = [code, coder, coding, codable, codec, codecs, coded, codeless, codec, codecs, codependence, codex, codify, codependents, codes, code, coder, codesign, codec, codeveloper, codrive, codec, codecs, codiscovered]
 
k = 4
 
Output:
 
codec occurs 4 times
codecs occurs 3 times
code occurs 2 times
coder occurs 2 times

The idea is to use Trie (Prefix Tree) to solve this problem. Start by inserting each key into the Trie and store its count so far (along with the key itself) in the leaf nodes. After all keys are inserted into the trie, perform its preorder traversal (Depth–first search) and insert all key’s count into a max-heap. Then the problem is reduced to removing the first k elements from the max-heap.

The algorithm can be implemented as follows in C++, Java, and Python:

C++


Download  Run Code

Java


Download  Run Code

Python


Download  Run Code

Output:
 
codec occurs 4 times
codecs occurs 3 times
code occurs 2 times
coder occurs 2 times

The time complexity of the above solution is O(N.M), where N is the total number of given words and M is the maximum word length. The auxiliary space required by the program is O(N × M).

 
Related Post:

Find the maximum occurring word in a given set of strings