Sort words by frequency. Includes link for excluding common words in the English language.
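A minimal sketch of that idea in Python, assuming a plain-text input. The `STOP_WORDS` set and the `words_by_frequency` helper are hypothetical names made up for illustration, and the stop-word list is deliberately tiny; a real list of common English words would be much longer.

```python
from collections import Counter
import re

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "and", "of", "in", "is", "to"}

def words_by_frequency(text, stop_words=STOP_WORDS):
    """Return (word, count) pairs, most frequent first, skipping stop words."""
    # Lowercase the text and pull out runs of letters/apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    return counts.most_common()

if __name__ == "__main__":
    for word, count in words_by_frequency("The cat and the dog. The cat sat."):
        print(word, count)
```

The regular expression is a crude tokenizer; anything more serious (stemming, punctuation rules, other languages) would need a proper one.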
I would say TF-IDF
TF stands for Term Frequency. Given an English term t and a document d, TF(t,d) is the number of times t appears in d.
DF stands for Document Frequency. Given an English term t and a list of documents D (d1~dn in D), DF(t,D) is the number of documents di in which t appears at least once.
IDF is Inverse Document Frequency. Basically, the bigger the DF value, the smaller the IDF value: a term that shows up in almost every document tells you little about any one of them.
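Here is a rough sketch of those three quantities in Python. It assumes each document is already a list of tokens, and it uses log(N / DF) for IDF, which is one common choice and not the only one (smoothed variants exist). Returning 0 for a term that appears in no document is also just an assumption of this sketch. The function names are made up for illustration.

```python
import math

def tf(term, doc):
    """Term frequency: how many times term occurs in doc (a list of tokens)."""
    return doc.count(term)

def df(term, docs):
    """Document frequency: how many docs contain term at least once."""
    return sum(1 for doc in docs if term in doc)

def idf(term, docs):
    """Inverse document frequency as log(N / DF), one common variant."""
    n = df(term, docs)
    if n == 0:
        return 0.0  # assumption: unseen terms get weight 0 (avoids division by zero)
    return math.log(len(docs) / n)

def tf_idf(term, doc, docs):
    """TF-IDF weight of term in doc, relative to the collection docs."""
    return tf(term, doc) * idf(term, docs)

if __name__ == "__main__":
    docs = [["a", "b", "a"], ["b", "c"], ["b"]]
    print(tf_idf("a", docs[0], docs))  # "a" is rare across docs, so it scores high
    print(tf_idf("b", docs[0], docs))  # "b" is in every doc, so its IDF is 0
```

With this variant, a term that appears in every document gets IDF = log(1) = 0, so it contributes nothing to the score no matter how often it occurs.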
But sometimes raw frequency is not what we want. A document with 10 appearances of a term is more relevant than another document with only one, but not 10 times as relevant. In this case, we use log-frequency weighting, which makes the weight grow with the logarithm of the count instead of the count itself.
The more often a term appears in a document, the more important it is to that document.
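To make the log-frequency idea concrete, here is a small sketch. It assumes the commonly used form w = 1 + log10(tf) for tf > 0 and w = 0 otherwise; the base of the logarithm and the "+1" are conventions, and the function name is hypothetical.

```python
import math

def log_tf_weight(raw_tf):
    """Log-frequency weight: 1 + log10(tf) if tf > 0, else 0 (assumed form)."""
    if raw_tf <= 0:
        return 0.0
    return 1.0 + math.log10(raw_tf)

if __name__ == "__main__":
    for count in (0, 1, 10, 100):
        print(count, log_tf_weight(count))
```

Under this form, 1 appearance gives weight 1 and 10 appearances give weight 2, so the weight still rises with frequency, but a tenfold increase in the count only adds 1 to the weight.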