PubMed IDs and scores of the relevant training examples.
Started at
2007/11/21 13:15:34 GMT
Time at which the query was started.
Finished at
2007/11/21 13:17:46 GMT
Time at which this file was written.
Base score
-3.42387141436
The log likelihood ratio of an empty article (one in
which every feature failed to occur).
Prior score
-7.31995351833
The log of the prior probability ratio for
an article being relevant versus irrelevant (added to the log likelihood ratio
to obtain the final score). It equals the logit of the estimated
prevalence of relevant articles in Medline (which may be estimated
from the input size or specified separately).
Limit
5000
The maximum number of results to include.
Threshold
0
The default Naive Bayes classification threshold is
zero. The threshold is the minimum log probability ratio at which
an article is predicted to be relevant.
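The relationship between the prior score, the final score and the threshold can be sketched in a few lines of Python. This is only an illustration under stated assumptions: prevalence is assumed to be estimated as positives / (positives + negatives) using the document counts reported under Feature Statistics, and the threshold is assumed to be compared against the final score (log likelihood ratio plus prior). The names logit, final_score and is_relevant are hypothetical and do not come from the program that wrote this report.

```python
import math

def logit(p):
    """Log odds of a probability p in the open interval (0, 1)."""
    return math.log(p / (1.0 - p))

# Document counts as reported under Feature Statistics in this file.
positives = 10727
negatives = 16199205

# Assumption: prevalence is estimated from the input size as
# positives / (positives + negatives).
prevalence = positives / (positives + negatives)
prior_score = logit(prevalence)

def final_score(log_likelihood_ratio, prior=prior_score):
    """Final score: log likelihood ratio plus the log prior ratio."""
    return log_likelihood_ratio + prior

def is_relevant(log_likelihood_ratio, threshold=0.0, prior=prior_score):
    """Assumption: an article is predicted relevant when its final
    score is at least the threshold."""
    return final_score(log_likelihood_ratio, prior) >= threshold

print(round(prior_score, 5))  # -7.31995, agreeing with the Prior score above
```

Under these assumptions an empty article, whose log likelihood ratio is the base score of -3.42387141436, receives a final score of roughly -10.74 and falls below the default threshold of zero.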
Feature Statistics
Quantity                       Positives     Negatives
Number of documents                10727      16199205
Number of distinct features         5007         41313
Total feature occurrences         157153     219138628
Terms per document                14.650        13.528
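As a quick consistency check, the "Terms per document" figures can be reproduced from the other rows, assuming they are simply the total feature occurrences divided by the number of documents. The function name terms_per_document is hypothetical and used here only for illustration.

```python
def terms_per_document(total_occurrences, num_documents):
    """Average number of feature occurrences per document (assumed definition)."""
    return total_occurrences / num_documents

# Counts as reported under Feature Statistics in this file.
positives_avg = terms_per_document(157153, 10727)
negatives_avg = terms_per_document(219138628, 16199205)

print(f"{positives_avg:.3f} {negatives_avg:.3f}")  # 14.650 13.528
```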
Features with high TF.IDF
Features with TF.IDF above 0.2 or 0.3 could make good keywords. TF.IDF is term frequency times
inverse document frequency, where we treat the set of input citations as a
single document