How to determine enough sample size? Is it possible to find similarity score between two distributions?

9.83K views · January 2, 2020 · Journal Selection

Anonymous32 · January 8, 2019

How can one determine whether a sample size is large enough? Is it possible to compute a similarity score between two distributions? If so, please elaborate.

Distribution similarity example: Suppose we want to measure the similarity between two documents using their word frequencies. One way to do this is to build a histogram of word frequencies for each document. We then want to find out which distributions (histograms) are similar to which. What would be the best solution or tool?
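The setup described in the question can be sketched as follows: build a word-frequency histogram for each document, then score every pair. Cosine similarity is used here only as one common way to compare term-frequency vectors; it is an assumption for illustration, not a method the question or the answer prescribes. The tokenizer and the function names (`word_histogram`, `cosine_similarity`, `most_similar_pair`) are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations


def word_histogram(text: str) -> Counter:
    """Count how often each word occurs (a simple term-frequency histogram)."""
    # Crude tokenizer for illustration: lowercase, keep runs of letters/apostrophes.
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(h1: Counter, h2: Counter) -> float:
    """Cosine of the angle between two word-count vectors, in [0, 1].

    Scaling a histogram leaves the score unchanged, so documents of
    different lengths can be compared without normalizing first.
    """
    dot = sum(h1[w] * h2[w] for w in h1.keys() & h2.keys())
    norm1 = math.sqrt(sum(c * c for c in h1.values()))
    norm2 = math.sqrt(sum(c * c for c in h2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # an empty document is treated as dissimilar to everything
    return dot / (norm1 * norm2)


def most_similar_pair(docs: dict) -> tuple:
    """Return (name_a, name_b, score) for the most similar pair of documents."""
    hists = {name: word_histogram(text) for name, text in docs.items()}
    a, b = max(combinations(hists, 2),
               key=lambda pair: cosine_similarity(hists[pair[0]], hists[pair[1]]))
    return a, b, cosine_similarity(hists[a], hists[b])


docs = {
    "a": "the cat sat on the mat",
    "b": "the cat lay on the mat",
    "c": "stock prices fell sharply today",
}
print(most_similar_pair(docs))  # pair ('a', 'b'), score about 0.875
```

A score of 1 means the two histograms have the same shape and 0 means the documents share no words. Other histogram comparisons exist, and the right one depends on what "similar" should mean for the documents at hand.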

1 Answer

score 0 · Answer 1 · 2019-01-08T07:05:25+00:00

Researchers use power analysis to determine sample size. Power analysis gives the sample size required to detect an effect of a given size with a given probability (the power) at a chosen significance level. Conversely, when the sample size is constrained, it gives the probability of detecting an effect of a given size at that significance level. If this probability is too low, researchers may redesign or abandon the experiment. A power analysis can be carried out in the following five steps:

Choose the hypothesis test (for example, a two-sample t-test comparing two group means)
Set the significance level (α) of the test
Determine the smallest effect size that is of scientific interest
Estimate the values of any other required parameters, such as the mean and standard deviation (SD). This often requires a pilot study from which the mean and SD are calculated.
Choose the intended power of the test (0.80 is a common choice)
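The five steps can be sketched for one concrete case, assuming a two-sided comparison of two group means with equal group sizes. The sketch uses the normal approximation n = 2(z₁₋α/₂ + z_power)² / d², where d is the smallest difference in means of interest divided by the SD. This approximation slightly underestimates what an exact t-test calculation gives, and the pilot-study numbers and function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, SD 1


def required_n_per_group(effect_size: float, alpha: float = 0.05,
                         power: float = 0.80) -> int:
    """Approximate sample size per group for a two-sided, two-sample test
    of means (normal approximation to the t-test)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_power = STD_NORMAL.inv_cdf(power)          # quantile for desired power
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


def power_given_n(effect_size: float, n_per_group: int,
                  alpha: float = 0.05) -> float:
    """Approximate power when the sample size per group is fixed.
    Ignores the negligible chance of rejecting in the wrong tail."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(abs(effect_size) * sqrt(n_per_group / 2) - z_alpha)


# Hypothetical pilot study: smallest interesting difference is 5 units, SD is 10.
d = 5 / 10
n = required_n_per_group(d, alpha=0.05, power=0.80)
print(n)  # 63
print(power_given_n(d, n))  # at least 0.80 by construction
```

The second function covers the converse case described above: with the sample size fixed in advance, it reports the power, which can then inform whether to alter or abandon the experiment.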

Box plots are useful for judging both the similarities and the differences between two distributions, and they are frequently used specifically to compare distributions. However, one has to know how to read two box plots side by side. Each box represents the interquartile range (IQR), that is, the middle half of the values in a group. If the two boxes do not overlap, this strongly suggests a difference between the two groups, though it is not a formal statistical test. Next, compare the median lines, the whiskers (if any) and any outliers lying beyond the whiskers. From these features the similarities and differences between the distributions can be assessed, bearing in mind that a box plot gives a visual comparison rather than a single numerical similarity score.
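The reading procedure described for box plots can be sketched numerically: compute each sample's quartiles, whiskers and outliers, then check whether the two interquartile boxes overlap. The whisker rule (1.5 × IQR beyond the quartiles) is a common convention assumed here; plotting tools differ, and the quartile method of Python's `statistics.quantiles` may not match a given tool exactly. The function names are hypothetical.

```python
from statistics import quantiles


def box_summary(values):
    """Quartiles, whisker ends and outliers of one sample, as drawn in a box plot.
    Whiskers follow the 1.5 * IQR convention (an assumption)."""
    q1, median, q3 = quantiles(values, n=4)  # needs at least two values
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


def boxes_overlap(a, b):
    """True if the interquartile boxes of two samples overlap."""
    sa, sb = box_summary(a), box_summary(b)
    return sa["q1"] <= sb["q3"] and sb["q1"] <= sa["q3"]


group_a = [1, 2, 3, 4, 5, 6, 7, 8]
group_b = [20, 21, 22, 23, 24, 25, 26, 27]
print(boxes_overlap(group_a, group_b))  # False: the boxes are separate
print(box_summary(group_a + [100]))     # 100 is reported as an outlier
```

As in the text, non-overlapping boxes are a heuristic signal of a difference, not a substitute for a hypothesis test.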

