[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

*Subject*: Re: frequency of differences of sorted random numbers*From*: Craig Markwardt <craigm(at)cow.physics.wisc.edu>*Date*: 29 Oct 1996 12:19:37 -0600*Distribution*: inet*Newsgroups*: sci.math.num-analysis*Organization*: U. Wisc. Madison Physics -- Compact Objects*References*: <553bu7$a69@zappa.northnet.org> <32762CAB.6068@ford.com>

"Robert N. Lambeck" <rlambeck@ford.com> writes: > S.A.Weil wrote: > : > : I encountered the following empirical result: > : > : First I chose random integers from a uniform distribution > : (trusting Turbo Pascal) such that the sample size (say 100000) is > : much less than the range (say 0 to 2000000). Then I sorted those > : integers and found the frequencies, f[d], of the differences, d, > : between successive numbers. (I have a good reason for doing this > : but it doesn't relate to the immediate problem.) A very good > : fit was made with a linear semi-log plot. That is > : > : ln ( f[d] ) = a -- b * d > : > : Could I have anticipated this result from the properties of the > : uniform distribution ? If so, any ideas how? > : > : Sandy > > Unfortunately, I don't have a reference at hand; however, I believe > your example is analogous to the case of predicting the waiting > interval between successive phone calls arriving at a switchboard. > Individual phone calls are randomly distributed (with uniform > distribution) over any time interval. The waiting period between > successive phone calls follows the exponential distribution. > A process which produces "events" with a constant probability per unit time (or space), such as the switchboard and uniform distribution examples given above, are essentially Poisson processes. Consider a Geiger counter, which has an expected rate of r over an observation time of t. Then the probability distribution for the number of events actually observed, n, is the Poisson distribution: P_n = (m^n)/n! exp(-m) where m = r t is the expected number of events over time t (different than the observed number of events!). In the case the original poster was asking about: the distribution of *waiting* times between events is given by P_0. That is, the probability that *no* event occurs in time t. P_0 = exp(-m) = exp(-r t), or, normalized in "t" space, P_0 dt = r exp(-r t) dt Which is normalized to 1 if you integrate over dt. Multiply by N, the total number of samples taken, if you want to compare with your frequency histogram. If you take the (natural) log of the distribution, then you obtain, ln (P_0 dt) = ln(r N dt) - r * t So your linear regression is definitely expected based on first priniciples. The value of r can be estimated as the total number of samples divided by the total range of the samples, or r=N/T. The value of dt is your frequency histogram binsize. Thus, I expect that your values of the fitted parameters, "a" and "b" as described above, should be "a" = ln(N^2 dt / T) ; N is total number of samples, dt is binsize, and T is the total range of samples. "b" = N/T This derivation has certainly worked for me in the past! Good luck, Craig -- --------------------------------------------------------------------- Craig Markwardt UW-Madison 608 262 7555 | "To cogitate and internet: craigm@astrog.physics.wisc.edu | to solve ..." -MathNet ---------------------------------------------------------------------

- Prev by Date:
**ANNOUNCE: perlDL v1.00 - the 'perl Data Language'** - Next by Date:
**Re: gcc and idl** - Prev by thread:
**ANNOUNCE: perlDL v1.00 - the 'perl Data Language'** - Next by thread:
**Re: gcc and idl** - Index(es):