[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: frequency of differences of sorted random numbers





"Robert N. Lambeck" <rlambeck@ford.com> writes:
> S.A.Weil wrote:
> : 
> : I encountered the following empirical result:
> : 
> : First I chose random integers from a uniform distribution
> : (trusting Turbo Pascal) such that the sample size (say 100000) is
> : much less than the range (say 0 to 2000000).  Then I sorted those
> : integers and found the frequencies, f[d], of the differences, d,
> : between successive numbers. (I have a good reason for doing this
> : but it doesn't relate to the immediate problem.)   A very good
> : fit was made with a linear semi-log plot.  That is
> : 
> :      ln ( f[d] ) = a -- b * d
> : 
> : Could I have anticipated this result from the properties of the
> : uniform distribution ? If so, any ideas how?
> : 
> : Sandy
> 
> Unfortunately, I don't have a reference at hand; however, I believe
> your example is analogous to the case of predicting the waiting
> interval between successive phone calls arriving at a switchboard.
> Individual phone calls are randomly distributed (with uniform
> distribution) over any time interval.  The waiting period between
> successive phone calls follows the exponential distribution.
> 

A process which produces "events" with a constant probability per unit
time (or space), such as the switchboard and uniform distribution
examples given above, are essentially Poisson processes.

Consider a Geiger counter, which has an expected rate of r over an
observation time of t.  Then the probability distribution for the
number of events actually observed, n, is the Poisson distribution:

P_n = (m^n)/n! exp(-m)    where m = r t is the expected number of events
                                over time t (different than the observed
                                number of events!).

In the case the original poster was asking about: the distribution of
*waiting* times between events is given by P_0.  That is, the
probability that *no* event occurs in time t.

P_0 = exp(-m) = exp(-r t), or, normalized in "t" space,

P_0 dt = r exp(-r t) dt

Which is normalized to 1 if you integrate over dt.  Multiply by N, the
total number of samples taken, if you want to compare with your
frequency histogram.  If you take the (natural) log of the
distribution, then you obtain,

ln (P_0 dt) = ln(r N dt) - r * t

So your linear regression is definitely expected based on first
priniciples.  The value of r can be estimated as the total number of
samples divided by the total range of the samples, or r=N/T.  The
value of dt is your frequency histogram binsize.

Thus, I expect that your values of the fitted parameters, "a" and "b"
as described above, should be

  "a" = ln(N^2 dt / T)  ; N is total number of samples, dt is binsize, and
                          T is the total range of samples.
  "b" = N/T

This derivation has certainly worked for me in the past!

Good luck,

Craig


-- 
---------------------------------------------------------------------
Craig Markwardt UW-Madison 608 262 7555    | "To cogitate and
internet: craigm@astrog.physics.wisc.edu   |  to solve ..." -MathNet
---------------------------------------------------------------------