Python and Networks // Homework 2017-02-14 // Range of mean and median

Problem
Generate \(n_s=10\) groups (samples) of \(N\) random numbers (independent and identically distributed, with uniform distribution) for the sample sizes \(N=10,30,100,300,\dots\). In other words, for each \(N\), generate \(n_s\) groups such that each group contains \(N\) random numbers. For each sample compute the mean and the median. For a given sample size \(N\), the range of the medians is the highest median (of the \(n_s\) median values) minus the lowest median; the range of the means is defined in the same way. For each \(N\) compute the range of the means and the range of the medians, and plot them together as a function of \(N\) with log-log axes. Based on this plot, what do you think the decay rates of the two curves are as a function of \(N\)?

Solution (example)

1.  Python code (avg-med.py)

# Compute and print the width of the range of the average and the median
# of N random numbers. Use "ns" samples for each sample size.
import random

# Number of samples for a given sample size
ns = 10

# Output header. Append an additional newline to separate the header from the data.
print("# Comparing mean and median of N random numbers with %d samples for each N" % ns)
print("# N\n#\tWidth of the range of means\n#\t\tWidth of the range of medians\n")

# Loop through the list of sample sizes (N)
for N in (10, 30, 100, 300, 1000, 3000, 10000, 30000, 100000, 300000, 1000000):

    # Declare the list of averages and the list of medians
    avgs = []; meds = []

    # Sample "ns" times (ns: number of samples)
    for _ in range(ns):

        # Generate N random numbers, each taken uniformly from the [0,1) interval
        rnd_nums = [random.random() for _ in range(N)]

        # Compute and save the average for the current list of random numbers
        avgs.append( 1.0 * sum(rnd_nums) / N )

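        # Same for the median: average the two middle elements of the sorted list
        # (for odd N both indices point to the same, single middle element)
        srt = sorted(rnd_nums)
        meds.append( 0.5 * (srt[(N - 1) // 2] + srt[N // 2]) )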

    # Sort both lists
    avgs.sort(key=float); meds.sort(key=float)

    # Print N, the width of the range of the means, and the width of the range of the medians
    print("%d\t%.2g\t%.2g" % (N, avgs[-1]-avgs[0], meds[-1]-meds[0]) )

2.  How to run the Python code

python3 avg-med.py > avg-med.txt

3.  Output file (avg-med.txt)

# Comparing mean and median of N random numbers with 10 samples for each N
# N
#       Width of the range of means
#               Width of the range of medians

10      0.36    0.54
30      0.17    0.21
100     0.1     0.13
300     0.061   0.11
1000    0.021   0.046
3000    0.015   0.029
10000   0.0087  0.018
30000   0.0033  0.0096
100000  0.002   0.0029
300000  0.0019  0.003
1000000 0.0011  0.0016
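
One way to turn these numbers into a decay rate is to fit a straight line to the data on log-log axes: if a range width behaves like \(N^{a}\), the slope of that line is the exponent \(a\). The sketch below is not part of the original solution. It hard-codes the values listed above, and the helper name loglog_slope is made up for illustration. It uses an ordinary least-squares fit, which is only one possible choice.

```python
import math

# (N, range of means, range of medians), copied from avg-med.txt above
data = [
    (10, 0.36, 0.54), (30, 0.17, 0.21), (100, 0.1, 0.13),
    (300, 0.061, 0.11), (1000, 0.021, 0.046), (3000, 0.015, 0.029),
    (10000, 0.0087, 0.018), (30000, 0.0033, 0.0096),
    (100000, 0.002, 0.0029), (300000, 0.0019, 0.003),
    (1000000, 0.0011, 0.0016),
]

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) versus log(x) (hypothetical helper)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    return sxy / sxx

sizes = [row[0] for row in data]
slope_mean = loglog_slope(sizes, [row[1] for row in data])
slope_median = loglog_slope(sizes, [row[2] for row in data])

print("Fitted exponent, range of means:   %.2f" % slope_mean)
print("Fitted exponent, range of medians: %.2f" % slope_median)
```

With only 10 samples per sample size the individual points scatter noticeably, so the fitted exponents should be read as rough estimates.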

4.  Gnuplot file

se term post col enh "Helvetica-Bold,20"
se o "avg-med.ps"
se log xy
se key bottom left
se xlab "N: sample size"
se lab "Range width of the mean and median\nof N uniform [0,1) rnd numbers\nwith 10 samples for each size" at scr 0.65,0.9 center
se ylab "Range width"
se ytic 10 
se xtic ("100" 100, "10^4" 1e+4, "10^6" 1e+6) 

p [5:2e+6][5e-4:1] \
\
'avg-med.txt' u 1:2 ti "Mean"   w p ps 2 pt 1 lw 4 lt 1, \
''            u 1:3 ti "Median" w p ps 2 pt 6 lw 4 lt 3

# converting ps to png
# convert -rotate 90 -geometry 500 -sharpen 5 avg-med.ps avg-med.png

5.  Output image: width of mean's and median's range
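
A possible answer to the question in the problem, given as a sketch rather than as part of the original solution: on log-log axes both point sets fall roughly along straight lines of similar slope, which suggests a power-law decay with a common exponent. For uniform numbers on \([0,1)\) this matches the standard results for the standard deviation of the sample mean and the large-\(N\) standard deviation of the sample median, where \(f(m)=1\) is the probability density at the median \(m=1/2\):

```latex
\[
\sigma_{\mathrm{mean}} = \frac{1}{\sqrt{12N}},
\qquad
\sigma_{\mathrm{median}} \approx \frac{1}{2 f(m)\sqrt{N}} = \frac{1}{2\sqrt{N}},
\qquad
\frac{\sigma_{\mathrm{median}}}{\sigma_{\mathrm{mean}}} \approx \sqrt{3} \approx 1.73 .
\]
```

For a fixed number of samples \(n_s\), the expected width of the range scales with the standard deviation, so both curves should decay as \(N^{-1/2}\), with the median curve lying above the mean curve by a roughly constant factor.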