Parallel Download Optimization

optimization
maths
Published

November 15, 2020

Motivation

We want to download a file of size s. We can download it parallely in away, that we split the file into many pieces and download those pieces separatelly at the same time. Unfortunately, the conditions have changed and we can start a new downloading thread only after m seconds. We also have an internet with a limited speed vl. What would be the n ideal number of threads to use? The second question is, given nnumber of threads, what would be the ideal length of parts downloaded parts s1,...,sn?

Derivation

The main idea is that the downloading is fastest when the most threads possible are running in parallel. Therefore our new method will calculate the threads sizes in a following way: - We start a new thread as soon as possible (after m seconds), - all threads stop running at the same time

This is the optimum way, since there will be running as many threads as possible for as long time as possible.

Denote ti as the time needed to download part si with speed v. Therefore ti=siv and

s=s1+...+sn s=v(t1+...+tn)

We have a limit for the speed: n threads together must not have greater speed than vl.

inv=nvvlnvlv

Since we start a new thread after m seconds and stop all threads at the same time, then ti=ti1m. Expanding a few members of the recurrent sequence, we can observe that

ti=t1m(i1)

We can calculate the total amount of time is the time until the last thread stops downloading. The first thread is downloading for t1 seconds. the second for t2 and waits m seconds before downloading, therefore the total waiting for the second thread to finish up is t2+m seconds. We can expand this up to n.

Thus the total amount of time of the download is

t=t1=t2+m=...=tn+m(n1)

The best way is to split it into as many pieces as possible while i:ti0 holds. Since ti<ti1, then the constraint tn0 is the most restricting one. This allows us to ignore the other, less-restricting constraints.

tn=t1m(n1)0t1m(n1)

s=v(t1+...+tn)=vinti s=vin(t1m(i1)) s=v(nt1min(i1)) s=v(nt1m2n(n1))

Using the inequality t1m(n1), we can furthermore derive:

sv(nm(n1)m2n(n1))=v(m2n(n1)) 2smvn(n1) n2n2smv0 n1,2=1±14(1)2smv2=1±1+8smv20

There must also hold that nvlv. We want to choose nN that has the biggest value. With the previously stated bounds in mind, we can calculate the result number of threads as

n=floor(min(1+1+8smv2,vlv))

Let’s answer the second question: having n, what would be the ideal splits s1,...,sn. Using

t1=svn+m2(n1)

we can calculate s1=vt1. Since we can calculate it any ti using

ti=t1m(i1)

, we can calculate si for all i.

Example

Suppose we want to calculate optimal number of threads and later corresponding splits for the following parameters.

s=30MB=30000kB v=100kB/s vl=20MB/s=20000kB/s m=60s

What is the best n to choose?

1+1+830000601002=3.7 20000100=200

Therefore the final optimal number of threads to choose n=3.

Using this n we can calculate size of the splits.

t1=300001003+602(31)=160s

s1=vt1=100160=16000kB=16MB

t2=t1m=16060=100s

s2=vt2=100100=10000kB=10MB

t3=t2m=10060=40s

s3=vt3=10040=4000kB=4MB

We can at last check, that s=s1+s2+s3, which we can see that it indeed holds.

The best way to split the files is to let the first thread download 16 MB, the second 10 MB and the last thread 4 MB for a total time of 160 s.

Comparison to a naive method

Let’s compare it to a naive method of splitting. The naive method splits the file into n same parts. Therefore

s1=...=snt1=...=tn

and the total length s is

s=insi=ns1

The total time spent by downloading t is calculated as the time of the initial thread t1 + m delay for starting every other thread.

tnaive=t1naive+m(n1)

tnaive can be calculated as

t1naive=s1v=snv

and thus

tnaive=snv+m(n1)

Our method had

tour=t1our=snv+m2(n1)

If we compare them, then

Δt=tnaivetour=m2(n1)

So if we would start the downloading using both methods at the same time using the same number of threads, then our method would end m2(n1) seconds before the naive method.

Back to top