Tuesday, November 29, 2011

Experiment with Python Parallel Computing

I recently tried out parallel programming with Python.

Before getting into the topic, let's see why I chose Python instead of C or C++, which execute faster.
The reasons are simple:

1. Python is a dynamic language (less code => facilitates RAD).
2. Extensive libraries, e.g. SciPy, Matplotlib, and even RAD GUI tools like PyQt. This opens up a whole new world of opportunities to develop rich applications with extensive scientific uses.
3. Portability, and bindings with a majority of other languages like C and C++.

So let's start with parallel programming in Python.
Whenever we think of parallel programming we think of threads, the lightweight units most commonly used for parallel computing. But there is a serious disadvantage to using threads in Python, because:

1. Only one thread is run at a time by the Python interpreter (CPython, at least). This is enforced by the GIL (Global Interpreter Lock). It prevents your threads from spanning a multicore or multiprocessor architecture, so even if your computer has 8 cores, your multithreaded Python code runs on only 1 core. This is a serious disadvantage.
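To see the GIL in action, here is a small sketch (the helper functions `count_down`, `timed`, `run_sequential`, and `run_threaded` are my own, not from any library): for a pure-Python, CPU-bound loop, the two-thread run takes about as long as the sequential run, even on a multicore machine.

```python
import threading
import time

def count_down(n):
    # Pure-Python CPU-bound loop; the GIL serialises it across threads.
    while n > 0:
        n -= 1

def timed(fn):
    # Return how long fn() took, in seconds.
    start = time.time()
    fn()
    return time.time() - start

def run_sequential(n):
    count_down(n)
    count_down(n)

def run_threaded(n):
    t1 = threading.Thread(target=count_down, args=(n,))
    t2 = threading.Thread(target=count_down, args=(n,))
    t1.start()
    t2.start()
    t1.join()
    t2.join()

if __name__ == '__main__':
    n = 5000000
    print("sequential: %.2fs" % timed(lambda: run_sequential(n)))
    print("threaded:   %.2fs" % timed(lambda: run_threaded(n)))
```

On CPython you should see the two timings come out roughly equal, because the threads take turns holding the GIL instead of running on separate cores.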

So is there an alternate solution? Yes.

We should change our implementation from multithreading to multiprocessing.
Python ships a library called multiprocessing which helps accomplish this:
it makes it very easy to run any Python function as a separate process. So in this case, instead of threads, separate processes will be created. Process creation is costly, though; it consumes more memory and takes more time than creating a thread, so you should be careful when using multiprocessing.
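As a minimal sketch of the idea (the `square` function here is just an illustration, not code from this post), running a function as a separate process and collecting its result through a Queue looks like this:

```python
from multiprocessing import Process, Queue

def square(q, x):
    # Runs in the child process; send the result back via the queue.
    q.put(x * x)

if __name__ == '__main__':
    q = Queue()
    p = Process(target=square, args=(q, 7))
    p.start()          # forks/spawns a new OS process
    result = q.get()   # blocks until the child puts its answer
    p.join()           # wait for the child process to exit
    print(result)      # prints 49
```

The same Process/Queue pattern, scaled up to ten workers, is what the experiment below uses.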

Experiment 1:
Program: calculation of the sum of 1 to 100000000 (a fairly big calculation)

Linear approach:
import time

print "In linear"
start = time.time()

# Python 2's range builds the whole list in memory;
# xrange would avoid that cost, but this matches the benchmark below.
k = range(1, 100000000)
m = sum(k)
print m

print "it took", time.time() - start, "seconds."

Parallel approach using multiprocessing (1 GB RAM, Core 2 Duo):
from multiprocessing import Process, Queue
import time

def do_sum(q, m, l):
    # Each worker sums its own slice and reports the result via the queue.
    q.put(sum(range(m, l)))

def main():
    find = 100000000
    procs = []
    q = Queue()
    k = find / 10                  # size of each worker's slice
    for i in range(10):
        j = i * k
        p = Process(target=do_sum, args=(q, j, j + k))
        p.start()
        procs.append(p)
    total = 0
    for i in range(10):
        total = total + q.get()    # blocks until a worker delivers
    print total
    for p in procs:
        p.join()                   # reap the finished worker processes

if __name__ == '__main__':
    print "In Parallel"
    start = time.time()
    main()
    print "it took", time.time() - start, "seconds."
Benchmarks:

O/p on Linear Approach:

In linear
4999999950000000
it took 411.590999842 seconds.
O/P on Parallel Approach:

In Parallel
4999999950000000
it took 195.187999964 seconds.
Ta-da! The execution time was cut roughly in half using parallel programming.
Note: don't use multiprocessing for small computations, since the process-creation overhead may outweigh the gains. So think twice before using multiprocessing.
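As an aside, the same chunked sum can also be written with multiprocessing's Pool, which takes care of process creation and result collection for us. The `chunk_sum` and `parallel_sum` helpers below are my own sketch, not code from this post:

```python
from multiprocessing import Pool

def chunk_sum(bounds):
    # Sum one half-open slice [lo, hi); must be a module-level
    # function so it can be pickled and sent to the workers.
    lo, hi = bounds
    return sum(range(lo, hi))

def parallel_sum(n, chunks=10):
    # Split 0..n-1 into equal slices (assumes n is divisible by chunks)
    # and sum them across a pool of worker processes.
    size = n // chunks
    ranges = [(i * size, (i + 1) * size) for i in range(chunks)]
    pool = Pool(processes=chunks)
    try:
        return sum(pool.map(chunk_sum, ranges))
    finally:
        pool.close()
        pool.join()

if __name__ == '__main__':
    print(parallel_sum(1000))  # prints 499500
```

Pool.map hands each slice to a worker and gathers the partial sums in order, so there is no explicit Queue bookkeeping, but the same process-creation overhead warning applies.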