I recently tried out parallel programming with Python.
Before diving into the topic, let's see why I chose Python instead of C or C++, which execute faster.
The reasons are simple:
1. Python is a dynamic language (less code, which facilitates rapid application development).
2. Extensive libraries such as SciPy and Matplotlib, plus RAD GUI toolkits like PyQt. This opens up a whole world of opportunities to build rich applications with extensive scientific uses.
3. Portability, and bindings to most other languages, such as C++.
So let's start with parallel programming in Python.
Whenever we think of parallel programming, we think of threads: they are lightweight and the most common parallel computing primitive. But using threads in Python has a serious disadvantage: the Python interpreter (CPython, at least) runs only one thread at a time. This is due to the GIL (Global Interpreter Lock). The GIL prevents your threads from spanning multiple cores or processors, so even if your computer has 8 cores, your multi-threaded Python code runs on only one core.
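To see what this means in practice, here is a small sketch (the function and variable names are my own): two threads each sum half a range of numbers. The answer comes out right, but because the GIL serializes CPU-bound Python bytecode, the two threads do not actually run on two cores at once.

```python
import threading

results = {}

def partial_sum(name, lo, hi):
    # CPU-bound work: the GIL lets only one thread execute Python
    # bytecode at a time, so these two threads take turns on one core
    # instead of running in parallel.
    results[name] = sum(range(lo, hi))

t1 = threading.Thread(target=partial_sum, args=('a', 0, 5000000))
t2 = threading.Thread(target=partial_sum, args=('b', 5000000, 10000000))
t1.start()
t2.start()
t1.join()
t2.join()

total = results['a'] + results['b']
print(total)  # correct result, but no multi-core speedup
```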
So is there an alternative solution? Yes.
We can change our implementation from multithreading to multiprocessing.
Python has a library called multiprocessing that helps accomplish this:
it makes it very easy to run any Python function as a separate process. In this case, separate processes are created instead of threads. Keep in mind that creating a process is costly: it consumes more memory and time than creating a thread, so you should be careful when using multiprocessing.
Experiment 1:
Program: calculating the sum of 1 to 100000000 (seems like a fairly big calculation)
Linear approach:
import time
print "In linear"
start = time.time()
k = range(1,100000000)
m = sum(k)
print m
print "it took", time.time() - start, "seconds."
Parallel approach using multiprocessing (1 GB RAM, Core 2 Duo):
from multiprocessing import Process, Queue
import time

def do_sum(q, m, l):
    q.put(sum(range(m, l)))

def main():
    find = 100000000
    list1 = []
    q = Queue()
    k = find / 10
    # split the range into 10 chunks, one process per chunk
    for i in range(10):
        j = i * k
        p1 = Process(target=do_sum, args=(q, j, j + k))
        p1.start()
        list1.append(p1)
    # collect the 10 partial sums from the queue
    k = 0
    for i in range(10):
        k = k + q.get()
    print k

if __name__ == '__main__':
    print "In Parallel"
    start = time.time()
    main()
    print "it took", time.time() - start, "seconds."
Benchmarks:
Output of the linear approach:
In linear
4999999950000000
it took 411.590999842 seconds.
Output of the parallel approach:
In Parallel
4999999950000000
it took 195.187999964 seconds.
Tada! We cut the execution time roughly in half using parallel programming.
Note: don't use multiprocessing for small computations, since the process-creation overhead may outweigh the gains. So think twice before using multiprocessing.