Thursday, March 14, 2013

Concurrency Speed Up Factor : Amdahl's Law

Amdahl's Law or Amdahl's argument defines the Maximum improvement of a collective system when the parts of the system are improved.

In parallelisation or Concurrency domain, it defines the Maximum SpeedUp Factor when then task executed on a single processor is divided into tasks that can be executed by Multiple Processors.

What is Speedup ?

Speed up factor is the ration of  'Time of Execution on a single processor' Vs 'Time of exection on a processor when the task is distributed in parallel across N processors'

Lets assume a Task , task_overall , is executed on a single processor. Let the time of exection = T(task_overall)
Now the nature of the task permits it be divided N number of smaller tasks.
Let each smaller task = task(i) , where i <= n and i = 1, 2,..,N
In order of improve time execution , we decide to execute each task(i) on a separate processor

Since the Time of execution on a single process IS PROPORTIONAL TO amount of task executed.
Let theTime of Execution for a chunk of task, task(i) = T(i) , where task(i) = task(over_all)/N fraction of the complete task.
So the new fractional time on each processor now is T(i) = T(task_overall)/N
Sped Up factor =  T(task_overall)/T(i)
E.g :
Let T(task_overall) = 10 hours. i.e It takes 10 hours to do a task on a single processor.
Let's divide this task into 7 chunks and allow it to processed by 7 processors.
Fractional task executed at each processor now is 1/7
So reduced Time on each processor has changed from 10 hours to 10*(1/7) = 10/7
This ratio of the previous time to this reduced time is the Speed Up Factor.
Speed Up Factor = T(over_all) / T(i) = 10/ (10/7) = 7 
So speed increases seven times by dividing a task in parallel across 7 processors.

Speed up Factor when only a part of task can be parallelised

Let the Task, task_overall consists of  a fraction of task that cannot be parallelised. i.e this fraction of task can be executed only serially , hence no speed up can be gained on this fraction when the number of parallel processes is increased

Task( over_all ) = Task( serial ) + Task( task that can be parallelised )
Task( over_all ) = K percent of Task ( serial )   + ( 1 - K ) percent of Task ( parallel )
Since Time is proportional to task
T( over_all ) = K % T ( over_all )  + ( 1 - K ) % T ( over_all)
T(i) = K % T ( over_all )  + ( 1 - K ) % T ( over_all) / N

So Speed Up Factor = T( over_all ) / T ( i )
Speed Up Factor = 1 / [ f + (1-f)/N ]

Here f = K % T( over_all ) i.e f represents the fraction of task that cannot be parallelized

Amdahl's Law can be used to measure increase in execution time when chunks of serial + parallel tasks is executed in parallel across multiple processors.

This Law can also be used to measure the performance gains on concurrent executions by threads

Wednesday, March 13, 2013

Different Concurrency Models - Java Vs Others


  Concurrency Models in Programming Languages


 Two main paradigms or models of concurrency in programming languages today, one is the shared-state that utilize multiple threads of execution synchronized using locks. The other is the share-nothing model that uses asynchronous messaging passing across lightweight processes or threads. 
An important factor why concurrency models are becoming very popular is because of the increasing multi-core processors trend, processor manufacturers are increasing the numbers of cores in processors, giving access to processors with 4, 8 or more cores capable of doing multiple calculations in parallel. There has been an increased interest in software engineering techniques to fully utilize the capabilities offered by these processors 

Shared-state model

Java supports the shared-state concurrency model that relies on synchronizing concurrent threads via concurrency-related objects, such as locks, barriers, and latches. All concurrency abstractions in Java are based on the shared mutable state paradigm and prone to the vulnerabilities of race conditions, blocking/contention and deadlocks. Since Java 1.5 or 5 new packages and classes have been added to improve the concurrency model and Java 7 has introduced Fork/Join. Fork/Join allows programmers to utilize finer-grained concurrency by allowing independent task components to be parallelized through recursive techniques. The individual subtasks known as the Fork/Join subtasks are mapped to a pool of threads that initiates and manages their execution using advanced queuing and scheduling techniques.
Thread management in Java imposes a number of challenges. In applications that demand execution of large number of concurrent processes in the order of thousands, JVM’s threads  show to be slow because of stack maintenance overhead, locking contention, and context switch overhead.
 Apart from the expensive context switching and the heavy weight nature of Threads in Java, the the main problem is the shared-state model itself. As a model of computation, threads are non-deterministic, and with the shared-state of the Java memory model, when multiple threads can potentially access a piece of information in memory concurrently, the programming model becomes highly complex.

Share-nothing model

The other model, the shared-nothing model, supported by Erlang has been used extensively to achieve a high level of concurrency. Erlang has popularized concurrency oriented programming by implementing an extremely lightweight process model on top of the operating system and platform. The actor model in Erlang is based on strong isolation semantics, self-contained processes with asynchronous message passing.
On the JVM Scala provides the same model supported by Erlang, Scala brings actor based concurrency on the JVM that allows the design of scalable concurrent applications that automatically take advantage of the multicore processors.  Scala actors provide a better model for programming concurrently in the virtual machine environment. Share-nothing model and Scala can be proved to have advantages over the JVM.

Scala Actors

Scala Actors are based on the shared-nothing and message passing model. An Actor is a mathematical model of concurrent computation that encapsulate data, code and its own thread of control, and communicate asynchronously using immutable message passing techniques. When the basic architecture is shared-nothing, each actor seems to act in its own process space. And the success and scalability of the actor implementation depends a lot on the ability of the language to implement lightweight processes on top of the native threading model. Every actor has its own mailbox for storing messages, implemented as asynchronous, race-free, non-blocking queues. Scala actors use Fork/Join as the underlying implementation and exposes a concurrency model that encourages shared-nothing structures.