FAH-Addict Forum
• Random Chat » NVIDIA GPU - CUDA Multiprocessor Programming
A little off topic, I know ....
CUDA seems to be the multiprocessor library of choice for the F@H project.
Has anyone had any experience with multiprocessor arrays, with the CUDA library, or with the rationalised version of 'C' used to program a CUDA array?

Mark
Administrator

Group: Administrator
Signed up since: 08/09/09
Messages: 27
Place: Nantes, France
Yes, I used CUDA at the end of my studies. Do you have any questions about the API?
The question was meant generally ... just to get a conversation going.
Specifically, though, I AM working on a distributed multiprocessor OS, designed with automatic object distribution and migration in mind. A sort of ... VERY VERY lightweight CORBA for embedded processors.
I was wondering how I might port this to a CUDA processor array, with objects generated in the CUDA 'C' subset.
Administrator

Group: Administrator
Signed up since: 08/09/09
Messages: 27
Place: Nantes, France
The most powerful way to use CUDA is not on a network-based cluster.
Why?
CUDA is very powerful for massively parallel tasks, but the time spent transferring data from CPU RAM to GPU VRAM has to be very small compared to the processing time. PCI Express 2.0 x16 is already ... very slow for CUDA! So imagine the awful time needed to transfer a data block over a 1 Gbit/s Ethernet network (or 10 Gbit/s, it's still not enough).
How do you optimise CUDA software? Use as many GPUs as possible at the same time! (As an example, I saw a demo with four 9800 GX2 cards.) CUDA allows you to use more than one card for the same task: cut the task into pieces, copy one piece to each GPU, and process. That is the most efficient use of CUDA.
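The "cut the task, copy to each GPU, and process" step can be sketched on the host side as a simple partitioning problem. This is only an illustration in plain Python, not CUDA code: the function name `split_for_devices` and the item and device counts are hypothetical, and the actual copy and kernel launch per device are left out.

```python
def split_for_devices(n_items, n_devices):
    """Return one (start, stop) index range per device.

    Ranges are contiguous, cover 0..n_items, and differ in size by at most 1.
    """
    if n_devices <= 0:
        raise ValueError("need at least one device")
    base, extra = divmod(n_items, n_devices)
    ranges = []
    start = 0
    for dev in range(n_devices):
        # The first `extra` devices take one additional item each.
        size = base + (1 if dev < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


# Hypothetical workload: one million items spread over four GPUs.
chunks = split_for_devices(1_000_000, 4)
for dev, (start, stop) in enumerate(chunks):
    print(f"GPU {dev}: items [{start}, {stop})")
```

Each range would then be copied to its own GPU and processed independently, which only pays off if the pieces do not need to talk to each other much.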
Using CUDA on a networked cluster is possible, but the processing time has to be very long compared to the transfer time over the Ethernet network. It's simpler and more cost-effective to have one big CUDA monster with four graphics cards than four PCs with one card each.
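To put rough numbers on the transfer-time argument, here is a back-of-the-envelope sketch. The bandwidths are theoretical peak figures (about 8 GB/s per direction for PCI Express 2.0 x16, and the nominal line rates for Ethernet); real throughput is lower. The 256 MB block and the 0.5 s of GPU processing are made-up values chosen only for illustration.

```python
# Theoretical peak bandwidths in bytes per second (assumptions, not measurements).
LINKS_BYTES_PER_S = {
    "PCIe 2.0 x16": 8e9,             # ~8 GB/s per direction
    "10 Gbit/s Ethernet": 10e9 / 8,  # 1.25 GB/s
    "1 Gbit/s Ethernet": 1e9 / 8,    # 0.125 GB/s
}


def transfer_seconds(n_bytes, bytes_per_s):
    """Time to move n_bytes over a link, ignoring latency and protocol overhead."""
    return n_bytes / bytes_per_s


def transfer_fraction(n_bytes, bytes_per_s, compute_s):
    """Share of total time spent moving data rather than computing."""
    t = transfer_seconds(n_bytes, bytes_per_s)
    return t / (t + compute_s)


block_bytes = 256 * 10**6  # hypothetical 256 MB data block
compute_s = 0.5            # hypothetical GPU processing time for that block

for name, bw in LINKS_BYTES_PER_S.items():
    t = transfer_seconds(block_bytes, bw)
    share = transfer_fraction(block_bytes, bw, compute_s)
    print(f"{name}: {t:.3f} s transfer, {share:.0%} of total time")
```

Under these assumptions the Gigabit Ethernet link is 64 times slower than the PCI Express link, so a job has to compute for a correspondingly longer time per byte before a networked cluster stops being dominated by transfers.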
I'm not looking at a "network" per se ... not Ethernet-style networking, where large objects lollop and bound around with kilobytes or megabytes of data, servicing hundreds of functions and tasks.
I need to execute a "core procedure" on each tiny "computing engine" in a massive set of minute cores. So a non-SIMD architecture is required; MIMD would be ideal. The "cleverness" is required in the "messaging and routing" system. The ability to identify and route requests and replies to specific objects would be essential.
I intend to give every object within a system one or more processor cores that perform simple tasks, e.g. a ring buffer object that takes one of 5 messages ... configure size, flush, check, add, take.
Even if it means that an object is a "functor" and all it does is execute one function when "requested" and "reply" with the answer.
From what you've said, it sounds like CUDA is more of a "SIMD" multi-processor architecture, where the same operation is executed on multiple sets of data, like multiply-accumulates or matrix manipulation. Mathematical repetition, not multiple logic streams.
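The ring buffer object mentioned above, with its five messages, might look roughly like the following. This is a minimal sketch in plain Python rather than the CUDA 'C' subset, and it models only the object and its message dispatch, not the routing between cores. The class name, the message strings, and the reply conventions (`False` when full, `None` when empty) are assumptions made for the example.

```python
class RingBufferObject:
    """A tiny message-driven object: one request in, one reply out."""

    def __init__(self, size=4):
        self._configure(size)

    def _configure(self, size):
        # "configure size": reallocate the storage and drop any contents.
        if size is None or size <= 0:
            raise ValueError("size must be a positive integer")
        self._slots = [None] * size
        self._head = 0
        self._count = 0
        return size

    def _flush(self, _payload=None):
        # "flush": discard everything, reply with how many items were dropped.
        dropped = self._count
        self._head = 0
        self._count = 0
        return dropped

    def _check(self, _payload=None):
        # "check": reply with the number of items currently stored.
        return self._count

    def _add(self, item):
        # "add": reply False if the buffer is full, True otherwise.
        if self._count == len(self._slots):
            return False
        tail = (self._head + self._count) % len(self._slots)
        self._slots[tail] = item
        self._count += 1
        return True

    def _take(self, _payload=None):
        # "take": reply with the oldest item, or None if the buffer is empty.
        if self._count == 0:
            return None
        item = self._slots[self._head]
        self._slots[self._head] = None
        self._head = (self._head + 1) % len(self._slots)
        self._count -= 1
        return item

    def handle(self, message, payload=None):
        """Dispatch one request to its handler and return the reply."""
        handlers = {
            "configure_size": self._configure,
            "flush": self._flush,
            "check": self._check,
            "add": self._add,
            "take": self._take,
        }
        if message not in handlers:
            raise ValueError(f"unknown message: {message!r}")
        return handlers[message](payload)


rb = RingBufferObject(size=2)
rb.handle("add", "a")
rb.handle("add", "b")
print(rb.handle("check"))  # number of stored items
print(rb.handle("take"))   # oldest item first
```

Each message takes a different branch through `handle`, which is exactly the kind of per-object control flow that differs from running one identical operation across many data elements.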
Administrator

Group: Administrator
Signed up since: 08/09/09
Messages: 27
Place: Nantes, France
MIMD is the focus of CUDA 3 and Fermi.
Oooooh!
I went through a few of the NVIDIA CUDA tutorials the other night, and don't remember seeing anything on CUDA 3 there ....
Do you know if there is any documentation ready for that yet?

Mark
Edit by MarkAGR On 11/12/09 at 02h15
Administrator

Group: Administrator
Signed up since: 08/09/09
Messages: 84
Place: Bordeaux, Franc...
I think the CUDA 3 material (which will come with OpenCL) is currently available to registered developers: http://developer.nvidia.com/object/get-opencl.html
Administrator

Group: Administrator
Signed up since: 08/09/09
Messages: 27
Place: Nantes, France
MarkAGR:
Oooooh!
I went through a few of the NVIDIA CUDA tutorials the other night, and don't remember seeing anything on CUDA 3 there ....
Do you know if there is any documentation ready for that yet?

Mark
Look at the NVIDIA 195 beta release notes too.