This is a broad question because I don't know the proper terminology yet. I need help seeding my search, plus suggestions on what to do and what to avoid.
I trained a MobileNetV2-based CNN on Google Colab Pro using TensorFlow/Keras, and it works, yay! Someone (whom I no longer have access to) mentioned that models can be loaded directly into GPU memory and set up for real-time predictions. I'm a Ph.D. student, and if I could show a proof of concept of this with my model and my setup, it could be a great help to our lab!
The issue is I don't know where to start! I want to test the process of loading our finished model onto the GPU for real-time predictions. We stream grayscale video at 1000 frames per second, 61x61 pixels per frame, for about 4 seconds at a time. I'd like to set this up on Google Colab since we don't have a good GPU yet. I'm only looking for a proof of concept, since I know I can't get true real-time predictions while uploading all that data (I can fake the video feed in Python for now).
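To make the question concrete, here's roughly how I'm planning to fake the feed with random data (the model path and batching are placeholders; I know MobileNetV2 expects 3 channels, so I'm repeating the grayscale channel):

```python
import numpy as np

# Fake the video feed: ~4 seconds at 1000 fps of 61x61 grayscale frames.
FPS, SECONDS, H, W = 1000, 4, 61, 61

def fake_feed():
    rng = np.random.default_rng(0)
    frames = rng.random((FPS * SECONDS, H, W), dtype=np.float32)
    # MobileNetV2 expects 3 channels, so repeat the single grayscale channel.
    return np.repeat(frames[..., None], 3, axis=-1)

feed = fake_feed()
print(feed.shape)  # (4000, 61, 61, 3)

# With the real model I'd then do something like (path is a placeholder):
# model = tf.keras.models.load_model("my_model.h5")
# preds = model(feed[:8])  # small batches to mimic a live stream
```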
So how do I load a model directly into GPU memory?
Any suggestions on which modules to use, or things to avoid?
Any thoughts on realistic latencies for such a setup? I understand latency varies with a lot of factors, but if anyone has experience with anything remotely similar, what latencies do you see? I'm hoping for <= 8 ms per frame.
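For reference, this is how I was planning to measure latency once the model is loaded; the `dummy` function below is just a stand-in for calling the real Keras model on a batch:

```python
import time
import numpy as np

def median_latency_ms(predict_fn, batch, n_warmup=10, n_runs=100):
    """Median per-batch latency in ms, with warm-up runs so one-time
    setup cost (graph tracing, GPU allocation) isn't counted."""
    for _ in range(n_warmup):
        predict_fn(batch)
    times = []
    for _ in range(n_runs):
        t0 = time.perf_counter()
        predict_fn(batch)
        times.append((time.perf_counter() - t0) * 1000.0)
    return float(np.median(times))

# Stand-in for the real model; with Keras I'd pass model (called as
# model(batch)) rather than model.predict, which adds per-call overhead.
dummy = lambda x: x.mean()
batch = np.zeros((8, 61, 61, 3), dtype=np.float32)
ms = median_latency_ms(dummy, batch)
```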
Anything is likely to be helpful, and I'm more than happy to do a bunch of reading; I just need your help getting there. Thank you all so much!