I'll start with the things that I can answer now, while I continue to research your other questions.
We just released our own AR/mixed reality engine, the Bridge Engine, which you should check out: https://bridge.occipital.com/developers/. It does almost exactly what you envision for your app.
SLAM engine: You won't necessarily need to build an entirely new SLAM system, but if you aren't planning on using the Bridge Engine, you'll need to develop several pieces yourself, such as stereo video for scanning purposes. As I haven't done this myself, I don't know how difficult it will be.
Mesh Size: OpenGL ES only supports 16-bit unsigned short values for face indices, so meshes larger than 65535 faces have to be split into smaller sub-meshes. An STMesh object is therefore a reference to a collection of partial meshes, each of them having fewer than 65k faces.
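If you ever need to do this kind of splitting on your own mesh data, here is a rough sketch of the idea in plain Python. It does not use the Structure SDK at all: `split_mesh`, `MAX_FACES`, and `MAX_VERTICES` are names I made up for illustration, and the mesh is assumed to be a simple list of vertices plus a list of triangles that index into it. I've also assumed you want to cap the vertex count per sub-mesh, so that every re-numbered index is guaranteed to fit in an unsigned short.

```python
MAX_FACES = 65535      # face limit quoted above (hypothetical constant name)
MAX_VERTICES = 65536   # assumption: local indices 0..65535 fit in 16 bits


def split_mesh(vertices, faces, max_faces=MAX_FACES, max_vertices=MAX_VERTICES):
    """Greedily split a triangle mesh into smaller sub-meshes.

    vertices: list of (x, y, z) tuples
    faces:    list of (i, j, k) tuples indexing into `vertices`
    Returns a list of (local_vertices, local_faces) pairs, where each
    face is re-indexed against its own sub-mesh's vertex list.
    """
    submeshes = []
    remap = {}            # global vertex index -> local index
    local_vertices = []
    local_faces = []

    for face in faces:
        # Count the vertices this face would add to the current sub-mesh.
        new_count = len({gi for gi in face if gi not in remap})
        too_many_faces = len(local_faces) >= max_faces
        too_many_vertices = len(local_vertices) + new_count > max_vertices

        # Close the current sub-mesh before it exceeds either limit.
        if local_faces and (too_many_faces or too_many_vertices):
            submeshes.append((local_vertices, local_faces))
            remap, local_vertices, local_faces = {}, [], []

        local_face = []
        for gi in face:
            if gi not in remap:
                remap[gi] = len(local_vertices)
                local_vertices.append(vertices[gi])
            local_face.append(remap[gi])
        local_faces.append(tuple(local_face))

    if local_faces:
        submeshes.append((local_vertices, local_faces))
    return submeshes
```

Vertices shared between faces that land in different sub-meshes get duplicated, which is the price of giving every sub-mesh its own small index range. Each returned pair can then be uploaded as its own vertex buffer and 16-bit index buffer.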
Scanner App: The Scanner app is a great reference for how to use our SLAM engine to scan. I think the Room Capture sample app will also help you understand the principles needed to create the app you want.
Let us know how things progress!