Hello @uogygiuol,
Thank you for the added details.
Yes, reducing the complexity of questions is the right thing to do, thank you for doing it. Essays are counterproductive, as we then tend to overlook things (q.e.d., I overlooked the fact that you wanted to mangle the scene file in this thread).
In general, trying to mangle a file beforehand is not a good route, as you always risk invalidating the file. For your very specific scenario - very simple scene graph, just geometry, no materials, animations or other dependencies - it could make sense. I briefly talked with the owner of our GLTF-importer, and we do not do any sanity checking, e.g., comparing nodes with meshes. So, you could just 'clean up' the scene graph ("nodes") of the file, and Cinema's GLTF importer will then just ignore extra data in fields such as "meshes".
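A minimal sketch of what such a 'clean up' could look like, assuming the usual glTF layout where "scenes" and each node's "children" reference nodes by index. The function name `prune_nodes` and the `keep` predicate are hypothetical, not part of any Cinema 4D or glTF API, and whether the result imports cleanly is something you would have to test.

```python
import json


def prune_nodes(gltf: dict, keep) -> dict:
    """Return a shallow copy of `gltf` that only contains nodes where keep(node) is true.

    Node indices shift when entries are removed, so all index references in
    "children" and "scenes" are remapped. Other sections such as "meshes"
    are deliberately left untouched.
    """
    remap = {}      # old node index -> new node index
    new_nodes = []
    for old_index, node in enumerate(gltf.get("nodes", [])):
        if keep(node):
            remap[old_index] = len(new_nodes)
            new_nodes.append(dict(node))  # copy, so the input is not mutated

    for node in new_nodes:
        if "children" in node:
            children = [remap[c] for c in node["children"] if c in remap]
            if children:
                node["children"] = children
            else:
                del node["children"]  # avoid leaving an empty list behind

    result = dict(gltf)
    result["nodes"] = new_nodes
    if "scenes" in gltf:
        new_scenes = []
        for scene in gltf["scenes"]:
            scene = dict(scene)
            if "nodes" in scene:
                scene["nodes"] = [remap[n] for n in scene["nodes"] if n in remap]
            new_scenes.append(scene)
        result["scenes"] = new_scenes
    return result


if __name__ == "__main__":
    example = {
        "scenes": [{"nodes": [0, 1]}],
        "nodes": [{"name": "a", "mesh": 0}, {"name": "b", "mesh": 1}],
        "meshes": [{}, {}],
    }
    print(json.dumps(prune_nodes(example, lambda n: n["name"] != "b")))
```

Note that a kept node whose parent was removed is no longer referenced by anything; in a flat scene graph as described above that should not matter, but it is an assumption of this sketch.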
How fruitful this will be, you will have to find out yourself. I already had the hunch that you are surfing on the edge of what is sensible here, and GLTF JSON files which translate to gigabytes of memory are certainly an edge case, because text-based file formats are usually a bad choice for such heavy data.
Using Python to Read JSON
My guesstimate would be that when you throw a GLTF JSON file which takes five minutes to load in Cinema 4D at Python's JSON parser to mangle it, you end up with a net loss or a tie, because you lose most of, or more than, the time you won in that Python JSON stage.
Python's json module does its heavy lifting in C to make it performant, but that is still a lot of JSON to deserialize, modify, and then serialize. One idea could be to use re, i.e., regular expressions, to find the "nodes" section in that file, deserialize just that from JSON, modify it, serialize it back to a JSON string, and write it back in place, and by that sidestep having to deserialize the whole file. The problem with all that is that json.load allows you to pass a file object and hands the text to the C parser in one go (it still reads the whole file into memory for that), while re does not allow you to regex a file object directly (AFAIK); you always must read the file object into lines or chunks, or memory-map it, to then pass that to the re module. I.e., in the simplest form you would have to load that whole file into a Python string first. What would come out on top here, I have no clue, but my hunch is that re might lose, as Python's string handling is not the fastest. Alternatives might be 3rd party libs such as ijson (an iterative JSON parser), but I do not know how performant it is.
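The regex idea could be sketched roughly like this. It assumes the top-level "nodes" array is the first `"nodes"` key whose value is an array of objects (the lists under "scenes" only hold integer indices), and that the file content is already in memory as a string. `rewrite_nodes` and `modify` are hypothetical names; `json.JSONDecoder.raw_decode` is the standard library call that parses one JSON value starting at a given offset and reports where it ended.

```python
import json
import re

# Matches the key of an array of objects. This tells the top-level "nodes"
# array apart from scenes[*]["nodes"], which is an array of integers.
# The lookahead keeps match.end() pointing exactly at the opening "[".
NODES_KEY = re.compile(r'"nodes"\s*:\s*(?=\[\s*\{)')


def rewrite_nodes(text: str, modify) -> str:
    """Decode only the "nodes" array in `text`, pass it to `modify`, splice the result back."""
    match = NODES_KEY.search(text)
    if match is None:
        raise ValueError('No "nodes" array of objects found.')

    start = match.end()
    # raw_decode parses a single JSON value beginning at `start` and returns
    # the value plus the offset right behind it; the rest of the text is
    # never deserialized.
    nodes, end = json.JSONDecoder().raw_decode(text, start)

    new_nodes = json.dumps(modify(nodes), separators=(",", ":"))
    return text[:start] + new_nodes + text[end:]


if __name__ == "__main__":
    sample = '{"scenes":[{"nodes":[0,1]}],"nodes":[{"name":"a"},{"name":"b"}],"meshes":[{}]}'
    print(rewrite_nodes(sample, lambda nodes: nodes[:1]))
```

This only touches the "nodes" array; index references elsewhere, e.g., in "scenes", are not adjusted here, and the slicing at the end still copies the whole string once.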
For this section it would make a huge difference if you could predict the position of "nodes" in the file, either exactly as a chunk offset, or in the form of 'I know that it is always very close to the end, so let's regex parse the file in reverse'.
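If "nodes" tends to sit near the end of the file, a backward search over a memory-mapped file is one way to try this without building a Python string of the whole file. The sketch below assumes an uncompressed, non-empty .gltf text file; `find_nodes_offset_from_end` is a hypothetical helper. It relies on `mmap.rfind` and on the fact that re accepts bytes-like objects such as mmap when given a bytes pattern.

```python
import mmap
import re

KEY = b'"nodes"'
# Only accept a "nodes" key whose value is an array of objects, to skip the
# integer index lists stored under scenes[*]["nodes"].
NODES_ARRAY = re.compile(rb'"nodes"\s*:\s*\[\s*\{')


def find_nodes_offset_from_end(path: str) -> int:
    """Return the byte offset of the last top-level style "nodes" key, or -1 if none is found."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        end = len(mm)
        while True:
            # Search backwards from `end`; the OS pages data in on demand.
            pos = mm.rfind(KEY, 0, end)
            if pos == -1:
                return -1
            if NODES_ARRAY.match(mm, pos):
                return pos
            # Not the array we want; continue strictly before this hit.
            end = pos + len(KEY) - 1


if __name__ == "__main__":
    import sys
    print(find_nodes_offset_from_end(sys.argv[1]))
```

The returned offset could then be used as the starting point for decoding just that array. How this compares in speed to a plain `json.load` on your files is something only a measurement will tell.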
Using a Binary File Format
But the fact remains that text-format file types, e.g., JSON GLTF, become extremely inefficient once you pass the ~100 MB barrier. Using something like binary GLTF or another binary format such as FBX will likely speed up your Cinema 4D loading times quite a bit, no extra steps required.
And to be clear, text-based file formats are always wildly inefficient. It is just that below the ~100 MB barrier (adjust for the beefiness of your machine), you can drown that inefficiency in pure computing power and have the nice advantage of a human-readable file format.
Cheers,
Ferdinand