Lol. The file sizes we are talking about would be tiny. Honestly, cards that came out 14 years ago could handle this pixel shader load. And if you really can't run it, a toggle to turn it off would be a simple solution. This wouldn't be some crazy complex programming. This is a technique that was already simple when it came out a LONG time ago.
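To put a rough number on "tiny", here is a quick back-of-the-envelope sketch. The 1024x1024 resolution is purely my assumption for illustration (I don't know what sizes the game would actually use), and `texture_bytes` is just a helper I made up. The bytes-per-pixel figures are the standard ones: 3 for uncompressed 8-bit RGB, 1 for a block-compressed format like DXT5/BC3.

```python
def texture_bytes(width, height, bytes_per_pixel):
    """Size of a single texture level in bytes (no mipmaps)."""
    return width * height * bytes_per_pixel

# Hypothetical 1024x1024 normal map; the resolution is an assumption.
uncompressed = texture_bytes(1024, 1024, 3)  # 8-bit RGB, 3 bytes per pixel
compressed = texture_bytes(1024, 1024, 1)    # DXT5/BC3, 1 byte per pixel

print(uncompressed / 2**20, "MiB uncompressed")
print(compressed / 2**20, "MiB block-compressed")
```

So even a fairly large map at that assumed size comes to about 1 MiB once compressed, and smaller props would use far less.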
The only real cost would be making the normal maps in the first place, as they would need to be generated and touched up for every object.
But as for the thing you and everyone else are talking about, performance: I'm afraid anyone who thinks this would be a performance issue simply doesn't understand how this technique works in code and hardware.
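For anyone curious what the per-pixel work actually looks like, here is a minimal sketch in Python rather than real shader code. It assumes the usual 8-bit RGB encoding of a normal and plain diffuse (Lambert) lighting; the function names are mine, not from any engine, and a real shader would also transform the light into tangent space first.

```python
import math

def decode_normal(r, g, b):
    """Map an 8-bit RGB texel from [0, 255] to a unit normal in [-1, 1]."""
    n = [c / 255.0 * 2.0 - 1.0 for c in (r, g, b)]
    length = math.sqrt(sum(c * c for c in n))
    return [c / length for c in n]

def lambert(normal, light_dir):
    """Diffuse term: clamped dot product of the normal and the light direction."""
    length = math.sqrt(sum(c * c for c in light_dir))
    unit_light = [c / length for c in light_dir]
    return max(0.0, sum(a * b for a, b in zip(normal, unit_light)))

# The typical "flat" normal map colour, pointing straight out of the surface.
flat = decode_normal(128, 128, 255)
# A texel tilted almost fully sideways.
tilted = decode_normal(255, 128, 128)

light = (0.0, 0.0, 1.0)  # light shining straight at the surface
print(lambert(flat, light))    # close to 1, fully lit
print(lambert(tilted, light))  # close to 0, barely lit
```

That is one texture read, a scale and bias, a normalize and a dot product per pixel, which is the whole reason the technique is so cheap.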