Sharp Minima Can Generalize For Deep Nets https://arxiv.org/pdf/1703.04933.pdf
Also, there was another paper published in June 2017 that states: ‘In Dinh et al. [2017], the authors argued that the property of Hessian cannot be directly applied to explain generalization. The reason to this argument is that although $\mathbb{E}\|\nabla_x f\|_2^2$ is invariant to node-scaling, the Hessian $\nabla_\theta^2 R_{\mathrm{emp}}(\theta)$ [is] not. However, in most cases of neural networks, the learned solutions are close to zero (i.e. with small norms) due to the near-zero random initialization, and thus the term $\|B\|_F^4 + 2C\sigma D\sqrt{R_{\mathrm{emp}}}$ in the bound (11) is dominated by the Hessian $\|\nabla_c^2 R_{\mathrm{emp}}\|_F^2$. Therefore, it is reasonable to apply the property of the Hessian to explain the generalization ability.’
https://arxiv.org/pdf/1706.10239.pdf
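
To make the node-scaling argument in the quote concrete, here is a minimal NumPy sketch. It is not taken from either paper: the one-hidden-unit ReLU network, the toy data, the scaling factor `alpha=10.0`, and the finite-difference Hessian are all assumptions chosen for illustration. It rescales the weights as (w1, w2) -> (alpha * w1, w2 / alpha), which leaves the predictions unchanged but changes the Frobenius norm of the Hessian of the empirical risk.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def predict(theta, x):
    # Toy network (an assumption for illustration): f(x) = w2 * relu(w1 * x)
    w1, w2 = theta
    return w2 * relu(w1 * x)

def empirical_risk(theta, x, y):
    # Mean squared error with a 1/2 factor
    return 0.5 * np.mean((predict(theta, x) - y) ** 2)

def numerical_hessian(fn, theta, eps=1e-4):
    # Central finite-difference approximation of the Hessian of fn at theta
    n = theta.size
    hess = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            e_i = np.zeros(n)
            e_j = np.zeros(n)
            e_i[i] = eps
            e_j[j] = eps
            hess[i, j] = (
                fn(theta + e_i + e_j) - fn(theta + e_i - e_j)
                - fn(theta - e_i + e_j) + fn(theta - e_i - e_j)
            ) / (4.0 * eps ** 2)
    return hess

def rescale(theta, alpha):
    # Node scaling: multiply incoming weight by alpha, divide outgoing by alpha
    w1, w2 = theta
    return np.array([alpha * w1, w2 / alpha])

rng = np.random.default_rng(0)
x = rng.uniform(0.5, 2.0, size=50)   # positive inputs keep the ReLU active
y = 3.0 * x                          # target realised exactly when w1 * w2 = 3

theta = np.array([1.5, 2.0])         # a global minimum: w1 * w2 = 3
theta_scaled = rescale(theta, alpha=10.0)

def risk(t):
    return empirical_risk(t, x, y)

same_function = np.allclose(predict(theta, x), predict(theta_scaled, x))
hess_norm = np.linalg.norm(numerical_hessian(risk, theta))
hess_norm_scaled = np.linalg.norm(numerical_hessian(risk, theta_scaled))

print("same predictions:", same_function)
print("Hessian Frobenius norm (original):", hess_norm)
print("Hessian Frobenius norm (rescaled):", hess_norm_scaled)
```

Both parameter settings define the same function and have the same risk, yet the rescaled point looks much sharper by the Hessian measure. That is the objection attributed to Dinh et al.; the second paper's reply is that solutions reached from near-zero initialization tend to have small norms, so such extreme rescalings are not what training typically produces.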