Vinay Prabhu
1 min read · Nov 7, 2017

Sharp Minima Can Generalize For Deep Nets https://arxiv.org/pdf/1703.04933.pdf
Also, another paper, published in June 2017, states: ‘In Dinh et al. [2017], the authors argued that the property of Hessian cannot be directly applied to explain generalization. The reason to this argument is that although $\mathbb{E}\|\nabla_x f\|_2^2$ is invariant to node-scaling, the Hessian $\nabla^2_\theta R_{emp}(\theta)$ [is] not. However, in most cases of neural networks, the learned solutions are close to zero (i.e. with small norms) due to the near-zero random initialization, and thus the term $\|B\|_F^4 + 2C\sigma D\sqrt{R_{emp}}$ in the bound (11) is dominated by the Hessian $\|\nabla^2_c R_{emp}\|_F^2$. Therefore, it is reasonable to apply the property of the Hessian to explain the generalization ability.’
https://arxiv.org/pdf/1706.10239.pdf
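
To make the node-scaling point in the quote concrete, here is a minimal sketch that is not taken from either paper. It assumes a toy one-hidden-unit ReLU network with scalar weights, a squared-error empirical risk, and a finite-difference Hessian; the function names, the data, and the scale factor `alpha` are all illustrative choices. Multiplying the first weight by `alpha` and dividing the second by `alpha` leaves the predictions unchanged, yet the Frobenius norm of the Hessian of the empirical risk changes.

```python
import numpy as np

def predict(theta, x):
    # Toy network (an assumption for illustration): f(x) = w2 * relu(w1 * x)
    w1, w2 = theta
    return w2 * np.maximum(0.0, w1 * x)

def empirical_risk(theta, x, y):
    # Squared-error empirical risk over the sample
    return 0.5 * np.mean((predict(theta, x) - y) ** 2)

def numerical_hessian(fn, theta, eps=1e-4):
    # Central finite-difference approximation of the Hessian of fn at theta
    theta = np.asarray(theta, dtype=float)
    n = theta.size
    hess = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n)
            ej = np.zeros(n)
            ei[i] = eps
            ej[j] = eps
            hess[i, j] = (
                fn(theta + ei + ej) - fn(theta + ei - ej)
                - fn(theta - ei + ej) + fn(theta - ei - ej)
            ) / (4 * eps ** 2)
    return hess

# Illustrative data: positive inputs, targets generated by y = 2x
x = np.linspace(0.1, 1.0, 10)
y = 2.0 * x

# A minimizer of the risk (w1 * w2 = 2) and a node-rescaled copy of it
theta = np.array([1.0, 2.0])
alpha = 10.0
theta_scaled = np.array([alpha * theta[0], theta[1] / alpha])

def risk(t):
    return empirical_risk(t, x, y)

same_function = np.allclose(predict(theta, x), predict(theta_scaled, x))
hess_norm = np.linalg.norm(numerical_hessian(risk, theta), "fro")
hess_norm_scaled = np.linalg.norm(numerical_hessian(risk, theta_scaled), "fro")

print("same predictions:", same_function)
print("Hessian Frobenius norm (original):", hess_norm)
print("Hessian Frobenius norm (rescaled):", hess_norm_scaled)
```

In this toy setting the two parameter vectors define the same function, but the rescaled one has a considerably larger Hessian norm. That is the non-invariance Dinh et al. point to; the second paper's counterargument is that solutions reached from near-zero initialization tend to have small norms, so such extreme rescalings are not what training typically finds.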


Written by Vinay Prabhu

PhD, Carnegie Mellon University. Chief Scientist at UnifyID Inc