More and more, AI research is trying to make machines teach themselves with a minimum of human guidance. So-called self-supervision is a technique that can be added to many machine learning tasks so that a computer learns with less human help, perhaps someday with none at all.

In new research, scientists at China's Sun Yat-Sen University and Hong Kong Polytechnic University use self-supervision to help a computer infer the pose of a human figure in a video clip.

Understanding what a person is doing in a picture is its own rich vein of machine learning research, useful for a range of applications including video surveillance. But such methods rely on "annotated" data sets, where labels carefully mark the orientation of the joints of the body.
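
To make that concrete, here is a minimal sketch of what a single annotated training sample might look like. All names, the file path, and the coordinates are illustrative assumptions, not data from the paper:

```python
# Hypothetical example of the kind of "annotated" pose data such methods
# rely on: each training image is paired with hand-labeled joint positions.
from dataclasses import dataclass

@dataclass
class PoseAnnotation:
    image_path: str
    # One (x, y, z) coordinate per body joint, in a fixed skeleton order
    # (e.g., head, neck, shoulders, ...). Values here are made up.
    joints_3d: list[tuple[float, float, float]]

sample = PoseAnnotation(
    image_path="frame_0001.jpg",
    joints_3d=[(0.12, 1.54, 0.33), (0.10, 1.38, 0.31)],  # truncated for brevity
)
```

Producing those coordinates by hand for every frame is exactly the expensive human effort that self-supervision tries to reduce.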


That's a problem because larger and larger "deep" neural networks are hungry for more and more data, but there isn't always enough labeled data to feed them.

So, the Sun Yat-Sen researchers set out to show that a neural network can refine its understanding by continually comparing the guesses of multiple networks with one another, ultimately lessening the need for the "ground truth" afforded by a labeled data set.
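
The general idea can be sketched very loosely in PyTorch. Note that everything below, including the toy architecture, the joint count, and the image sizes, is an assumption for illustration; it shows consistency-based self-supervision in general, not the paper's actual model:

```python
# Toy sketch of consistency-based self-supervision: two pose networks
# predict joints for the same unlabeled images, and each is trained to
# agree with the other, so no human-made labels are needed for this loss.
import torch
import torch.nn as nn

NUM_JOINTS = 17  # illustrative skeleton size

def make_pose_net() -> nn.Module:
    # Stand-in for a real pose estimator; outputs (x, y, z) per joint.
    return nn.Sequential(
        nn.Flatten(),
        nn.Linear(3 * 64 * 64, 256),
        nn.ReLU(),
        nn.Linear(256, NUM_JOINTS * 3),
    )

net_a, net_b = make_pose_net(), make_pose_net()
opt = torch.optim.Adam(
    list(net_a.parameters()) + list(net_b.parameters()), lr=1e-4
)

unlabeled_images = torch.randn(8, 3, 64, 64)  # a batch with no labels

opt.zero_grad()
pred_a = net_a(unlabeled_images)
pred_b = net_b(unlabeled_images)

# Self-supervised consistency loss: penalize disagreement between the
# two networks' guesses instead of comparing against human annotations.
loss = nn.functional.mse_loss(pred_a, pred_b)
loss.backward()
opt.step()
```

In practice such a consistency term is typically combined with some supervised signal, since two networks left entirely to themselves could simply agree on a useless answer.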

China's AI scientists show how their machine learning model refined its "prediction" of the 3D pose of an actor from an image by adding some self-supervision code to the last part of the neural network. (Image: Wang et al., 2019)

As the authors put it, prior efforts at inferring a human pose have achieved success, but at the expense of a "time-consuming network architecture (e.g., ResNet-50) and limited scalability for all scenarios."
