With the prevalence of video live streaming, establishing an online pixelation mechanism, or at least an online face pixelation mechanism, is an urgent need. In this paper, we develop a new method called Face
Pixelation in Video Live Streaming (FPVLS) that automatically filters personal privacy during unconstrained streaming
activities. Simply applying multi-face trackers runs into
problems of computing efficiency, target drifting, and over-pixelation due to the inherent features of live video streaming.
Therefore, for fast and accurate pixelation of irrelevant people's
faces, FPVLS is organized in a frame-to-video structure with
two core stages. On individual frames, our framework exploits
the speed and cost-effectiveness of image-based face detection
and embedding networks to yield face vectors. We propose a
Positioned Incremental Affinity Propagation (PIAP) clustering
algorithm that associates the same person's faces across frames
according to the face vectors and their corresponding positions. PIAP
also extends classic affinity propagation to an incremental
form for efficient generation of the raw face trajectories.
Such frame-wise accumulated raw trajectories are likely to be
intermittent and unreliable at the video level. Hence, we further
introduce a trajectory refinement stage that merges a proposal
network for loose face detection with a two-sample test based
on the Empirical Likelihood Ratio (ELR) statistic to compensate
for the deep networks' insufficiency and refine the raw trajectories
seamlessly. The shallow proposal network and the ELR test add
little computational burden. A Gaussian filter is applied over
the refined trajectories for final pixelation.
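The final pixelation step above can be sketched in pure Python. Assuming grayscale frames stored as 2-D lists and an `(x, y, w, h)` box format (both illustrative choices, not the paper's implementation), a separable Gaussian blur over one refined face box looks like:

```python
import math

def gaussian_kernel(radius, sigma):
    """1-D Gaussian kernel, normalized to sum to 1."""
    vals = [math.exp(-(i * i) / (2 * sigma * sigma))
            for i in range(-radius, radius + 1)]
    s = sum(vals)
    return [v / s for v in vals]

def blur_box(frame, box, radius=2, sigma=1.5):
    """Return a copy of `frame` with the region `box` Gaussian-blurred."""
    x, y, w, h = box
    k = gaussian_kernel(radius, sigma)
    out = [row[:] for row in frame]
    tmp = [row[:] for row in frame]
    # Horizontal pass, sampling clamped to the box borders.
    for r in range(y, y + h):
        for c in range(x, x + w):
            acc = 0.0
            for i, kv in enumerate(k):
                cc = min(max(c + i - radius, x), x + w - 1)
                acc += kv * frame[r][cc]
            tmp[r][c] = acc
    # Vertical pass over the horizontally blurred values.
    for r in range(y, y + h):
        for c in range(x, x + w):
            acc = 0.0
            for i, kv in enumerate(k):
                rr = min(max(r + i - radius, y), y + h - 1)
                acc += kv * tmp[rr][c]
            out[r][c] = acc
    return out
```

In practice the blur would run over every box of every refined trajectory in each frame; the kernel radius and sigma here are placeholder values.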
Graph of the Conceptual Working Pipeline
Toy example of PIAP Clustering
A toy example of how PIAP Clustering works is presented. Panels (a)-(i) correspond to the pictures of the toy example in left-to-right, then top-to-bottom order.
Traditional AP clustering is run on the first batch of objects; it converges in (a), and the clustering result is shown in (b). New objects arrive in (c), and the aggregated affinity is recomputed in (d). Message passing continues in (e)-(h) and reconverges in (i), which shows the final clustering result.
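For reference, the classic affinity propagation step that PIAP extends can be sketched in pure Python. The negative-squared-distance similarity, median preference, and damping value below are the usual textbook choices, not necessarily the paper's settings, and the 1-D toy points stand in for face vectors:

```python
def affinity_propagation(points, damping=0.5, iters=100):
    """Classic AP message passing on 1-D points; returns each
    point's exemplar index."""
    n = len(points)
    # Similarity: negative squared distance; self-similarity
    # (preference) set to the median off-diagonal similarity.
    s = [[-(points[i] - points[j]) ** 2 for j in range(n)] for i in range(n)]
    flat = sorted(s[i][j] for i in range(n) for j in range(n) if i != j)
    pref = flat[len(flat) // 2]
    for i in range(n):
        s[i][i] = pref
    r = [[0.0] * n for _ in range(n)]  # responsibilities
    a = [[0.0] * n for _ in range(n)]  # availabilities
    for _ in range(iters):
        # Responsibility: how well-suited k is as exemplar for i.
        for i in range(n):
            for k in range(n):
                best = max(a[i][kk] + s[i][kk] for kk in range(n) if kk != k)
                r[i][k] = damping * r[i][k] + (1 - damping) * (s[i][k] - best)
        # Availability: accumulated evidence that k should be an exemplar.
        for i in range(n):
            for k in range(n):
                if i == k:
                    new = sum(max(0.0, r[ii][k]) for ii in range(n) if ii != k)
                else:
                    new = min(0.0, r[k][k] + sum(max(0.0, r[ii][k])
                              for ii in range(n) if ii not in (i, k)))
                a[i][k] = damping * a[i][k] + (1 - damping) * new
    # Each point's exemplar maximizes availability + responsibility.
    return [max(range(n), key=lambda k: a[i][k] + r[i][k]) for i in range(n)]
```

On a well-separated toy batch such as `[1.0, 1.1, 1.2, 8.0, 8.1, 8.2]`, the first three points share one exemplar and the last three another; PIAP's incremental step would then warm-start these messages when the next batch of face vectors arrives instead of rerunning from scratch.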
The Structure of the Proposal Net (Compensating Detection through a Shallow CNN)
To fix the gaps accumulated by false negatives in a trajectory, we build a proposal net structured as shown below. The proposal net resizes frames in the same way as MTCNN does and proposes suspicious face areas in such gap frames.
Two-Sample Test Based on ELR (Compensating for Detection Loss)
Relationship between z (solid line) and z' (orange dashed lines). Orange dots are the suspicious faces proposed by the proposal network. Red areas on z are the breaks recovered by interpolation.
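The interpolation of trajectory breaks can be sketched as follows. The `(x, y, w, h)` box format and the simple center-distance gate (a crude stand-in for the paper's ELR two-sample test, which is considerably more involved) are illustrative assumptions:

```python
def interpolate_gap(traj, frame):
    """Linearly interpolate the box at `frame` from the nearest
    detections before and after it in `traj` ({frame_index: box})."""
    before = max(f for f in traj if f < frame)
    after = min(f for f in traj if f > frame)
    t = (frame - before) / (after - before)
    return tuple(a + t * (b - a) for a, b in zip(traj[before], traj[after]))

def accept_proposal(interp_box, proposed_box, tol=10.0):
    """Keep a proposal-net box only if it stays close to the
    interpolated trajectory (simple stand-in for the ELR test)."""
    cx1 = interp_box[0] + interp_box[2] / 2
    cy1 = interp_box[1] + interp_box[3] / 2
    cx2 = proposed_box[0] + proposed_box[2] / 2
    cy2 = proposed_box[1] + proposed_box[3] / 2
    return ((cx1 - cx2) ** 2 + (cy1 - cy2) ** 2) ** 0.5 <= tol
```

For a trajectory detected at frames 0 and 4, the break at frame 2 is filled with the midpoint box, and a nearby proposal-net box is accepted while a distant one is rejected.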
Video Test Data Results (Naive Cases)
Video Test Data Results (Sophisticated Cases)
YouTube Studio's offline tools failed to produce any mosaics in the first few tens of seconds. Then, after some recalibration, YouTube Studio works but suffers from heavy drifting problems.
Our FPVLS results with the PIAP clustering algorithm; the compensation algorithm is not yet applied in the following demo.
Our FPVLS results with the compensation algorithm.
Pixelation Results Analysis#1
We demonstrate the pixelation results of FPVLS vs. the offline face blur tool offered by YouTube Studio on a 1080p high-resolution (H), multi-people (S) scenario in this section. A thumbnail shows the pixelation results in sequential order from left to right. Since the paper-sized thumbnail cannot present the details, we also show the original pictures under the thumbnail one by one. The upper row of the thumbnail is produced offline by YouTube Studio; FPVLS generates the lower row in real time. This test presents FPVLS's ability to refine raw trajectories. The live streaming happened on a crowded street with a noisy and complex background. Except for the main streamer, James Xiao, all other people, including the dancers, are set to be blurred. Under unpredictable camera movements, tracking algorithms cannot handle the drifting and tracking-loss problems caused by failed linkage of tracklets. However, FPVLS can still place mosaics on irrelevant people's faces precisely through the compensating detections and the empirical likelihood ratio test.
The Thumbnail of Pixelation Results#1
Original Pictures of Pixelation Results#1
Youtube Studio Pixelation Results
FPVLS Pixelation Results
Pixelation Results Analysis#2
Another pixelation result of FPVLS vs. YouTube Studio on a low-resolution 480p (L), few-people (N) scenario is shown in this section. A thumbnail displays the pixelation results of the live streaming from left to right. Since the paper-sized thumbnail cannot present the details, we also show the original pictures under the thumbnail one by one. The upper row of the thumbnail is produced offline by YouTube Studio; FPVLS generates the lower row in real time. This test focuses on the typical over-pixelation problem that current face tracking algorithms cannot handle. James Xiao is playing the piano while his friend is watching. His friend's face is set to be blurred for privacy protection. When two or more faces overlap each other, we do not want to pixelate the occluded faces anymore, since they are invisible to the audience. However, tracking algorithms insist on predicting the movement of such partially/fully occluded faces and retaining their tracklets. These algorithms therefore produce many annoying and odd mosaics during streaming.
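A minimal sketch of the occlusion handling described above, assuming `(x, y, w, h)` boxes and an illustrative IoU threshold (the paper's exact criterion may differ): drop the mosaic for a face once it is mostly hidden behind a visible one, rather than tracking the invisible face:

```python
def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ix = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def should_pixelate(face, front_faces, occlusion_iou=0.5):
    """Skip pixelation when `face` is mostly hidden behind any
    visible front face."""
    return all(iou(face, f) < occlusion_iou for f in front_faces)
```

A heavily overlapped (occluded) face is skipped, while a face clear of the front faces still receives its mosaic.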
The Thumbnail of Pixelation Results#2
Original Pictures of Pixelation Results#2
Youtube Studio Pixelation Results
FPVLS Pixelation Results
Brief Conclusion
To the best of our knowledge, we are the first to address the face pixelation problem in live video streaming, through the proposed FPVLS. FPVLS already surpasses the offline tools offered by YouTube and Microsoft and is applicable in real-life scenarios. FPVLS achieves high accuracy and real-time performance on the dataset we collected. In the future, we will extend FPVLS to other privacy-sensitive objects.