��

�� :

�� .�., �� .�., �� .�. �� // �� . 2024. � 3. �. 1-11. DOI: 10.7256/2454-0714.2024.3.70849 EDN: MNOVWB URL: https://nbpublish.com/library_read_article.php?id=70849

��

�� 

ORCID: 0000-0001-8624-1662

��; �� ; �� - ��

119454, ��, �. ��, ��-� ��, 78

Alpatov Aleksey Nikolaevich

Associate Professor; Department of Instrumental and Applied Software; MIREA - Russian Technological University

78 Vernadsky Ave., Moscow, 119454, Russia

aleksej01-91@mail.ru

��

�� 

��; �� ; ��

119454, ��, �. ��, ��-� ��, 78

Terloev Emil' Ziyaudinovich

Postgraduate student; Department of Instrumental and Applied Software; MIREA - Russian Technological University

78 Vernadsky Ave., Moscow, 119454, Russia

emil199@yandex.ru

�� 

�� ; �� ; ��

119454, ��, �. ��, ��-� ��, 78

Matchin Vasilii Timofeevich

Senior Lecturer; Institute of Information Technology; MIREA - Russian Technological University

78 Vernadsky Ave., Moscow, 119454, Russia

matchin@mirea.ru

DOI:

10.7256/2454-0714.2024.3.70849

EDN:

MNOVWB

�� :

26-05-2024

�� :

10-06-2024

��: � �� . � �� , �� . �� , �� , �� . � �� . � �� , �� , �� . � �� , � �� , �� . � �� , �� : �� - � ��, � �� , �� . �� , �� . �� , �� , �� , � �� . �� , �� , �� . �� , �� . �� , �� . �� , � �� . �� 70%, �� . �� .

�� :

�� , �� , �� , �� , ��, �� , �� , �� , �� , ��

Abstract: The article examines the use of neural network technologies to detect falsification of the contents of video sequences. New technologies have become an integral part of the multimedia environment, but their proliferation has also created a new threat: the possibility of misuse to falsify the contents of video sequences. This leads to serious problems, such as the spread of fake news and the misinforming of society. The article examines this problem and argues for the use of neural networks to solve it. Compared with other existing models and approaches, neural networks detect video data falsification with high efficiency and accuracy owing to their ability to extract complex features and learn from large amounts of source data, which is especially important when the resolution of the analyzed video sequence is reduced. The work presents a mathematical model for identifying falsification of the audio and video sequences in video recordings, as well as a model based on a three-dimensional convolutional neural network that determines whether a video sequence has been falsified by analyzing the contents of individual frames. The problem of identifying falsification in video recordings is treated as the joint solution of two problems, identification of falsification of the audio sequence and of the video sequence, and the resulting problem is reduced to a classical classification problem. Any video recording can be assigned to one of the four groups described in the work. Only videos belonging to the first group are considered authentic; all others are fabricated. To increase the flexibility of the model, probabilistic classifiers have been added, which makes it possible to take into account the degree of confidence in the predictions.
A distinctive feature of the resulting solution is the ability to adjust the threshold values, which makes it possible to adapt the model to different levels of strictness depending on the task. An architecture of a three-dimensional convolutional neural network, comprising a preprocessing layer and a neural network layer, is proposed for detecting fabricated frame sequences. The resulting model determines falsified video sequences with sufficient accuracy given a significant decrease in frame resolution. Testing the model on a training dataset showed a proportion of correctly detected video sequence falsifications above 70%, which is noticeably better than guessing. Despite this sufficient accuracy, the model can be refined to increase the proportion of correct predictions further.

Keywords:

machine learning, neural networks, convolutional neural networks, video falsification, deepfakes, deepfake detection, audio falsification, data preprocessing, anomaly detection, batch normalization

��

� �� , �� . �� . �� DALL-E �� openAI, midjourney, stable diffusion, FaceApp, FaceSwap � �� ^[1] ^[2]. �� , �� elevenlabs, Microsoft custom neural voice � speechify ^[3].

� �� , �� , �� , �� . �� , � �� . �� , �� «��» �� , � �� ^[4].

� �� , �� , � �� ^[5]. �� , �� , � �� , �� . �� ^[6].

�� , �� 2022-�� , �� YouTube �� , � �� , �� . �� 1. ^[7]

�� 1 – �� YouTube^[7]

� �� , �� . � �� , �� , �� , �� .

�� 

�� , � �� , �� .

�� 4-� ��:

1. �� ;

2. �� , �� ;

3. �� , �� ;

4. �� .

��, �� , �� , � �� .

�� X �� , � �� - �� . �� . �� (�� ), � �� , �� (�� ). �� , �� , �� , �� . � �� , �� , �� , �� . �� .

�� . �� , ��, �� . �� . ��, �� , �� , � �� . �� , �� , �� (�� ). �� , �� , �� (�� ). ��, �� , �� , �� (�� ). �� «��» �� , �� , �� . �� , �� .

�� , �� , � . �� . �� . �� , �� , �� 0. ��, , �� , �� 0 �� .

�� . ��, � �� , �� , �� . �� , �� . � �� , �� , �� , � �� .

��, �� , �� . � �� .
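The four-group formulation and the adjustable thresholds described in the abstract can be illustrated with a minimal sketch. It assumes two probabilistic classifiers, one returning the probability that the video sequence is falsified and one returning the same for the audio sequence. The names `Group` and `classify`, the default thresholds of 0.5, and the order in which the non-authentic groups are listed are assumptions made for illustration and are not taken from the article.

```python
from enum import Enum


class Group(Enum):
    """Four groups of recordings; only AUTHENTIC counts as genuine."""
    AUTHENTIC = "authentic video and authentic audio"
    VIDEO_FAKE = "falsified video, authentic audio"
    AUDIO_FAKE = "authentic video, falsified audio"
    BOTH_FAKE = "falsified video and falsified audio"


def classify(p_video_fake: float, p_audio_fake: float,
             video_threshold: float = 0.5,
             audio_threshold: float = 0.5) -> Group:
    """Combine two classifier probabilities into one of the four groups.

    The thresholds are hypothetical tuning knobs: lowering them makes the
    decision stricter (more recordings are flagged as fabricated).
    """
    for p in (p_video_fake, p_audio_fake):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    video_fake = p_video_fake >= video_threshold
    audio_fake = p_audio_fake >= audio_threshold
    if video_fake and audio_fake:
        return Group.BOTH_FAKE
    if video_fake:
        return Group.VIDEO_FAKE
    if audio_fake:
        return Group.AUDIO_FAKE
    return Group.AUTHENTIC


def is_authentic(group: Group) -> bool:
    """A recording is authentic only if neither stream was flagged."""
    return group is Group.AUTHENTIC
```

With the default thresholds, `classify(0.4, 0.4)` yields the authentic group, while the stricter call `classify(0.4, 0.4, 0.3, 0.3)` flags both streams, which shows how the same predictions lead to different decisions at different levels of strictness.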

�� . �� , �� . �� N x M x K,

��:

N — �� ,

M � K — �� (�� ).

�� 1 x M x K, �� , � �� N x 1 x 1, �� . �� , �� N x M x K, � �� ^[9].

�� , �� C — �� (��, 3 �� RGB-��). �� , � �� , �� , �� . ��, �� — ��

�� , �� (��.stride) � �� (��.padding). �� (�� stride = 1 � padding = 0)

�� , �� . �� — ��

�� 2.

�� 2 – ��
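The effect of the stride and padding parameters on the size of the convolution output can be checked with a short sketch. It uses the standard relation for one axis, output = floor((input + 2 * padding - kernel) / stride) + 1, and applies it to the three axes of an N x M x K kernel. The function names are hypothetical, and the input size in the usage note is chosen only as an example.

```python
def conv_output_size(input_size: int, kernel_size: int,
                     stride: int = 1, padding: int = 0) -> int:
    """Output length along one axis for a convolution without dilation."""
    if stride < 1:
        raise ValueError("stride must be at least 1")
    if padding < 0:
        raise ValueError("padding must be non-negative")
    span = input_size + 2 * padding - kernel_size
    if span < 0:
        raise ValueError("kernel is larger than the padded input")
    return span // stride + 1


def conv3d_output_shape(input_shape, kernel_shape, stride=1, padding=0):
    """Apply the one-axis rule to (frames, height, width)."""
    return tuple(
        conv_output_size(n, k, stride, padding)
        for n, k in zip(input_shape, kernel_shape)
    )
```

For example, with stride = 1 and padding = 0, a 3 x 7 x 7 kernel applied to a clip of 10 frames of 224 x 224 pixels gives `conv3d_output_shape((10, 224, 224), (3, 7, 7))`, that is, (8, 218, 218).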

�� 

� �� ZF DeepFake Dataset ^[10]. �� , �� 199 �� 176 �� (�� ).

� �� . � �� 10 �� 224 �� 224 ��. �� 10 � 224 � 224 � 3, �� – �� : ��, �� .
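The preprocessing step that produces a 10 x 224 x 224 x 3 input can be sketched as follows. The sketch only selects which frames to keep and reports the resulting tensor shape; it assumes evenly spaced sampling over the whole recording, which is one common choice and is not confirmed by the article. The function names are hypothetical, and decoding and resizing of the frames are left out.

```python
def sample_frame_indices(total_frames: int, n_frames: int = 10) -> list:
    """Pick n_frames evenly spaced frame indices, first and last included."""
    if n_frames < 1:
        raise ValueError("n_frames must be at least 1")
    if total_frames < n_frames:
        raise ValueError("the recording has fewer frames than requested")
    if n_frames == 1:
        return [0]
    # Integer arithmetic keeps the indices strictly increasing.
    return [i * (total_frames - 1) // (n_frames - 1) for i in range(n_frames)]


def clip_shape(n_frames: int = 10, height: int = 224,
               width: int = 224, channels: int = 3) -> tuple:
    """Shape of one preprocessed clip: frames x height x width x channels."""
    return (n_frames, height, width, channels)


def clip_size(shape: tuple) -> int:
    """Number of scalar values in one clip."""
    size = 1
    for dim in shape:
        size *= dim
    return size
```

For a recording of 100 frames, `sample_frame_indices(100)` selects frames 0, 11, 22, and so on up to 99, and one clip of shape `clip_shape()` holds 1,505,280 values.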

�� . �� ; �� , 16-� �� 3 � 7 � 7; �� , �� -�� (ReLU)^[11]; �� 112 � 112; �� 32-� �� 3 � 3 � 3; �� 64 � 64; �� 64-� �� 3 � 3 � 3; �� ^[12], �� (flatten) � �� 10-� ��. �� – �� Adam � �� 0.0001.
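The size of the three convolutional layers listed above (16 filters of 3 x 7 x 7, 32 filters of 3 x 3 x 3, 64 filters of 3 x 3 x 3) can be estimated with a short sketch. It assumes full three-dimensional kernels with a bias term per filter, a three-channel RGB input, and that each layer receives the output channels of the previous one. These are assumptions for illustration, and the function names are hypothetical; the count would differ if the layers are factored into separate spatial and temporal convolutions.

```python
def conv3d_params(kernel: tuple, in_channels: int, filters: int,
                  bias: bool = True) -> int:
    """Trainable parameters of one 3D convolutional layer."""
    k_time, k_height, k_width = kernel
    weights = k_time * k_height * k_width * in_channels * filters
    return weights + (filters if bias else 0)


# (kernel size, number of filters) in the order given in the text.
LAYERS = [((3, 7, 7), 16), ((3, 3, 3), 32), ((3, 3, 3), 64)]


def count_params(in_channels: int = 3, layers=LAYERS) -> list:
    """Parameter count per layer, chaining output channels to the next input."""
    counts = []
    for kernel, filters in layers:
        counts.append(conv3d_params(kernel, in_channels, filters))
        in_channels = filters
    return counts
```

Under these assumptions `count_params()` gives 7072, 13856, and 55360 parameters for the three layers, 76288 in total, so the convolutional part of the network is small by the standards of video models.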

�� 100 ��, �� 50 �� 50 ��. �� 40 ��, � �� 20 �� 20 ��. �� . �� 10 ��. �� 3.

�� 3 – �� -��

�� 

�� (��) �� 75%. �� 4. �� 5.

�� 4 – ��

�� 5 – ��

�� , �� . �� , �� 6.

�� 6 – ��

�� , �� , �� , ��-�� . �� 7.

�� 7 – ��

�� , ��, � F1-�� 1.

�� 1 – �� , ��, � F1-��

��/��	��	��
Precision	0.552	0.6364
Recall	0.8	0.35
F1-score	0.653	0.451
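The last row of Table 1 can be checked against the first two, since the F1-score is the harmonic mean of precision and recall. The sketch below recomputes it from the tabulated values; the function name and the generic column labels are hypothetical, because the class names are not used in the calculation.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) for the two columns of Table 1.
TABLE_1 = {
    "column 1": (0.552, 0.8, 0.653),
    "column 2": (0.6364, 0.35, 0.451),
}

for label, (precision, recall, reported) in TABLE_1.items():
    computed = f1_score(precision, recall)
    print(f"{label}: computed F1 = {computed:.4f}, reported = {reported}")
```

Both recomputed values agree with the table to within 0.001, so the three rows are mutually consistent.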

��

� �� . �� , �� ; �� , �� ; �� .

�� , � �� .

��

1. Beyan E. V. P., Rossy A. G. C. A review of AI image generator: influences, challenges, and future prospects for architectural field // Journal of Artificial Intelligence in Architecture. 2023. V. 2. No. 1. Pp. 53-65.
2. Huang Y. F., Lv S., Tseng K. K., Tseng P. J., Xie X., Lin R. F. Y. Recent advances in artificial intelligence for video production system // Enterprise Information Systems. 2023. V. 17. No. 11. Pp. 2246188.
3. Albert V. D., Schmidt H. J. AI-based B-to-B brand redesign: A case study // Transfer. 2023. P. 47.
4. Aliev E. V. Problems of using digital technologies in the film industry // European Journal of Arts. 2023. No. 1. Pp. 33-37. DOI: https://doi.org/10.29013/EJA-23-1-33-37
5. Chow P. S. Ghost in the (Hollywood) machine: Emergent applications of artificial intelligence in the film industry // NECSUS_European Journal of Media Studies. 2020. V. 9. No. 1. Pp. 193-214.
6. Lemaykina S. V. Problems of counteracting the use of deepfakes for criminal purposes // Jurist-Pravoveden. 2023. No. 2(105). Pp. 143-148.
7. Vakilinia I. Cryptocurrency giveaway scam with youtube live stream // 2022 IEEE 13th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON). 2022. Pp. 0195-0200.
8. Tran D., Wang H., Torresani L., Ray J., LeCun Y., Paluri M. A closer look at spatiotemporal convolutions for action recognition // Proceedings of the IEEE conference on Computer Vision and Pattern Recognition. 2018. Pp. 6450-6459.
9. Naik K. J., Soni A. Video classification using 3D convolutional neural network // Advancements in Security and Privacy Initiatives for Multimedia Images. IGI Global. 2021. Pp. 1-18.
10. ZF DeepFake Dataset [Electronic resource]. URL: https://www.kaggle.com/datasets/zfturbo/zf-deepfake-dataset (accessed: 20.01.2024).
11. Garbin C., Zhu X., Marques O. Dropout vs. batch normalization: an empirical study of their impact to deep learning // Multimedia tools and applications. 2020. V. 79. No. 19. Pp. 12777-12815.
12. Zhou D. X. Theory of deep convolutional neural networks: Downsampling // Neural Networks. 2020. V. 124. Pp. 319-327.

References

1. Beyan, E. V. P., & Rossy, A. G. C. (2023). A review of AI image generator: influences, challenges, and future prospects for architectural field. Journal of Artificial Intelligence in Architecture, 2(1), 53-65.
2. Huang, Y., Lv, S., Tseng, K. K., Tseng, P. J., Xie, X., & Lin, R. F. Y. (2023). Recent advances in artificial intelligence for video production system. Enterprise Information Systems, 17(11), 2246188.
3. Albert, V. D., & Schmidt, H. J. (2023). AI-based B-to-B brand redesign: A case study. Transfer, 47.
4. Aliev, E. V. (2023). Problems of using digital technologies in the film industry. European Journal of Arts, 1, 33-37.
5. Chow, P. S. (2020). Ghost in the (Hollywood) machine: Emergent applications of artificial intelligence in the film industry. NECSUS_European Journal of Media Studies, 9(1), 193-214.
6. Lemaykina, S. V. (2023). Problems of counteracting the use of deepfakes for criminal purposes. Jurist-Pravoveden, 2(105), 143-148.
7. Vakilinia, I. (2022, October). Cryptocurrency giveaway scam with youtube live stream. In 2022 IEEE 13th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON) (pp. 0195-0200). IEEE.
8. Tran, D., Wang, H., Torresani, L., Ray, J., LeCun, Y., & Paluri, M. (2018). A closer look at spatiotemporal convolutions for action recognition. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition (pp. 6450-6459).
9. Naik, K. J., & Soni, A. (2021). Video classification using 3D convolutional neural network. In Advancements in Security and Privacy Initiatives for Multimedia Images (pp. 1-18). IGI Global.
10. ZF DeepFake Dataset [Electronic resource]. Retrieved from https://www.kaggle.com/datasets/zfturbo/zf-deepfake-dataset.
11. Garbin, C., Zhu, X., & Marques, O. (2020). Dropout vs. batch normalization: an empirical study of their impact to deep learning. Multimedia tools and applications, 79(19), 12777-12815.
12. Zhou, D. X. (2020). Theory of deep convolutional neural networks: Downsampling. Neural Networks, 124, 319-327.

��

� �� .
�� .

� �� (3D CNN) �� . �� , �� .
�� 3D CNN, �� , �� . �� ZF DeepFake �� , �� . �� , �� .
� �� , �� . �� , �� .
�� 3D CNN �� , � �� . �� , �� .
�� . �� . �� , �� . �� , �� .
� �� . �� . �� , �� .
�� , �� . �� , �� , �� .
�� . �� . �� , �� . �� , �� , �� .
�� . �� .
�� : � �� , ��, � F1-�� .


��