Artificial Intelligence

S.No | Project Code | Project Title | Abstract
---|---|---|---
1 | VTPAI01 | Eye-LRCN: A Long-Term Recurrent Convolutional Network for Eye Blink Completeness Detection | |
2 | VTPAI02 | Explainable AI Based Neck Direction Prediction and Analysis During Head Impacts | |
3 | VTPAI03 | A Novel Digital Audio Encryption and Watermarking Scheme | |
4 | VTPAI04 | Advanced Encryption for Quantum-Safe Video Transmission | |
5 | VTPAI05 | Secret Image Sharing using Shamir Secret Rule | |
6 | VTPAI06 | AI Model for Identification of Micro-Nutrient Deficiency in Banana Crop | |
7 | VTPAI07 | A Novel Medical Image Encryption Scheme Based on Deep Learning Feature Encoding and Decoding | |
8 | VTPAI08 | Efficient Anomaly Detection Algorithm for Heart Sound Signal | |
9 | VTPAI09 | Stroke Prediction Using XGBoost and a fusion of XGBoost with Random Forest | |
10 | VTPAI10 | Road Object Detection in Foggy Complex Scenes Based on Improved YOLOv10 | |
11 | VTPAI11 | Jellyfish Detection using Improved YOLO Algorithm | |
12 | VTPAI12 | Novel Animal Detection System using YOLO With Adaptive Preprocessing and Feature Extraction | |
13 | VTPAI13 | Image Translation and Reconstruction with advanced Neural Networks | |
14 | VTPAI14 | Improved YOLO Algorithm to detect Marine Debris in Surveillance | |
15 | VTPAI15 | Railway Objects Detection by Using Improved YOLO Algorithm | |
16 | VTPAI16 | Automatic Detection of Foreign Object Debris on Airport Runway by Using YOLO | |
17 | VTPAI17 | Personalized Book Intelligent Recommendation System | |
18 | VTPAI18 | Safety Helmet Detection Based on Improved YOLO | |
19 | VTPAI19 | Efficient Pomegranate Growth Stage Detection Using YOLOv10: A Novel Object Detection Approach | |
20 | VTPAI20 | EC-YOLO: Advanced Steel Strip Surface Defect Detection Model Based on YOLOv10 | |
21 | VTPAI21 | Palm Oil Counter: State-of-the-Art Deep Learning Models for Detection and Counting in Plantations | |
22 | VTPAI22 | Detection of Hand Bone Fractures in X-Ray Images Using Hybrid YOLO NAS | |
23 | VTPAI23 | SUNet: Coffee Leaf Disease Detection Using YOLO | |

Natural Language Processing

S.No | Project Code | Project Title | Abstract
---|---|---|---
1 | VTPNLP01 | Gun Sound Recognition Using NLP and YAMNET model | |
2 | VTPNLP02 | Classification and Recognition of Lung Sounds Based on Improved Bi-ResNet Model | |
3 | VTPNLP03 | Deep Learning Algorithms for Cyberbullying Detection in Social Media Platforms | |
4 | VTPNLP04 | Novel Meta Learning Approach for Detecting Postpartum Depression Disorder Using Questionnaire Data | |
5 | VTPNLP05 | Online Recruitment Fraud (ORF) Detection Using Deep Learning Approaches | |
6 | VTPNLP06 | Fake News Detection Using Deep Learning | |
7 | VTPNLP07 | Live Event Detection for People's Safety Using NLP and Deep Learning | |
8 | VTPNLP08 | Socially Aware Synthetic Data Generation for Suicidal Ideation Detection Using Natural Language Processing | |
9 | VTPNLP09 | Climate Change Sentiment Analysis Using Natural Language Processing and LSTM Model | |
10 | VTPNLP10 | Explainable Detection of Depression in Social Media Contents Using Natural Language Processing | |
11 | VTPNLP11 | Natural Language Processing and CNN Model for Indonesian Sarcasm Detection | |
12 | VTPNLP12 | A Novel Customer Review Analysis System Based on Balanced Deep Review and Rating Differences in User Preference | |
13 | VTPNLP13 | How Do Crowd-Users Express Their Opinions Against Software Applications in Social Media? A Fine-Grained Classification Approach | |
14 | VTPNLP14 | Advanced NLP Models for Technical University Information Chatbots: Development and Comparative Analysis | |

Image Processing

S.No | Project Code | Project Title | Abstract
---|---|---|---
1 | VTPIP01 | On Enhancing Crack Semantic Segmentation Using StyleGAN DeepLabV3 & ResNet50 | |
2 | VTPIP02 | Underwater Image Enhancement Based on Conditional Denoising Diffusion Probabilistic Model | |
3 | VTPIP03 | A Universal Field-of-View Mask Segmentation Method on Retinal Images From Fundus Cameras | |
4 | VTPIP04 | Deep Learning Algorithms for Optimized Thyroid Nodule Classification | |
5 | VTPIP05 | Multi-Class Medical Image Classification Using Deep Learning with Xception Model | |
6 | VTPIP06 | Advancing Malaria Identification From Microscopic Blood Smears Using Hybrid Deep Learning Frameworks | |
7 | VTPIP07 | Segmentation of Aerial Images using U-Net model | |
8 | VTPIP08 | Development of Convolutional Neural Network to Segment Ultrasound Images of Histotripsy Ablation | |
9 | VTPIP09 | Automated Detection of Spinal Lesions from CT Scans via Deep Transfer Learning | |
10 | VTPIP10 | Image Processing Techniques for Emotion Recognition | |
11 | VTPIP11 | A Lightweight and Multi-Branch Module in Facial Semantic Segmentation Feature Extraction | |
12 | VTPIP12 | Plant Leaf Disease Image Detection and Classification with Convolutional Neural Networks (CNN) and OpenCV | |
13 | VTPIP13 | Dual-Branch Fully Convolutional Segment Anything Model for Lesion Segmentation in Endoscopic Images | |
14 | VTPIP14 | SegNet Algorithm-Guided Image Channel Selection for Skin Lesion Segmentation | |
15 | VTPIP15 | Image Encryption and Decryption Using AES in CBC Mode with Flask | |
16 | VTPIP16 | Efficient Single Infrared Image Super-Resolution | |

Machine Learning

S.No | Project Code | Project Title | Abstract
---|---|---|---
1 | VTPML01 | Applying Machine Learning Algorithms for the Classification of Sleep Disorders | |
2 | VTPML02 | Machine Learning-Based Cardiovascular Disease Detection Using Optimal Feature Selection | |
3 | VTPML03 | A Novel Web Framework for Cervical Cancer Detection System | |
4 | VTPML04 | Enhancing Medicare Fraud Detection Through Machine Learning: Addressing Class Imbalance With SMOTE-ENN | |
5 | VTPML05 | Hybrid Machine Learning Model for Efficient Botnet Attack Detection in IoT Environment | |
6 | VTPML06 | An Improved Concatenation of AI Models for Predicting and Interpreting Ischemic Stroke | |
7 | VTPML07 | Investigating Evasive Techniques in SMS Spam Filtering | |
8 | VTPML08 | Enhancing the Prediction of Employee Turnover With Knowledge Graphs and Explainable AI | |
9 | VTPML09 | Cardiotocography Data Analysis for Fetal Health Classification Using Machine Learning Models | |
10 | VTPML10 | Head Injury Detection Using Machine Learning | |
11 | VTPML11 | Predicting Heart Diseases Using Machine Learning and Different Data Classification Techniques | |
12 | VTPML12 | Liver Cirrhosis Stage Classification using Machine Learning | |
13 | VTPML13 | Identification of Social Anxiety in High School: A Machine Learning Approach to Real-Time Analysis of Student Characteristics | |
14 | VTPML14 | Predicting Hospital Stay Length Using Explainable Machine Learning | |
15 | VTPML15 | Optimal Ensemble Learning Model for Dyslexia Prediction Based on an Adaptive Genetic Algorithm | |
16 | VTPML16 | Toward Improving Breast Cancer Classification Using an Adaptive Voting Ensemble Learning Algorithm | |
17 | VTPML17 | Machine Learning based Method for Insurance Fraud Detection on Class Imbalance Datasets with Missing Values | |
18 | VTPML18 | An Approach for Crop Prediction in Agriculture: Integrating Genetic Algorithms and Machine Learning | |
19 | VTPML19 | Novel Machine Learning Techniques for Classification of Rolling Bearings | |
20 | VTPML20 | Enhancing Rice Production Prediction in Indonesia Using Advanced Machine Learning Models | |

Deep Learning

S.No | Project Code | Project Title | Abstract
---|---|---|---
1 | VTPDL01 | Exploring Deep Learning and Machine Learning Approaches for Brain Hemorrhage Detection | |
2 | VTPDL02 | Multi-Class Kidney Abnormalities Detecting Novel System Through Computed Tomography | |
3 | VTPDL03 | Medicinal Plant Classification Using Particle Swarm Optimized Cascaded Network | |
4 | VTPDL04 | Effective Hypertension Detection Using Predictive Feature Engineering and Deep Learning | |
5 | VTPDL05 | Innovations in Stroke Identification: A Machine Learning-Based Diagnostic Model Using Neuroimages | |
6 | VTPDL06 | RoI-Attention Network for Small Disease Segmentation in Crop Leaf Images | |
7 | VTPDL07 | Classification of Down Syndrome in Children Using Neural Networks | |
8 | VTPDL08 | A Large Dataset to Enhance Skin Cancer Classification with Transformer-Based Deep Neural Networks | |
9 | VTPDL09 | A Reliable and Robust Deep Learning Model for Effective Recyclable Waste Classification | |
10 | VTPDL10 | CIFAKE: Image Classification and Explainable Identification of AI-Generated Synthetic Images | |
11 | VTPDL11 | Paddy Leaf Disease Classification Using EfficientNetB4 With Compound Scaling and Swish Activation: A Deep Learning Approach | |
12 | VTPDL12 | Explainable Deep Learning to Classify Royal Navy Ships | |
13 | VTPDL13 | Tomato Quality Classification Based on Xception Algorithm Classifiers | |
14 | VTPDL14 | Automatic Classification of White Blood Cells Using Deep Learning Models | |
15 | VTPDL15 | OTONet: Deep Neural Network for Precise Otoscopy Image Classification | |
16 | VTPDL16 | JutePest-YOLO: A Deep Learning Network for Jute Pest Identification and Detection | |
17 | VTPDL17 | Federated Deep Learning for Monkeypox Disease Detection | |
18 | VTPDL18 | Multi-Fruit Classification and Grading | |
19 | VTPDL19 | Classification of Oral Cancer into Pre-Cancerous Stages from White Light Images | |
20 | VTPDL20 | YogaPoseNet: Advanced Yogic Posture Classification Using NASNet Architecture | |

Eye blink detection using OpenCV, Python, and dlib is an advanced technique that plays a crucial role in applications like driver drowsiness detection, human-computer interaction, and fatigue monitoring. The process begins with face detection, where OpenCV's Haar cascades or deep learning-based detectors isolate the face region from an image or video stream. Subsequently, facial landmark detection is performed using dlib's pre-trained shape predictor, which identifies 68 key points on the face, including those around the eyes. From these landmarks, specific points outlining the eye regions are extracted, typically involving six points per eye. The Eye Aspect Ratio (EAR) is then calculated by measuring the vertical eye-opening relative to its horizontal width. The EAR significantly decreases when the eyes close during a blink. Continuous monitoring of the EAR allows for blink detection when the ratio falls below a set threshold for a brief period before returning to normal. This threshold and duration are adjustable based on the application and user requirements. The system is designed for real-time operation, ensuring immediate feedback on blink events through efficient processing of video frames and quick EAR calculations. By combining OpenCV for image processing, Python for programming, and dlib for facial landmark detection, this approach provides a robust and efficient solution for real-time eye blink detection, leveraging computer vision and machine learning to enhance safety and user interaction in various fields.
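The EAR computation described above is compact enough to show directly. Below is a minimal sketch assuming dlib's 68-point predictor file (`shape_predictor_68_face_landmarks.dat`) is available locally; the 0.21 threshold and the two-frame duration are illustrative values, not tuned ones.

```python
# EAR-based blink detection sketch (OpenCV + dlib), assumed constants marked.
import cv2
import dlib
import numpy as np

EAR_THRESHOLD = 0.21      # blink when EAR falls below this (assumed value)
MIN_BLINK_FRAMES = 2      # consecutive low-EAR frames that count as a blink

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def eye_aspect_ratio(eye):
    """EAR = (|p2-p6| + |p3-p5|) / (2 * |p1-p4|) over six eye landmarks."""
    v1 = np.linalg.norm(eye[1] - eye[5])
    v2 = np.linalg.norm(eye[2] - eye[4])
    h = np.linalg.norm(eye[0] - eye[3])
    return (v1 + v2) / (2.0 * h)

def frame_ear(gray):
    """Mean EAR of both eyes for the first detected face, or None."""
    faces = detector(gray, 0)
    if not faces:
        return None
    pts = predictor(gray, faces[0])
    coords = np.array([[pts.part(i).x, pts.part(i).y] for i in range(68)])
    left, right = coords[42:48], coords[36:42]   # 68-point model eye indices
    return (eye_aspect_ratio(left) + eye_aspect_ratio(right)) / 2.0

cap = cv2.VideoCapture(0)
low_frames, blinks = 0, 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    ear = frame_ear(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
    if ear is not None and ear < EAR_THRESHOLD:
        low_frames += 1
    else:
        if low_frames >= MIN_BLINK_FRAMES:
            blinks += 1          # a completed blink: eyes closed, then reopened
        low_frames = 0
```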
The research focuses on detecting and analyzing the neck rotation during head impacts using the YOLOv8 model, with the aim of providing preventive healthcare measures. The neck's position and orientation need to be monitored and measured to predict the direction of the neck during such impacts. The experiment involves simulating mild head impacts, replicating movements such as flexion and lateral rotation, based on American football scenarios, with data collected from ten subjects (five male and five female). The YOLOv8 model is employed to detect and track the neck's rotation in real-time from video footage, providing high accuracy in determining the direction of the neck during head impacts. By utilizing YOLOv8, this research aims to achieve a more efficient and precise system for detecting neck rotation compared to traditional methods. This approach, integrated with explainable AI, helps in offering meaningful interpretations of the results, which can support clinical systems in making informed decisions regarding head and neck injury prevention.
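As a rough illustration of the detection step, the sketch below runs a fine-tuned YOLOv8 model over impact footage with the ultralytics API; the weights file `neck_rotation.pt` and the class names are hypothetical stand-ins for the model trained in this project.

```python
# Per-frame neck-direction detection sketch with the ultralytics YOLOv8 API.
from ultralytics import YOLO

model = YOLO("neck_rotation.pt")                 # fine-tuned weights (assumed path)
results = model.predict("impact_clip.mp4", conf=0.5, stream=True)

for frame_result in results:                     # one Results object per frame
    for box in frame_result.boxes:
        cls_name = frame_result.names[int(box.cls)]   # e.g. "flexion", "lateral_rotation"
        print(cls_name, float(box.conf), box.xyxy.tolist())
```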
To enhance the privacy and security of audio signals stored in third-party storage centers, a robust digital audio encryption and forensics watermarking scheme is proposed. The scheme incorporates the AES-GCM (Advanced Encryption Standard in Galois/Counter Mode) algorithm for authenticated encryption, ensuring both confidentiality and integrity of the audio data. In addition, we utilize Fernet symmetric encryption and PBKDF2HMAC (Password-Based Key Derivation Function 2 with HMAC) for key generation, supported by generative hash passwords to further strengthen security. The signal energy ratio feature of audio signals is defined and used in the watermark embedding method through feature quantification, improving the resilience of the watermarking system. First, the original audio is encrypted using scrambling, multiplication, and AES-GCM to generate the encrypted data. The encrypted data is then divided into frames, each compressed through sampling. The compressed data, along with frame numbers, is embedded into the encrypted audio, forming the watermarked signal, which is uploaded to third-party storage. Authorized users retrieve the encrypted data and verify its authenticity. If intact, the data is decrypted directly using Fernet to recover the original audio. In the case of an attack, the compromised frames are identified, and the embedded compressed data is used to reconstruct the audio approximately. The reconstructed signal is subsequently decrypted so that the meaning of the original audio is preserved. Experimental results demonstrate the effectiveness of the proposed scheme in providing quantum-safe encryption, secure watermarking, and forensics capabilities.
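A hedged sketch of the key-handling pipeline named above follows: PBKDF2HMAC derives a key from a password, AES-GCM provides authenticated encryption of the raw audio bytes, and Fernet serves the symmetric path. The salt size and iteration count are assumptions, not the scheme's exact parameters.

```python
# Key derivation + authenticated encryption sketch using the `cryptography` package.
import os
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def derive_key(password: bytes, salt: bytes) -> bytes:
    """PBKDF2HMAC-SHA256 stretches a password into a 256-bit key."""
    kdf = PBKDF2HMAC(algorithm=hashes.SHA256(), length=32,
                     salt=salt, iterations=480_000)   # iteration count assumed
    return kdf.derive(password)

def encrypt_audio(audio: bytes, password: bytes):
    """AES-GCM gives confidentiality plus an integrity tag in one pass."""
    salt, nonce = os.urandom(16), os.urandom(12)
    key = derive_key(password, salt)
    ciphertext = AESGCM(key).encrypt(nonce, audio, None)
    return salt, nonce, ciphertext

def fernet_roundtrip(data: bytes) -> bytes:
    """Fernet wraps AES + HMAC behind a single urlsafe key."""
    key = Fernet.generate_key()
    f = Fernet(key)
    return f.decrypt(f.encrypt(data))
```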
This project enables secure video processing, encryption, and watermark embedding, focusing on user authentication, video encryption, and decryption capabilities. Users can register, log in, and upload videos along with watermarks for processing. Using the cryptography library, each uploaded video is encrypted, and its encryption key is split using Shamir's Secret Sharing, ensuring secure key distribution and storage. The encrypted frames are stored separately for later retrieval and decryption. Decryption occurs through reassembling key shares, allowing the original video to be reconstructed, with the watermark extracted from the first frame. The application further provides options to download the decrypted video, view split frames, and explore contact and performance information pages. Employing OpenCV for video processing and secure file handling techniques, this system ensures data confidentiality and integrity through a user-friendly interface and robust back-end encryption mechanisms. The application uses secure upload and storage mechanisms for sensitive data, like key shares and encrypted frames, storing them in predefined folders. Key shares are stored separately, further protecting the decryption process from unauthorized access.
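To make the key-splitting step concrete, here is a compact (k, n) Shamir split/recover sketch over a prime field; it is an illustrative implementation of the technique, not the project's exact code.

```python
# Shamir's Secret Sharing sketch: split a key into n shares, any k recover it.
import secrets

PRIME = 2**521 - 1   # Mersenne prime, large enough to hold a 256-bit key

def split_secret(secret: int, k: int, n: int):
    """Encode `secret` as f(0) of a random degree-(k-1) polynomial and
    hand out the points (x, f(x)) for x = 1..n as shares."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    def f(x):
        acc = 0
        for c in reversed(coeffs):       # Horner evaluation mod PRIME
            acc = (acc * x + c) % PRIME
        return acc
    return [(x, f(x)) for x in range(1, n + 1)]

def recover_secret(shares):
    """Lagrange-interpolate f(0) from any k shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

key_int = int.from_bytes(secrets.token_bytes(32), "big")   # stand-in video key
shares = split_secret(key_int, k=3, n=5)
assert recover_secret(shares[:3]) == key_int               # any 3 of 5 suffice
```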
The safeguarding of digitized data against unwanted access and modification has become an issue of utmost importance as a direct result of the rapid development of network technology and internet applications. In response to this challenge, numerous secret image sharing (SIS) schemes have been developed. SIS is a method for protecting sensitive digital images from unauthorized access and alteration. The secret image is fragmented into a large number of arbitrary shares, each of which is designed to prevent the disclosure of any information to the trespassers. In this paper, we present a comprehensive survey of SIS schemes along with their pros and cons. We review various existing verifiable secret image sharing (VSIS) schemes that are immune to different types of cheating. We have identified various aspects of developing secure and efficient SIS schemes. In addition to that, a comparison and contrast of several SIS methodologies based on various properties is included in this survey work. We also highlight some of the applications based on SIS. Finally, we present open challenges and future directions in the field of SIS.
This research presents an advanced convolutional neural network (CNN) model for diagnosing micro-nutrient deficiencies in banana crops through the analysis of leaf images. Proper nutrition is essential for optimal crop growth and yield, and deficiencies in vital nutrients can severely impact plant health and productivity. To address this, we have developed a specialized CNN model designed to detect and classify various nutrient deficiencies based on detailed leaf images. The study involves a comprehensive dataset of banana leaves exhibiting different deficiency symptoms, which was used to train and evaluate the model. The CNN architecture was carefully optimized to enhance feature extraction and classification capabilities, enabling precise identification of nutrient-related issues. The findings demonstrate the model’s effectiveness in distinguishing between different types of nutrient deficiencies, providing a valuable tool for precision agriculture. This approach aims to improve nutrient management practices and contribute to better crop health monitoring, highlighting the significant role of machine learning technologies in advancing agricultural research and practices.
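A minimal Keras sketch of the kind of CNN classifier described is given below, assuming leaf images organised in class-named folders; the layer sizes and the four classes are illustrative, not the study's exact architecture.

```python
# Leaf-deficiency CNN classifier sketch (TensorFlow/Keras).
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 4   # e.g. healthy plus three deficiency types (assumed)

train_ds = tf.keras.utils.image_dataset_from_directory(
    "banana_leaves/train", image_size=(224, 224), batch_size=32)

model = models.Sequential([
    layers.Rescaling(1.0 / 255, input_shape=(224, 224, 3)),
    layers.Conv2D(32, 3, activation="relu"), layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"), layers.MaxPooling2D(),
    layers.Conv2D(128, 3, activation="relu"), layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=10)
```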
Medical image encryption is critical to safeguarding patient privacy and maintaining the confidentiality of sensitive medical records. Leveraging advancements in artificial intelligence, we propose an innovative medical image encryption and decryption system that integrates deep learning-based encryption with QR code technology. This system enables users to upload a medical image, which is encrypted into a QR code format and paired with a uniquely generated key. Both the QR code and key are securely stored for subsequent retrieval. For decryption, users upload the QR code and the corresponding key to reconstruct the original image with high fidelity. The encryption process employs advanced neural network-based feature encoding, ensuring robustness against attacks such as noise, cropping, and brute force. Additionally, the system incorporates a reversible neural network to optimize decryption accuracy and reconstruction quality. Experimental results highlight the system's efficiency in preserving image integrity, resisting various attacks, and maintaining end-to-end security in medical image encryption. This approach not only strengthens the privacy and security of medical data but also provides a user-friendly framework for securely transmitting and storing sensitive medical images.
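As a simplified stand-in for the deep-learning feature encoding described above, the sketch below shows the storage flow with symmetric cryptography: the image bytes are encrypted with a fresh key, and that key is serialised into a QR code for the authorised user. The `qrcode` package and the file names are assumptions.

```python
# Encrypt-image / key-as-QR-code flow sketch (Fernet + qrcode).
import qrcode
from cryptography.fernet import Fernet

with open("scan.png", "rb") as fh:
    image_bytes = fh.read()

key = Fernet.generate_key()
token = Fernet(key).encrypt(image_bytes)           # ciphertext to store

with open("scan.enc", "wb") as fh:
    fh.write(token)
qrcode.make(key.decode()).save("scan_key_qr.png")  # the key travels as a QR code

# Decryption side: read the key back out of the QR code (e.g. with a scanner
# or pyzbar), then:
restored = Fernet(key).decrypt(token)
assert restored == image_bytes
```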
Cardiovascular disease (CVD) continues to be a leading cause of death globally, claiming approximately 17.9 million lives each year, as reported by the World Health Organization (WHO). This high mortality rate underscores the need for effective early detection and intervention strategies. Heart sound signals, also known as phonocardiograms (PCGs), hold essential information about cardiac health, providing a non-invasive method to assess heart function. Recent advancements in deep learning have enabled the development of models capable of analyzing heart sounds to detect abnormal features, assisting in early diagnosis and disease prevention. However, the challenges in heart sound data, including imbalanced class distributions, complex feature characteristics, and limited differentiation between sounds like systolic and diastolic murmurs, have restricted the effectiveness of traditional deep learning models. This project presents a novel heart sound anomaly detection algorithm based on a deep neural network (DNN). The DNN's ability to capture both local and global features within a signal makes it particularly well-suited for analyzing heart sound data. The proposed algorithm was tested on the PhysioNet/CinC 2016 public dataset, a widely used dataset for heart sound classification. Experimental results demonstrated a high classification accuracy of 99%, with a specificity of 98.5% and a sensitivity of 98.9%. These metrics signify a substantial improvement over existing methods, highlighting the model's effectiveness in detecting anomalies in heart sounds. The high sensitivity and specificity rates underscore the model's potential to serve as a reliable tool for early screening and diagnosis of cardiovascular diseases.
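A hedged sketch of one common realisation of this idea follows: MFCC features (via librosa) feeding a small dense network. The feature dimensions, sampling rate, and layer sizes are assumptions rather than the project's configuration.

```python
# PCG anomaly-detection sketch: MFCC features + dense network.
import librosa
import numpy as np
import tensorflow as tf

def mfcc_features(wav_path: str, n_mfcc: int = 40) -> np.ndarray:
    y, sr = librosa.load(wav_path, sr=2000)          # PCG is low-bandwidth (assumed rate)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)                         # one fixed-length vector per recording

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(40,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # normal vs abnormal
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy", tf.keras.metrics.Recall(name="sensitivity")])
```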
Stroke is a life-threatening medical condition caused by disrupted blood flow to the brain, representing a major global health concern with significant health and economic consequences. Researchers are working to tackle this challenge by developing automated stroke prediction algorithms, which can enable timely interventions and potentially save lives. As the global population ages, the risk of stroke increases, making the need for accurate and reliable prediction systems more critical. In this study, we evaluate the performance of an advanced machine learning (ML) approach, focusing on XGBoost and a hybrid model combining XGBoost with Random Forest, by comparing it against six established classifiers. We assess the models based on their generalization ability and prediction accuracy. The results show that more complex models outperform simpler ones, with the best-performing model achieving an accuracy of 96%, while other models range from 84% to 96%. Additionally, the proposed framework integrates both global and local explainability techniques, providing a standardized method for interpreting complex models. This approach enhances the understanding of decision-making processes, which is essential for improving stroke care and treatment. Finally, we suggest expanding the model to a web-based platform for stroke detection, extending its potential impact on public health.
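One straightforward way to realise the XGBoost + Random Forest fusion is a soft-voting ensemble in scikit-learn, sketched below; the column names, hyperparameters, and dataset file are assumed, and the study's exact fusion strategy may differ.

```python
# XGBoost + Random Forest soft-voting fusion sketch (scikit-learn).
import pandas as pd
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

df = pd.read_csv("stroke.csv")                     # assumed dataset layout
X, y = df.drop(columns=["stroke"]), df["stroke"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

fusion = VotingClassifier(
    estimators=[("xgb", XGBClassifier(n_estimators=300, eval_metric="logloss")),
                ("rf", RandomForestClassifier(n_estimators=300))],
    voting="soft")                                 # average predicted probabilities
fusion.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, fusion.predict(X_test)))
```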
Foggy weather presents substantial challenges for vehicle detection systems due to reduced visibility and the obscured appearance of objects. To overcome these challenges, a novel vehicle and human detection algorithm based on an improved lightweight YOLOv10 model is introduced. The proposed algorithm leverages advanced preprocessing techniques, including data transformations, DehazeFormer modules, and dark channel methods, to improve image quality and visibility. These preprocessing steps effectively reduce the impact of haze and low contrast, enabling the model to focus on meaningful features. An enhanced attention module is incorporated into the architecture to improve feature prioritization by capturing long-range dependencies and contextual information. This ensures that the model emphasizes relevant spatial and channel features, crucial for detecting small or partially visible vehicles in foggy scenes. Furthermore, the feature extraction process has been optimized, integrating an advanced lightweight module that improves the balance between computational efficiency and detection performance. This research addresses critical issues in adverse weather conditions, providing a robust framework for vehicle and human detection in foggy weather.
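The dark-channel computation used in this kind of preprocessing is short enough to show. Below is a sketch of the dark channel and the usual atmospheric-light estimate derived from it; the patch size and the 0.1% pixel fraction are conventional assumptions.

```python
# Dark-channel-prior preprocessing sketch (OpenCV + NumPy).
import cv2
import numpy as np

def dark_channel(img_bgr: np.ndarray, patch: int = 15) -> np.ndarray:
    """Per-pixel minimum over colour channels, then a min-filter (erosion)
    over a local patch; hazy regions show a bright dark channel."""
    min_channel = img_bgr.min(axis=2)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (patch, patch))
    return cv2.erode(min_channel, kernel)

img = cv2.imread("foggy_road.jpg")
dc = dark_channel(img)
# Atmospheric light is commonly estimated from the brightest dark-channel pixels:
flat = dc.reshape(-1)
idx = flat.argsort()[-max(1, flat.size // 1000):]   # top 0.1% of pixels
A = img.reshape(-1, 3)[idx].max(axis=0)
print("estimated atmospheric light (BGR):", A)
```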
Massive jellyfish outbreaks pose serious threats to human safety and marine ecosystems, prompting the need for effective detection methods. This work focuses on utilizing optical imagery and CNN-based deep-learning object detection models for jellyfish identification. Due to the limited availability of labelled jellyfish datasets, we developed a novel dataset using a model-assisted labelling approach, significantly reducing the reliance on manual annotation. Building on this dataset, we propose an improved YOLOv11-CoordConv model, integrating advanced mechanisms such as the Global Attention Mechanism (GAM) and CoordConv modules to enhance its detection capabilities. Experimental evaluations demonstrate that the proposed model outperforms several state-of-the-art object detection frameworks, highlighting its potential as a reliable solution for jellyfish detection in underwater environments.
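For reference, a CoordConv layer in the spirit of the model name above is sketched below in PyTorch: two channels of normalised x/y coordinates are concatenated to the input before an ordinary convolution, giving the filters explicit positional information. This is an illustrative module, not the paper's exact layer.

```python
# CoordConv layer sketch (PyTorch).
import torch
import torch.nn as nn

class CoordConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, **kw):
        super().__init__()
        self.conv = nn.Conv2d(in_ch + 2, out_ch, **kw)   # +2 for the coordinate maps

    def forward(self, x):
        b, _, h, w = x.shape
        ys = torch.linspace(-1, 1, h, device=x.device).view(1, 1, h, 1).expand(b, 1, h, w)
        xs = torch.linspace(-1, 1, w, device=x.device).view(1, 1, 1, w).expand(b, 1, h, w)
        return self.conv(torch.cat([x, ys, xs], dim=1))

print(CoordConv2d(3, 16, kernel_size=3, padding=1)(torch.randn(2, 3, 64, 64)).shape)
```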
In this work, we propose a novel Dual-Mode Web-Based Image Processor designed to address the challenges of image translation across different modalities. Traditional computer vision models often rely on a single sensor modality, such as RGB or thermal images, but fail to fully exploit the complementary strengths of both. Our architecture leverages a single lightweight encoder that efficiently encodes both grayscale and thermal images into compact latent vectors. This encoding enables cross-modal image translation, including grayscale image colorization and thermal image reconstruction, facilitating flexibility in handling multiple downstream tasks. Our approach reduces the computational burden by utilizing a compact encoder and optimizing for both data compression and robust image translation across varied lighting conditions. The model employs four distinct generators and two discriminators in an adversarial framework, incorporating reconstruction error terms to ensure consistency and contrast preservation. Experimental results demonstrate competitive quality in translation and reconstruction across various lighting scenarios, with comprehensive evaluations across multiple metrics. Additionally, ablation studies validate the effectiveness of the proposed loss terms, confirming their role in improving model performance.
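A hedged PyTorch sketch of the adversarial-plus-reconstruction objective described above follows, shown for a single generator/discriminator pair; the full model uses four generators and two discriminators, and the loss weight here is an assumption.

```python
# Adversarial + reconstruction loss sketch for one generator/discriminator pair.
import torch
import torch.nn.functional as F

def generator_loss(disc_fake_logits, fake, target, lam_rec: float = 10.0):
    adv = F.binary_cross_entropy_with_logits(
        disc_fake_logits, torch.ones_like(disc_fake_logits))  # fool the discriminator
    rec = F.l1_loss(fake, target)                             # reconstruction/consistency term
    return adv + lam_rec * rec

def discriminator_loss(real_logits, fake_logits):
    real = F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
    fake = F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits))
    return 0.5 * (real + fake)
```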
Marine debris poses a critical threat to environmental ecosystems, necessitating effective methods for its detection and localization. This study addresses the existing limitations in the literature by proposing an innovative approach that combines the instance detection capabilities of YOLOv11 with various attention mechanisms to enhance efficiency and broaden applicability. Coordinate attention and YOLOv11 demonstrate solid performance across various scenarios, while the bottleneck transformer, though slightly less consistent overall, proves valuable in identifying debris areas overlooked during manual annotation. Additionally, the bottleneck transformer shows enhanced precision in detecting larger debris pieces, indicating its potential utility in specific applications. This study underscores the versatility and efficiency of attention-enhanced YOLOv11 models for marine debris detection, demonstrating their ability to address diverse challenges in environmental monitoring. The results also emphasize the importance of aligning detection models with specific operational requirements, as unique characteristics of each model can offer advantages in targeted scenarios.
The efficient and accurate detection of foreign objects on railway tracks is critical to ensuring the safety and smooth operation of train systems. This work addresses the limitations of existing foreign object detection methods, including low efficiency and suboptimal accuracy, by proposing an enhanced railway foreign object intrusion detection framework leveraging YOLOv8 and Overhaul Knowledge Distillation. The proposed method consists of a two-stage architecture. In the first stage, a lightweight image classification network quickly determines whether a railway image contains foreign objects. This stage minimizes reliance on computationally intensive object detection models, thereby enhancing detection speed. In the second stage, YOLOv8 is employed to precisely detect and localize foreign objects in images flagged by the classification network. The choice of YOLOv8 provides notable improvements in accuracy and inference speed over previous versions such as YOLOv3. Additionally, the Overhaul Knowledge Distillation algorithm is applied to train the lightweight classification network under the supervision of a larger, more robust network, ensuring competitive classification performance while maintaining efficiency. Experimental results demonstrate that the proposed method achieves state-of-the-art performance in both detection accuracy and speed, with significant improvements in FPS and detection robustness compared to earlier approaches.
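To illustrate the distillation training signal, here is a minimal PyTorch sketch of logit distillation for the lightweight classifier: the student is trained on a blend of hard-label loss and a temperature-softened KL term against the teacher. The paper uses the Overhaul feature-distillation method; this simpler logit variant only illustrates the idea.

```python
# Knowledge-distillation loss sketch (logit distillation, PyTorch).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      T: float = 4.0, alpha: float = 0.5):
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)   # T^2 rescales gradients
    return alpha * hard + (1 - alpha) * soft

# toy shapes: batch of 8 images, 2 classes (foreign object / clear track)
student_logits = torch.randn(8, 2, requires_grad=True)
teacher_logits = torch.randn(8, 2)
labels = torch.randint(0, 2, (8,))
distillation_loss(student_logits, teacher_logits, labels).backward()
```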
The detection and removal of Foreign Object Debris (FOD) on airport runways pose significant challenges due to small object sizes and the impact of complex weather conditions on visibility and equipment efficiency. To address these challenges, this paper proposes an advanced FOD detection model leveraging YOLOv8, alongside a geolocation prediction model based on machine learning regression algorithms. By integrating a self-attention mechanism with Convolutional Neural Networks (CNNs), the proposed FOD detection system delivers impressive results. Ablation studies demonstrate notable improvements in the Mean Average Precision (mAP) across different model components. Comparative evaluations against existing models like YOLOv5, YOLOX, and YOLOv7 highlight the superior performance of YOLOv8, particularly in detecting small objects and maintaining accuracy with diverse input data. Furthermore, the geolocation prediction model, built using machine learning regression techniques, showcases significant potential for practical FOD detection and removal applications in real-world scenarios.
With the digital transformation and improvement of university library information technology, readers’ demands for library services are increasingly diversified and personalized. They are no longer satisfied with traditional borrowing services but hope that the library can provide more accurate and personalized recommendation services. To address these challenges, this study first proposes an improved item-based collaborative filtering recommendation algorithm based on the mean model representation. In addition, the system incorporates Neural Collaborative Filtering (NCF), which uses neural networks to model user-item interactions, providing more expressive and nonlinear representations compared to traditional methods, thereby enhancing recommendation quality. This algorithm is implemented using Django (4.1.2), a high-level web framework that promotes rapid development and clean design, ensuring a structured backend for the web application. The system leverages Pandas (1.5.0) for extensive data manipulation and analysis, allowing for effective handling of user data and preferences, while NumPy (1.23.3) facilitates numerical computations essential for the recommendation algorithms. For API integration, the system employs requests (2.28.1) and requests-oauthlib (1.3.1). The deployment is managed using gunicorn (20.1.0), a WSGI server that prepares the application for production environments, while virtualenv (20.16.5) and nodeenv (1.7.0) assist in managing Python and Node.js environments effectively.
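The item-based collaborative filtering step with a mean-centred ("mean model") representation can be sketched compactly with pandas and NumPy, as below; the ratings-file layout and the scoring rule are assumptions, and the NCF component is omitted.

```python
# Item-based collaborative filtering sketch: mean-centred cosine similarity.
import numpy as np
import pandas as pd

ratings = pd.read_csv("ratings.csv")               # columns: user_id, book_id, rating (assumed)
matrix = ratings.pivot_table(index="user_id", columns="book_id", values="rating")

centred = matrix.sub(matrix.mean(axis=1), axis=0).fillna(0.0)  # subtract each user's mean
item_vecs = centred.to_numpy().T                               # one row per book
norms = np.linalg.norm(item_vecs, axis=1, keepdims=True)
sim = (item_vecs @ item_vecs.T) / np.clip(norms * norms.T, 1e-9, None)

def recommend(user_id, top_n=5):
    """Score every book by the similarity-weighted sum of the user's ratings."""
    user_row = centred.loc[user_id].to_numpy()
    scores = sim @ user_row
    scores[user_row != 0] = -np.inf                # drop already-rated books
    best = np.argsort(scores)[::-1][:top_n]
    return matrix.columns[best].tolist()
```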
Wearing safety helmets can effectively reduce the risk of head injuries for construction workers in high-altitude falls. To address the low detection accuracy of existing safety helmet detection algorithms for small targets and in complex environments, this study proposes an improved safety helmet detection algorithm based on YOLOv8n, named YOLOv8n-SLIM-CA. For data augmentation, the mosaic method is employed, which generates many tiny targets. In the backbone network, a coordinate attention (CA) mechanism is added to enhance the focus on safety helmet regions in complex backgrounds, suppress irrelevant feature interference, and improve detection accuracy. In the neck network, a slim-neck structure fuses features of different sizes extracted by the backbone network, reducing model complexity while maintaining accuracy. In the detection layer, a small target detection layer is added to enhance the algorithm's learning ability for crowded small targets. Experimental results indicate that these improvements enhance detection performance not only in general real-world scenarios but also in complex backgrounds and for small targets at long distances. Compared to the baseline YOLOv8n algorithm, YOLOv8n-SLIM-CA improves precision, recall, mAP50, and mAP50-95. Additionally, it reduces the model parameters by 6.98% and the computational load by 9.76%. It is capable of real-time and accurate detection of safety helmet wear. Comparison with other mainstream object detection algorithms validates the effectiveness and superiority of this method.
Pomegranates, revered for their nutritional richness and medicinal properties, are integral to global agriculture. Accurate identification of their growth stages is a pivotal step in modern farming, enabling optimized resource management, timely interventions, and the prevention of losses caused by pests, diseases, or environmental challenges. Traditionally, manual monitoring of crop growth stages is labor-intensive, prone to errors, and inefficient on a large scale. With advancements in deep learning and computer vision, automated solutions offer the potential to revolutionize crop management systems. This study presents a novel application of YOLOv10, an advanced object detection algorithm, for the precise identification of pomegranate growth stages. Leveraging YOLOv10’s ability to perform real-time detection and capture intricate spatial features, the proposed approach focuses on classifying five critical growth stages: Bud, Early-Fruit, Flower, Mid-growth, and Ripe. The model's architecture is tailored to ensure robust detection under diverse conditions, including variations in lighting, angles, and environmental settings. Data augmentation and hyperparameter tuning techniques are integrated into the training pipeline to further improve model generalization and reliability. By automating the growth stage detection process, this approach significantly reduces the reliance on manual labor and enhances the efficiency of agricultural operations. Farmers and agricultural stakeholders can use the insights generated by this system to implement precise interventions, maximize yield, and maintain consistent quality standards. The application of YOLOv10 in this domain represents a significant step forward in the adoption of artificial intelligence for sustainable agriculture, providing an efficient, scalable, and reliable solution for crop monitoring and management. This work highlights the transformative potential of integrating deep learning into agricultural practices, paving the way for improved productivity and reduced resource wastage in pomegranate farming.
The objective of this project is to enhance defect detection capabilities in the metal industry, particularly for identifying small and elongated defects on steel strips, which pose significant challenges for traditional detection methods. These defects are difficult to detect due to the small pixel percentage they occupy, as well as the repeated downsampling in convolutional networks, which can lead to the loss of minute features. To address these issues, we propose an advanced real-time defect detection network based on YOLOv10, specifically designed to overcome these challenges. The proposed system utilizes an efficient channel attention bottleneck (EB) module with 1D convolution to enhance feature extraction, focusing on small and elongated defects. Furthermore, the model incorporates Context Transformation Networks with cross-stage localized blocks (CC modules), which improve the understanding of semantic contextual information and the relationships between features. This methodology helps preserve critical defect details that might otherwise be missed. In addition, the model is trained on a self-constructed dataset tailored to include small and elongated defects. This dataset refinement allows for better feature fusion and extraction, ultimately improving the model's ability to detect and classify various defect types. The YOLOv10-based system is evaluated on several defect detection datasets, including GC10-DET, NEU-DET, and the self-constructed SLD-DET dataset, demonstrating enhanced performance and robustness for defect detection, particularly in industrial applications.
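An efficient-channel-attention bottleneck in the spirit of the EB module is sketched below in PyTorch: global average pooling followed by a 1D convolution across the channel axis, used to reweight the feature map. The kernel size and placement are illustrative.

```python
# Efficient channel attention (1D conv over channels) sketch, PyTorch.
import torch
import torch.nn as nn

class ECA(nn.Module):
    def __init__(self, k: int = 3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                          # x: (B, C, H, W)
        y = x.mean(dim=(2, 3))                     # squeeze: (B, C)
        y = self.conv(y.unsqueeze(1)).squeeze(1)   # 1D conv across the channel axis
        w = self.sigmoid(y).unsqueeze(-1).unsqueeze(-1)
        return x * w                               # excite: per-channel reweighting

feat = torch.randn(2, 64, 40, 40)
print(ECA()(feat).shape)                           # torch.Size([2, 64, 40, 40])
```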
Traditional methods for evaluating fresh fruit bunches (FFBs) in palm oil production are inefficient, costly, and lack scalability. This study evaluates the performance of YOLOv10 and other state-of-the-art object detection models using a novel dataset of oil palm FFBs captured in plantations of Central Kalimantan Province, Indonesia. The dataset consists of five ripeness classes: abnormal, ripe, underripe, unripe, and flower, with challenges such as partially visible objects, low contrast, occlusions, small object sizes, and blurry images. YOLOv10 was compared with other models, including YOLOv6s, YOLOv7 Tiny, YOLOv8s, and Faster R-CNN. YOLOv10 demonstrated superior performance, with a compact size of 10.5 MB, fast inference time (0.022 seconds), and high detection accuracy, achieving mAP50 at 0.82 and mAP50-95 at 0.58. The model completed training in just 1 hour, 35 minutes, with low training loss, indicating efficient convergence. Additionally, YOLOv10 achieved low Mean Absolute Error (MAE) of 0.10 and Root Mean Square Error (RMSE) of 0.32, suggesting high precision in FFB counting. Hyperparameter tuning revealed that using the SGD optimizer, a batch size of 16, and a learning rate of 0.001 achieved optimal performance, balancing both accuracy and efficiency. Data augmentation techniques significantly enhanced model performance, improving accuracy across different ripeness classes. When evaluated against state-of-the-art models, including Faster R-CNN, SSD MobileNetV2, YOLOv4, and EfficientDet-D0, YOLOv10 outperformed these models in speed, accuracy, and efficiency, making it highly suitable for real-time applications in palm oil harvesting. This study demonstrates the potential of YOLOv10 for automating the evaluation of FFBs, improving both the efficiency and sustainability of palm oil production in large-scale plantations.
In the detection of bone fractures from X-ray images, accurate and timely diagnosis is crucial to prevent further complications and ensure proper healing. Existing models, such as YOLO NAS (You Only Look Once - Neural Architecture Search), have shown potential in object detection but have limitations in detecting subtle bone fractures, particularly small or hairline fractures. To address these shortcomings, the proposed model utilizes YOLO V8, an advanced version of the YOLO framework, which builds on previous models by offering improved accuracy, speed, and efficiency in real-time object detection tasks. YOLO V8 enhances the detection capabilities by refining the architecture and optimizing performance, making it better suited for medical image analysis. The model is trained on a comprehensive dataset of 1200 hand-bone X-ray images, classified into six distinct fracture categories. A comparison of the YOLO V8 model with YOLO NAS highlights the improved ability of V8 to detect complex and subtle fractures, ensuring faster and more reliable diagnoses. This advancement is essential for clinical settings, where delays or misdiagnoses could lead to severe outcomes for patients.

Coffee plants are susceptible to several diseases, including Brown Eye, Leaf Rust, Leaf Miner, and Red Spider Mite, which significantly impact both yield and quality. Early detection and timely intervention are vital for minimizing crop losses and improving coffee production. This paper introduces a novel approach using the YOLOv8 (You Only Look Once version 8) deep learning model for real-time detection and classification of these diseases on coffee leaves. YOLOv8 is an advanced, efficient object detection model known for its speed and accuracy, making it suitable for on-field deployment. It processes images of coffee leaves and provides fast, accurate localization of disease symptoms, classifying them into one of the four disease categories: Brown Eye, Leaf Rust, Leaf Miner, and Red Spider Mite. The model is trained on a custom dataset consisting of a wide variety of coffee leaf images, ensuring robust performance across different environmental conditions. YOLOv8's lightweight architecture allows for deployment on mobile devices and drones, enabling immediate disease identification. Experimental results show that YOLOv8 achieves high accuracy, precision, and recall, outperforming traditional models in terms of detection speed and robustness. The use of YOLOv8 for disease detection can aid farmers in quickly identifying and treating affected plants, ultimately improving coffee quality, reducing crop losses, and minimizing the reliance on harmful pesticides. The proposed approach offers a practical, scalable solution for disease monitoring in coffee plantations.
Inspection of structural cracks is critical for maintaining the safety and longevity of bridges and other infrastructure. Traditional methods for crack detection are often manual, labor-intensive, and prone to human error. Recent advances in deep learning and semantic segmentation provide a promising alternative, but obtaining high-quality annotated data remains a significant challenge. This paper introduces an enhanced approach to crack detection using deep learning, leveraging synthetic data generation and advanced semantic segmentation techniques. We propose the use of DeepLabV3 with a ResNet50 backbone, which incorporates a robust ResNet50 feature extractor into the DeepLabV3 architecture to improve segmentation. Our approach involves generating synthetic crack images to address the data-scarcity issue, using StyleGAN3 for realistic image synthesis. By integrating these synthetic datasets with the DeepLabV3 model, we aim to boost segmentation performance beyond the capabilities of standard models. Hyperparameter tuning is performed to optimize the DeepLabV3-ResNet50 configuration, achieving significant improvements in segmentation performance. We employ data augmentation techniques such as motion blur, zoom, and defocus to further refine model performance. The proposed method is evaluated against existing state-of-the-art techniques, demonstrating superior accuracy. The results indicate that our approach not only enhances crack detection but also offers a novel application of synthetic data generation in deep learning for semantic segmentation. This research provides new insights into leveraging advanced neural networks and synthetic data for improved structural crack analysis.
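Adapting torchvision's DeepLabV3-ResNet50 to binary crack-vs-background segmentation is sketched below; replacing the classifier head for two classes is standard practice, but the training details are assumptions.

```python
# DeepLabV3-ResNet50 fine-tuning sketch for crack segmentation (torchvision).
import torch
from torchvision.models.segmentation import deeplabv3_resnet50

model = deeplabv3_resnet50(weights="DEFAULT")
model.classifier[4] = torch.nn.Conv2d(256, 2, kernel_size=1)   # 2 classes: crack / background

x = torch.randn(1, 3, 512, 512)                  # one normalised RGB crack image
model.eval()
with torch.no_grad():
    out = model(x)["out"]                        # (1, 2, 512, 512) logits
mask = out.argmax(dim=1)                         # per-pixel class prediction
```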
Underwater imaging is often affected by light attenuation and scattering in water, leading to degraded visual quality, such as color distortion, reduced contrast, and noise. Existing underwater image enhancement (UIE) methods, such as Contrast Limited Adaptive Histogram Equalization (CLAHE), Dark Channel Prior (DCP), and Maximum Intensity Projection (MIP), have shown some success but often lack generalization capabilities, making them unable to adapt to various underwater images captured in different aquatic environments and lighting conditions. To address these challenges, a UIE method based on the conditional denoising diffusion probabilistic model (DDPM) is proposed, called DiffWater, which leverages the advantages of DDPM and trains a stable and well-converged model capable of generating high-quality and diverse samples. While methods like CLAHE improve contrast and DCP helps recover depth information by reducing haze, they may not handle all distortion issues in underwater imaging. Therefore, DiffWater introduces a color compensation method that performs channel-wise compensation in the RGB color space, tailored to different water conditions and lighting scenarios. This compensation guides the denoising process, ensuring high-quality restoration of degraded underwater images. Additionally, methods like the Rayleigh Distribution (RAY), Retinex-based Global and Local Image Enhancement (RGHS), and Unsharp Masking with Laplacian Pyramid (ULAP) have been explored to handle noise reduction, contrast enhancement, and edge sharpening, but these methods often struggle with varying lighting conditions and water environments. In DiffWater, the integration of such principles, combined with the conditional guidance provided by the degraded underwater image with color compensation, offers a more adaptive and robust approach. The experimental results show that DiffWater, when tested against existing methods including DCP, RGHS, and ULAP, on four real underwater image datasets, outperforms these comparison methods in terms of enhancement quality and effectiveness. DiffWater exhibits stronger generalization capabilities and robustness, addressing the complex visual distortions present in various underwater conditions more effectively than traditional algorithms like CLAHE, DCP, and MIP.
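The channel-wise colour compensation idea can be illustrated with a classic red-channel formula: underwater scenes attenuate red fastest, so the red channel is boosted using the green channel as a reference. The coefficient `alpha` is an assumed value, and this is a simplification of DiffWater's conditional guidance.

```python
# Red-channel compensation sketch for underwater images (OpenCV + NumPy).
import cv2
import numpy as np

def compensate_red(img_bgr: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    img = img_bgr.astype(np.float32) / 255.0
    b, g, r = cv2.split(img)
    # Classic compensation form: boost red where it is weak, guided by green.
    r_comp = r + alpha * (g.mean() - r.mean()) * (1.0 - r) * g
    out = cv2.merge([b, g, np.clip(r_comp, 0.0, 1.0)])
    return (out * 255).astype(np.uint8)

enhanced_input = compensate_red(cv2.imread("underwater.jpg"))
```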
In this approach, we replace the traditional Otsu method for Field of View (FOV) segmentation in retinal images with a deep learning-based model utilizing the U-Net architecture. The preprocessing phase begins by converting the retinal image to grayscale, specifically using the red channel, which offers better contrast for the fine vascular structures. A logarithmic transformation is applied to the grayscale image to further enhance the visibility of small features such as microaneurysms and capillaries. This step prepares the image for more accurate segmentation by emphasizing details essential for the detection of diabetic retinopathy and other retinal abnormalities. The core of the segmentation process relies on U-Net, a convolutional neural network designed for medical image segmentation. U-Net consists of a contracting path that captures high-level contextual features through successive convolutional layers and downsampling operations. This is followed by an expanding path that progressively upsamples the feature maps and concatenates them with corresponding layers from the contracting path, enabling precise localization of the FOV region. The final step in the U-Net architecture involves a 1x1 convolution layer that produces the binary mask of the FOV region, followed by a sigmoid activation function to output the probability map of the segmented area.
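The preprocessing steps described translate directly into a few lines of OpenCV/NumPy, sketched below: take the red channel as the grayscale image, apply a logarithmic transform to lift faint structures, and normalise for the U-Net input. The target size is an illustrative constant.

```python
# Fundus preprocessing sketch: red channel + log transform + normalisation.
import cv2
import numpy as np

def preprocess_fundus(path: str, size: int = 512) -> np.ndarray:
    bgr = cv2.imread(path)
    red = bgr[:, :, 2].astype(np.float32)          # OpenCV stores BGR; index 2 = red
    log_img = np.log1p(red)                        # log transform lifts faint features
    log_img = (log_img - log_img.min()) / (log_img.max() - log_img.min() + 1e-8)
    return cv2.resize(log_img, (size, size))[..., np.newaxis]  # (H, W, 1) for U-Net

x = preprocess_fundus("fundus.png")
```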
The increasing prevalence of thyroid cancer underscores the critical need for efficient classification and early detection of thyroid nodules. Automated systems can significantly aid physicians by expediting diagnostic processes. However, achieving this goal remains challenging due to limited medical image datasets and the complexity of feature extraction. This study addresses these challenges by emphasizing the extraction of meaningful features essential for tumor detection. The proposed approach integrates advanced techniques for feature extraction, enhancing the capability to classify thyroid nodules in ultrasound images. The classification framework includes distinguishing between benign and malignant nodules, as well as identifying specific suspicious classifications. The combined classifiers provide a comprehensive characterization of thyroid nodules, demonstrating promising accuracy in preliminary evaluations. These results mark a significant advancement in thyroid nodule classification methodologies. This research represents an innovative approach that could potentially offer valuable support in clinical settings, facilitating more rapid and accurate diagnosis of thyroid cancer.
In the domain of medical image classification, the Xception model stands out for its advanced performance in analyzing intricate image data. This research applies the Xception model to classify chest computed tomography (CT) images into four distinct categories: adenocarcinoma, large cell carcinoma, normal, and squamous cell carcinoma. Xception, renowned for its use of depthwise separable convolutions, enhances feature extraction by effectively capturing complex patterns with reduced computational cost. The study encompasses a rigorous evaluation of the Xception model through extensive training, validation, and testing phases on a specialized multi-class chest CT image dataset. The dataset includes a balanced representation of the four classes to ensure robust model performance across varying conditions. Key aspects of the evaluation include the model's accuracy in distinguishing between different types of carcinoma and normal tissues, as well as its efficiency in handling computational demands. The results demonstrate that the Xception model provides superior classification accuracy and reliable diagnostic performance. By leveraging its advanced architecture, the approach significantly improves the precision of medical image classification, offering valuable insights for enhanced diagnostic support in clinical settings. This work underscores the effectiveness of the Xception model in advancing medical imaging analysis and its potential impact on improving patient care through more accurate disease classification.
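A Keras sketch of an Xception-based four-class CT classifier follows, using ImageNet weights and a new classification head; the head size and the freezing policy are illustrative choices, not the study's exact training recipe.

```python
# Xception transfer-learning sketch for four-class chest CT classification.
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import Xception

base = Xception(weights="imagenet", include_top=False,
                input_shape=(299, 299, 3), pooling="avg")
base.trainable = False                    # train the new head first

model = models.Sequential([
    base,
    layers.Dropout(0.3),
    layers.Dense(4, activation="softmax"),   # adenocarcinoma, large cell, normal, squamous
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="categorical_crossentropy", metrics=["accuracy"])
```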
Malaria, a life-threatening disease transmitted by mosquitoes, remains a major public health challenge, claiming thousands of lives each year. Limited access to reliable detection tools, combined with challenges such as insufficient laboratory resources and inexperienced personnel, contribute to its high mortality rate. Recently, advancements in image analysis of malaria-infected red blood cells (RBCs) have provided promising alternatives for more accessible detection methods. By leveraging digital microscopy and innovative machine learning approaches, researchers aim to develop practical solutions that can improve diagnostic accuracy and accessibility. This approach not only enables a faster response in clinical settings but also highlights the potential for integration with IoT-enabled devices, facilitating wider deployment in resource-constrained regions. Such advancements underscore the potential of image-based malaria detection methods to enhance early diagnosis and treatment, especially in areas with limited medical resources.
To address the challenges of few-shot aerial image semantic segmentation, where unseen-category objects in query aerial images need to be parsed with only a few annotated support images, we propose a novel approach integrating a U-Net architecture with EfficientNet. Typically, in few-shot segmentation, category prototypes are extracted from support samples to segment query images in a pixel-wise matching process. However, the arbitrary orientations and distribution of aerial objects in such images often result in significant feature variations, making conventional methods, which do not account for orientation changes, ineffective. The rotation sensitivity of aerial images causes substantial feature distortions, leading to low confidence scores and misclassification of same-category objects with different orientations. To overcome these limitations, we propose an enhanced solution combining U-Net for robust semantic segmentation with EfficientNet for efficient feature extraction and scale adaptability. This architecture, which we refer to as Efficient U-Net, introduces rotation-invariant feature extraction to handle the varying orientations of aerial objects. By leveraging EfficientNet's scalable convolutional layers for feature extraction, we ensure that the network can capture orientation-varying yet category-consistent information from support images. This approach enhances segmentation accuracy by aligning same-category objects, irrespective of their orientation, thereby minimizing the oscillation of confidence scores and improving the detection of rotated semantic objects. This Efficient U-Net model provides a scalable, rotation-invariant solution to the few-shot segmentation of aerial images.
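One convenient way to pair an EfficientNet encoder with a U-Net decoder is the segmentation_models_pytorch package, sketched below; the library choice, encoder size, and class count are assumptions rather than the paper's exact setup.

```python
# EfficientNet-encoder U-Net sketch (segmentation_models_pytorch).
import torch
import segmentation_models_pytorch as smp

model = smp.Unet(
    encoder_name="efficientnet-b0",   # EfficientNet backbone as the U-Net encoder
    encoder_weights="imagenet",
    in_channels=3,
    classes=6,                        # number of aerial semantic categories (assumed)
)

x = torch.randn(1, 3, 256, 256)       # one normalised aerial tile
logits = model(x)                     # (1, 6, 256, 256) per-pixel class scores
```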
Histotripsy, a focused ultrasound therapy, effectively ablates tissue by leveraging bubble clouds and has potential for treating conditions such as renal tumors. To enhance monitoring and evaluation, this study combines classification and segmentation techniques using deep learning models. A convolutional neural network (CNN) was employed for classification, distinguishing treated and untreated tissue regions, while the U-Net model was utilized for precise segmentation of ablation zones in ultrasound images. The U-Net architecture was fine-tuned using transfer learning to optimize segmentation accuracy. Ultrasound images of ablated red blood cell phantoms and ex vivo kidney tissues were used for training and testing, with digital photographs serving as ground truth. The performance of these models was compared to manual annotations by expert radiologists. The CNN achieved high accuracy in classifying tissue states, and the U-Net demonstrated robust segmentation, closely matching expert manual annotations. Segmentation performance improved with increased treatment exposure, achieving a Dice similarity coefficient exceeding 85% for 750+ applied pulses. Application of the U-Net to ex vivo kidney tissue revealed morphological shifts consistent with histology findings, confirming targeted tissue ablation. The integration of CNN-based classification and U-Net segmentation demonstrated significant potential for automating and enhancing the monitoring of histotripsy outcomes. This combined approach offers a reliable and efficient means of visualizing treatment progress, supporting real-time decision-making during therapeutic procedures. The study highlights the capability of deep learning models to automate and improve treatment monitoring in histotripsy, paving the way for real-time, data-driven interventions.
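The Dice similarity coefficient used to score predicted ablation masks against ground truth is sketched below; the masks are assumed to be binary arrays.

```python
# Dice similarity coefficient sketch for binary segmentation masks.
import numpy as np

def dice(pred: np.ndarray, truth: np.ndarray, eps: float = 1e-8) -> float:
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    return (2.0 * inter) / (pred.sum() + truth.sum() + eps)

# e.g. dice(model_mask, expert_mask) > 0.85 corresponds to the agreement
# reported above at 750+ applied pulses.
```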
Automated detection of spinal lesions from computed tomography (CT) scans can significantly enhance diagnostic efficiency and accuracy. This project implements a Computer-Aided Detection (CADe) system for spinal lesion analysis using the Xception model. The system is equipped with an intuitive web-based interface developed using the Flask framework, allowing physicians to seamlessly integrate it into their diagnostic workflow. The CADe system processes input CT scans by extracting vertebral regions of interest, converting them into 2D slices, and performing preprocessing to ensure optimal input quality for the Xception model. The model, fine-tuned with transfer learning, classifies vertebrae as either healthy or as exhibiting conditions such as metastases, primary tumors, or sclerotic lesions. The training and testing datasets were created from CT scans in patient records. Data augmentation techniques were applied to expand the dataset and improve model generalization. The Xception model achieved high accuracy and a recall of 92.99%, demonstrating its effectiveness in spinal lesion detection. This system aims to provide a reliable and efficient tool to assist medical professionals in spinal lesion diagnosis, enhancing clinical decision-making and patient outcomes.
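A hedged Flask sketch of such a web interface follows: a single endpoint accepts a CT slice upload and returns the model's prediction. The weights path, preprocessing, and class names are hypothetical stand-ins.

```python
# Flask inference-endpoint sketch for the CADe web interface.
import numpy as np
import tensorflow as tf
from flask import Flask, request, jsonify
from PIL import Image

app = Flask(__name__)
model = tf.keras.models.load_model("spinal_xception.h5")   # assumed weights file
CLASSES = ["healthy", "metastasis", "primary_tumor", "sclerotic_lesion"]

@app.route("/predict", methods=["POST"])
def predict():
    img = Image.open(request.files["scan"].stream).convert("RGB").resize((299, 299))
    x = np.asarray(img, dtype=np.float32)[None] / 255.0
    probs = model.predict(x)[0]
    return jsonify({"label": CLASSES[int(probs.argmax())],
                    "confidence": float(probs.max())})

if __name__ == "__main__":
    app.run(debug=True)
```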
Emotion recognition through facial expressions is a critical area of computer vision, enabling systems to understand human emotions for applications such as human-computer interaction, healthcare, and security. Recent advancements have led to the development of various deep learning models for emotion recognition. While transformers have shown promise, they often come with high computational costs, particularly in handling space-time attention mechanisms. To address this, we propose a novel approach for emotion recognition using a Convolutional Neural Network (CNN), which effectively extracts spatial features from facial images while being computationally efficient. Our CNN-based model is designed to focus on learning discriminative facial features that are crucial for recognizing a wide range of emotions. The model leverages a frame-wise deep learning architecture, allowing it to process each frame independently while capturing important facial patterns. We evaluate the performance of the proposed CNN-based model on the FER2013+ (Facial Emotion Recognition) benchmark dataset, with geometric transformations used for data augmentation to address class imbalances. The results demonstrate that our CNN-based approach achieves competitive performance, either outperforming or matching the accuracy of existing techniques in emotion recognition. Furthermore, an ablation study on the challenging FER2013+ dataset highlights the potential and effectiveness of the proposed model for handling complex emotion recognition tasks in real-world applications.
Face Segmentation is a critical research area in computer vision, with facial feature extraction playing a key role in improving accuracy. This paper focuses on the application of semantic segmentation methods for facial feature extraction. The structure and parameter count of the model significantly impact the performance of these tasks. To enhance accuracy and efficiency, we propose the use of the U-Net architecture for semantic segmentation in face recognition. The U-Net model is employed due to its ability to effectively capture spatial information through an encoder-decoder structure, which is crucial for precise segmentation of facial features. In our approach, we incorporate multi-scale feature extraction to balance between accuracy and the number of parameters, using large convolutional kernels for an expanded receptive field. Additionally, we use a channel attention mechanism to optimize feature aggregation from different depths, and depthwise separable convolution to reduce the computational burden. Our experimental results demonstrate that the proposed model, with fewer parameters, achieves high accuracy in semantic segmentation tasks for facial feature extraction.
This paper addresses the challenge of accurately classifying plant leaf diseases by proposing a novel deep learning approach based on Convolutional Neural Networks (CNN). Traditional CNN models often struggle to effectively capture the spatial and posture relationships of plant disease lesions, leading to issues with recognition accuracy and robustness. To overcome this limitation, we introduce an optimized CNN architecture designed specifically for plant leaf disease image classification. The proposed system enhances feature extraction by incorporating advanced convolutional layers that better capture the fine-grained details of leaf lesions. Additionally, a channel attention mechanism is integrated into the network to improve its focus on the most critical features associated with disease detection. To further improve performance, the architecture is designed to handle image transformations such as rotations, scaling, and flipping, ensuring the model's robustness across diverse real-world conditions. In addition to classification, the proposed approach also incorporates disease lesion detection using OpenCV. By utilizing OpenCV for image processing, such as drawing bounding boxes around detected lesions, the model not only classifies the diseases but also accurately locates them within the plant leaf images. This step enhances the interpretability of the model and provides more detailed information about the affected regions, which can be useful for precision agriculture applications. The model is trained and tested on multiple plant disease datasets, demonstrating significant improvements in classification accuracy, robustness, and generalization compared to traditional CNN models. The proposed method provides a reliable and efficient solution for automatic plant disease diagnosis, offering significant potential for agricultural applications and crop management.
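The lesion-localization step could look like the following OpenCV sketch, which draws bounding boxes around connected lesion regions in a binary mask (the mask source and area threshold are assumptions; the paper does not specify its exact pipeline):

```python
import cv2

def draw_lesion_boxes(image_bgr, mask):
    """Draw bounding boxes around lesion regions given a binary uint8 mask."""
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    annotated = image_bgr.copy()
    for contour in contours:
        if cv2.contourArea(contour) < 50:   # skip tiny noise blobs (illustrative threshold)
            continue
        x, y, w, h = cv2.boundingRect(contour)
        cv2.rectangle(annotated, (x, y), (x + w, y + h), (0, 0, 255), 2)
    return annotated
```

In practice the mask could come from color thresholding or from the classifier's activation maps.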
Lesion area segmentation in endoscopic images plays a vital role in the early detection and diagnosis of diseases, aiding doctors in locating and identifying abnormal areas, which is crucial for improving patient outcomes. The U-Net model, known for its encoder-decoder architecture with skip connections, has achieved great success in medical image segmentation by capturing fine details and preserving spatial information. However, U-Net may face challenges in capturing global context information, which is essential for identifying larger and more complex lesions. In this paper, we propose an enhanced version of the U-Net model specifically for endoscopic image segmentation. The model is designed to improve the accuracy and precision of lesion boundary detection, ensuring better localization of abnormal areas in the images. To further enhance its performance, we integrate OpenCV-based image preprocessing techniques such as noise reduction, contrast enhancement, and image normalization, which improve the model's robustness and efficiency in handling real-world medical data. The proposed method demonstrates promising segmentation performance, offering significant potential for improving clinical analysis, diagnosis, and decision-making in medical applications.
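A minimal OpenCV preprocessing pipeline along the lines described above might combine non-local-means denoising, CLAHE contrast enhancement, and normalization (all parameter values are illustrative, not taken from the paper):

```python
import cv2
import numpy as np

def preprocess_endoscopic_frame(image_bgr: np.ndarray) -> np.ndarray:
    """Noise reduction, contrast enhancement (CLAHE), and normalization."""
    # Non-local-means denoising on the color frame
    denoised = cv2.fastNlMeansDenoisingColored(image_bgr, None, 10, 10, 7, 21)
    # CLAHE on the lightness channel only, to boost contrast without color shift
    lab = cv2.cvtColor(denoised, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = cv2.cvtColor(cv2.merge((clahe.apply(l), a, b)), cv2.COLOR_LAB2BGR)
    # Normalize to [0, 1] for the segmentation network
    return enhanced.astype(np.float32) / 255.0
```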
Skin cancer remains the most prevalent form of cancer worldwide, and early detection significantly enhances the effectiveness of treatment. While deep learning techniques have substantially improved segmentation, challenges such as variability in lesion sizes, shapes, colors, and differences in contrast levels persist. This paper introduces a robust approach leveraging the SegNet architecture for precise skin lesion segmentation. SegNet, with its encoder-decoder structure, effectively preserves spatial information while capturing intricate lesion details, making it highly suitable for handling diverse lesion characteristics. To further enhance performance, OpenCV is employed for preprocessing tasks, including resizing, noise reduction, contrast enhancement, and augmentation techniques. These steps improve the model’s ability to handle real-world variability in skin lesion imaging. The proposed framework incorporates features from multiple data sources, including distinct color bands, grayscale images immune to illumination variations, and shading-reduced images, in combination with standard RGB channels. This fusion of features enables the model to address challenges related to shading and lighting inconsistencies. The results demonstrate the effectiveness of the SegNet-based approach in accurately delineating lesion boundaries, even in challenging cases with irregular shapes and varying contrast. This methodology highlights the potential of combining the SegNet architecture with advanced preprocessing techniques to improve skin lesion segmentation.
In today’s digital age, the rapid and widespread use of images across various platforms raises significant concerns about the security and confidentiality of visual data. Images often contain sensitive information, and their vulnerability to unauthorized access or tampering makes it imperative to adopt robust encryption methods. This study presents an innovative approach to securing digital images through the implementation of the Advanced Encryption Standard (AES) algorithm in Cipher Block Chaining (CBC) mode. The AES encryption algorithm is widely recognized for its efficiency and robustness in securing sensitive data, and by utilizing CBC mode, the security of the image data is further enhanced through block chaining, ensuring that each block of the encrypted data is dependent on the previous one. This provides better protection against common cryptographic attacks, making it more difficult for an unauthorized entity to access or alter the image. The proposed system is built using Flask, a lightweight web framework, offering users a seamless and user-friendly interface for image encryption and decryption. Through this system, users can easily upload an image, encrypt it with a secret key, and securely store or transmit the encrypted image. The decryption process is equally straightforward, allowing users to retrieve the original image using the correct key. Additionally, user authentication is integrated into the system with a registration and login mechanism, ensuring that only authorized individuals can access the encryption and decryption functionalities. The system is designed to handle a variety of image formats, providing flexibility and adaptability in real-world applications. This approach to image encryption combines ease of use with high levels of security, making it an ideal solution for anyone looking to protect their images from unauthorized access. By applying AES in CBC mode, the proposed method effectively addresses the growing need for secure image handling in a world where the security of digital data is of paramount importance. Furthermore, the solution’s adaptability and simplicity make it accessible to both technical and non-technical users, promoting widespread use in different industries where image security is crucial.
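The AES-CBC scheme described here maps directly onto PyCryptodome; the sketch below encrypts raw image bytes with a random IV prepended to the ciphertext (key handling and file names are simplified assumptions; a production system would derive the key from the user's secret, e.g. with a KDF):

```python
from Crypto.Cipher import AES
from Crypto.Random import get_random_bytes
from Crypto.Util.Padding import pad, unpad

def encrypt_image(image_bytes: bytes, key: bytes) -> bytes:
    """AES-CBC: a fresh random IV is prepended so decryption can recover it."""
    iv = get_random_bytes(16)
    cipher = AES.new(key, AES.MODE_CBC, iv)
    return iv + cipher.encrypt(pad(image_bytes, AES.block_size))

def decrypt_image(payload: bytes, key: bytes) -> bytes:
    iv, ciphertext = payload[:16], payload[16:]
    cipher = AES.new(key, AES.MODE_CBC, iv)
    return unpad(cipher.decrypt(ciphertext), AES.block_size)

key = get_random_bytes(32)           # AES-256; derive from the user's secret in practice
with open("photo.png", "rb") as f:   # hypothetical input file
    encrypted = encrypt_image(f.read(), key)
```

Because CBC chains blocks through the IV, the same image encrypted twice yields different ciphertexts.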
Single Infrared Image Super-Resolution (SISR) aims to enhance the spatial resolution of low-quality infrared images. This task is particularly challenging due to the inherent noise and limited information content in infrared images. To address these limitations, we propose a novel approach that leverages advanced deep learning techniques to effectively restore high-resolution details. Our method captures and exploits the underlying structure of infrared images. By employing advanced feature extraction and reconstruction techniques, we are able to generate significantly improved image quality. Extensive experiments on various benchmark datasets demonstrate the superior performance of our proposed method in terms of both quantitative and qualitative metrics. Additionally, an edge-point classification method based on whale optimization, which uses the radius of the shortest distance between each whale and the current global optimum in each iteration, is presented to enhance a preliminary edge map. The experimental results show that the proposed edge detection method has the advantages of strong denoising, fast speed, and good quality.
This project presents a hybrid approach to gunshot sound detection by integrating Mel-Frequency Cepstral Coefficients (MFCC), Support Vector Machines (SVM), and YAMNet, a pre-trained deep learning model. The process begins with the extraction of MFCC features from audio data, which capture the essential characteristics of the sound spectrum. These features are then used to train an SVM model to classify sounds as gunshots or non-gunshots. To enhance detection accuracy, YAMNet is employed to classify the audio into a broader range of categories, providing an additional layer of validation or complementing the SVM's predictions. The combination of SVM's precision with YAMNet's extensive sound classification capabilities results in a robust system capable of accurately identifying gunshot sounds in real-time audio streams. This hybrid approach leverages both traditional machine learning and state-of-the-art deep learning techniques, offering a reliable solution for gunshot detection in various applications.
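A minimal version of the MFCC-plus-SVM stage might look like this, using librosa for feature extraction and scikit-learn for classification (the file lists and sampling rate are placeholders, and the YAMNet validation layer is omitted):

```python
import librosa
import numpy as np
from sklearn.svm import SVC

def mfcc_features(path: str, n_mfcc: int = 13) -> np.ndarray:
    """Mean MFCC vector summarizing one audio clip into a fixed-length feature."""
    y, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)

# Hypothetical labeled clip lists
gunshot_paths = ["clips/gunshot_001.wav"]
other_paths = ["clips/street_noise_001.wav"]

X = np.stack([mfcc_features(p) for p in gunshot_paths + other_paths])
y = np.array([1] * len(gunshot_paths) + [0] * len(other_paths))
svm = SVC(kernel="rbf", probability=True).fit(X, y)
```

YAMNet's broader class scores would then be fused with `svm.predict_proba` as a second opinion.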
This study presents an advanced approach to detecting lung auscultation sounds using Mel-frequency Cepstral Coefficients (MFCC), Chroma features, and neural networks. Lung auscultation, a key diagnostic tool in identifying respiratory conditions, often relies on the expertise of medical professionals to interpret subtle sound patterns. However, automated systems that accurately classify these sounds can greatly assist in early diagnosis and treatment. To achieve this, we employed MFCC, which captures the power spectrum of sounds and effectively models the way humans perceive auditory signals, focusing on the critical frequency ranges for lung sounds. Additionally, Chroma features, which represent the tonal content of audio signals, were used to capture harmonic aspects that could be indicative of specific lung conditions. These features were then fed into a neural network designed to classify lung sounds into various diagnostic categories, such as normal breathing, wheezing, crackles, and other abnormal respiratory sounds. The neural network, trained on a comprehensive dataset of lung sounds, was able to learn complex patterns and correlations within the MFCC and Chroma features, leading to high accuracy in sound classification. This automated approach offers a powerful tool for enhancing the precision of lung sound diagnosis, potentially leading to earlier detection of respiratory conditions and improved patient outcomes.
Information and Communication Technologies have propelled social networking and communication, but cyberbullying poses significant challenges. Existing user-dependent mechanisms for reporting and blocking cyberbullying are manual and inefficient. Conventional Machine Learning and Transfer Learning approaches were explored for automatic cyberbullying detection. The study utilized a comprehensive dataset and a structured annotation process. Textual, sentiment and emotional, static and contextual word embeddings, psycholinguistic, term-list, and toxicity features were employed in the Conventional Machine Learning approach. This research introduced the use of toxicity features for cyberbullying detection. Contextual embeddings fed to a word-level Convolutional Neural Network (Word CNN) demonstrated comparable performance, with the embeddings chosen for their higher F-measure. Textual features, embeddings, and toxicity features set new benchmarks when fed individually. The model achieved a boosted F-measure by combining textual, sentiment, embedding, psycholinguistic, and toxicity features in a Logistic Regression model, which outperformed Linear SVC in terms of training time and handling of high-dimensional features. Transfer Learning utilized the Word CNN for fine-tuning, achieving faster training computation compared to the base models. Additionally, cyberbullying detection was deployed as a Flask web application, yielding high accuracy. The reference to the specific dataset name was omitted for privacy.
Postpartum depression (PPD) is a widespread mental health disorder impacting new mothers worldwide, arising from a complex interplay of emotional, social, and physiological changes following childbirth. Early detection is crucial, as timely intervention can significantly improve maternal and child well-being. In this study, we propose a hyperparameter-optimized XGBoost classifier aimed at accurately predicting PPD risk based on responses to a standardized questionnaire. Our research uses a dataset of 1,503 participants collected from a medical institution through a digital survey platform (Google Forms), capturing key demographic, social, and health-related factors. Our approach applies extensive hyperparameter tuning to optimize the XGBoost classifier's performance, which we then benchmarked against ten alternative machine learning models to determine its efficacy. The XGBoost classifier, when optimized, demonstrated a substantial accuracy increase, making it a strong predictive tool for clinical applications. To validate its robustness, we employed k-fold cross-validation, which confirmed the model's reliability and consistency. This study underscores the importance of specific risk factors in PPD onset, positioning our optimized XGBoost model as an efficient predictive solution in maternal healthcare for early PPD risk assessment and prevention planning.
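The tuning-plus-cross-validation recipe can be sketched with scikit-learn and xgboost as follows (the feature matrix is synthetic and the grid is a small illustrative one; the study's actual search space is not published):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from xgboost import XGBClassifier

# Stand-in for the 1,503-response questionnaire matrix (the real data is not public)
X, y = make_classification(n_samples=1503, n_features=20, random_state=42)

param_grid = {                      # small illustrative grid
    "max_depth": [3, 5, 7],
    "learning_rate": [0.05, 0.1],
    "n_estimators": [100, 300],
}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
search = GridSearchCV(XGBClassifier(eval_metric="logloss"),
                      param_grid, cv=cv, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, search.best_score_)
```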
Most companies nowadays are using digital platforms for the recruitment of new employees to make the hiring process easier. The rapid increase in the use of online platforms for job posting has resulted in fraudulent advertising. Scammers exploit these platforms to make money through fraudulent job postings, making online recruitment fraud a critical issue in cybercrime. Therefore, detecting fake job postings is essential to mitigate online job scams. Traditional machine learning and deep learning algorithms have been widely used in recent studies to detect fraudulent job postings. This research focuses on employing Long Short-Term Memory (LSTM) networks to address this issue effectively. A novel dataset of fake job postings is proposed, created by combining job postings from three different sources. Existing benchmark datasets are outdated and limited in scope, restricting the effectiveness of existing models. To overcome this limitation, the proposed dataset includes the latest job postings. Exploratory Data Analysis (EDA) highlights the class imbalance problem in detecting fake jobs, which can cause the model to underperform on minority classes. To address this, the study implements ten top-performing Synthetic Minority Oversampling Technique (SMOTE) variants. The performances of the models, balanced by each SMOTE variant, are analyzed and compared. Among the approaches implemented, the LSTM model achieved a remarkable accuracy of 97%, demonstrating its superior performance in detecting fake job postings.
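The class-balancing step is a one-liner with imbalanced-learn; the sketch below shows rebalancing and how a variant such as BorderlineSMOTE can be swapped in (the features here are synthetic stand-ins for vectorized job-posting text):

```python
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE, BorderlineSMOTE

# Stand-in for vectorized job postings: far fewer fake postings than real ones
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=42)
print("before:", Counter(y))

X_bal, y_bal = SMOTE(random_state=42).fit_resample(X, y)
print("after:", Counter(y_bal))

# Comparing SMOTE variants, as the study does, is a one-line change:
X_alt, y_alt = BorderlineSMOTE(random_state=42).fit_resample(X, y)
```

The balanced set would then feed the LSTM classifier described above.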
Addressing the intricate challenge of fake news detection, traditionally reliant on the expertise of professional fact-checkers due to the inherent uncertainty in fact-checking processes, this research leverages advancements in language models to propose a novel Long Short-Term Memory (LSTM)-based network. The proposed model is specifically tailored to navigate the uncertainty inherent in the fake news detection task, utilizing LSTM's capability to capture long-range dependencies in textual data. The evaluation is conducted on the well-established LIAR dataset, a prominent benchmark for fake news detection research, yielding an impressive accuracy of 99%. Moreover, recognizing the limitations of the LIAR dataset, we introduce LIAR2 as a new benchmark, incorporating valuable insights from the academic community. Our study presents detailed comparisons and ablation experiments on both LIAR and LIAR2 datasets, establishing our results as the baseline for LIAR2. The proposed approach aims to enhance our understanding of dataset characteristics, contributing to refining and improving fake news detection methodologies by effectively leveraging the strengths of LSTM architecture.
In today’s world, personal safety in environments such as remote or isolated areas, where individuals may be working alone, has become a critical concern. Threats such as robbery, assault, and other criminal activities are often accompanied by specific sounds, which can serve as early indicators of potential danger. While traditional security systems are available, they often fail to detect or classify these sounds with the necessary accuracy or in real-time. This project aims to address this challenge by developing a system that classifies different types of surrounding audio events, allowing for a deeper understanding of the environment in real-time. The focus of this project is on accurately detecting and classifying various audio signals, which may include common environmental sounds such as footsteps, vehicle noise, or background chatter. By applying a deep learning model, specifically a 1D Convolutional Neural Network (CNN), the system processes audio data from real-world environments to classify these sounds into distinct categories. The 1D-CNN model is well-suited for this task, as it can effectively capture time-dependent features from the audio signals. The model is trained using a dataset of labeled audio events, where each audio clip is associated with a specific sound category. The deep learning model analyzes these signals, extracting key features that help distinguish between different audio events. This approach offers a powerful tool for understanding environmental audio in various settings, such as urban areas, workplaces, or isolated locations. By focusing on real-time sound classification, this project contributes to improving situational awareness and providing a foundation for further advancements in sound-based monitoring and analysis systems.
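A compact Keras sketch of the 1D-CNN classifier described above (the input shape and class count are placeholders; the real feature extraction and dataset are not specified in the abstract):

```python
from tensorflow.keras import layers, models

# Placeholder shapes: 128 time steps of 40-dimensional audio features, 6 sound classes
TIMESTEPS, FEATURES, NUM_CLASSES = 128, 40, 6

model = models.Sequential([
    layers.Input(shape=(TIMESTEPS, FEATURES)),
    layers.Conv1D(64, kernel_size=5, activation="relu"),   # local temporal patterns
    layers.MaxPooling1D(2),
    layers.Conv1D(128, kernel_size=3, activation="relu"),
    layers.GlobalMaxPooling1D(),                           # strongest activation per filter
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```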
Detecting suicidal ideation through social media content is a critical initiative to support mental health intervention strategies. This study presents an explainable framework that leverages advanced Natural Language Processing (NLP) techniques to address the challenges of identifying suicidal intent in user-generated content. A significant innovation in this work is the creation of synthetic datasets informed by psychological and social factors associated with suicidal ideation, designed to supplement limited real-world data while maintaining ethical considerations. The proposed system classifies social media content into two categories: Non-Suicidal or Suicidal. The hybrid approach of combining synthetic and real-world data enhances model performance, achieving superior accuracy and robustness compared to traditional methods. The framework emphasizes explainability by incorporating techniques that identify key linguistic and contextual features driving model predictions, ensuring interpretability for mental health professionals and researchers. This approach underscores the potential of integrating synthetic data and NLP in addressing real-world challenges such as data scarcity, diversity, and ethical concerns. By providing actionable insights and ensuring transparency, the proposed framework contributes to building reliable and scalable solutions for suicide prevention in digital environments.
Climate change remains one of the most pressing global challenges, and understanding public sentiment surrounding this issue is critical for shaping effective policy and response strategies. Social media platforms, particularly Twitter, have become key venues for individuals to express their opinions, concerns, and reactions to climate change-related topics. To capture and analyze this sentiment, this study employs Natural Language Processing (NLP) techniques combined with Long Short-Term Memory (LSTM) networks, a type of recurrent neural network, to analyze tweets related to climate change. The LSTM model, renowned for its ability to capture long-range dependencies in text data, is utilized to classify sentiment and extract meaningful insights from the discourse. By applying advanced NLP techniques and deep learning, this study aims to provide a comprehensive understanding of public sentiment on climate change, enabling stakeholders, including policymakers and environmental organizations, to better grasp public perceptions and inform strategies to tackle the climate crisis.
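A minimal Keras LSTM text classifier in the spirit of the approach above (vocabulary size, sequence length, and the three-way label scheme are assumptions for illustration):

```python
from tensorflow.keras import layers, models

VOCAB_SIZE, MAX_LEN = 20000, 100   # placeholder tokenizer settings

model = models.Sequential([
    layers.Input(shape=(MAX_LEN,)),
    layers.Embedding(VOCAB_SIZE, 128),
    layers.LSTM(64),                        # captures long-range dependencies in the tweet
    layers.Dense(32, activation="relu"),
    layers.Dense(3, activation="softmax"),  # e.g. negative / neutral / positive stance
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```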
Depression is a prevalent mental health condition that significantly impacts individuals’ lives, and its timely detection is crucial for effective intervention. Traditional machine learning approaches often struggle due to the limitations in annotated data and the lack of transparency in model predictions. This study aims to address these challenges by employing advanced natural language processing (NLP) techniques and deep learning algorithms, specifically Long Short-Term Memory (LSTM) networks, to develop an explainable model for depression detection in social media content. The primary objective is to classify social media text into two categories: depression and control, based on linguistic patterns indicative of depressive symptoms. The model leverages LSTM to capture the sequential dependencies in text, making it capable of identifying nuanced patterns that distinguish between depression-related and non-depression-related content. Additionally, the study incorporates interpretability methods such as attention mechanisms to provide insights into the features influencing the model's predictions, thus ensuring transparency and trust in the decision-making process. The proposed model is evaluated using a publicly available Mental Health dataset, which contains labeled social media posts. The results demonstrate the effectiveness of LSTM in classifying text into depression and control categories, contributing to the field of mental health by offering a scalable and interpretable approach for early depression detection. This research has the potential to assist mental health professionals by enabling the automated identification of depression in social media content, facilitating timely intervention and improving overall well-being.
Sarcasm detection presents unique challenges due to the complex linguistic and contextual nature of sarcastic expressions. Understanding sarcasm in text requires advanced methods capable of capturing nuanced patterns that are not easily detectable by traditional approaches. In this study, we propose the use of Convolutional Neural Networks (CNN) for sarcasm detection. The CNN model is designed to identify and classify sarcastic content by learning intricate patterns from the text. Our approach demonstrates that CNNs, with their ability to effectively capture spatial hierarchies and context in textual data, offer superior performance in detecting sarcasm compared to simpler methods. The study also emphasizes the importance of sophisticated data augmentation techniques to address issues like data imbalance, further enhancing the model’s effectiveness. This work contributes to advancing sarcasm detection, providing valuable insights for applications in sentiment analysis and natural language understanding.
The rapid growth of mobile applications and online e-commerce platforms has made it increasingly easy to gather large amounts of data, providing valuable insights into consumer behavior. Analyzing user reviews has become essential in assisting users with purchasing decisions. In the proposed system, we introduce a solution by combining NLP (Natural Language Processing) techniques with a CNN (Convolutional Neural Network) model for review classification. The model incorporates text preprocessing, tokenization, and word embedding techniques to better understand the nuances of review content. The CNN-based architecture enhances the ability to detect meaningful patterns and relationships in the data, significantly improving prediction accuracy and computational efficiency. This approach overcomes the limitations of previous methods by providing a more accurate and scalable model for review analysis. It can be easily adapted to handle large-scale datasets and diverse textual data. Through experimental evaluation, the proposed system demonstrates superior performance, showing better classification results compared to existing approaches. By focusing on key patterns and relationships within the text data, the system offers an efficient and effective solution for predicting helpful reviews and enhancing decision-making confidence in e-commerce platforms.
In our proposed approach, BERT is used to capture the contextual meaning of the user feedback, significantly improving the model's ability to understand subtle details and intricacies in the textual data. The output of BERT is then passed through an LSTM layer, which is capable of capturing sequential dependencies in the data, making it ideal for analyzing user feedback over time and identifying patterns in emotional expressions. This model aims to classify the emotional tone of user feedback into five categories: Angry, Sad, Fear, Surprise, and Happy. By processing the crowd-user feedback from low-ranked software applications, we can identify prevalent issues while also classifying the emotional reactions to those issues. This allows software developers to prioritize bug fixes based on both the frequency of the issues and the intensity of user emotions. Our results demonstrate an accuracy of 92%, outperforming traditional ML algorithms such as Multinomial Naive Bayes (MNB), Logistic Regression (LR), Random Forest (RF), and Multilayer Perceptron (MLP). The improved accuracy is attributed to the combination of BERT's deep contextual understanding and LSTM's ability to model sequential dependencies. Moreover, the approach provides a powerful tool for software vendors to not only identify critical issues in their applications but also gain insights into the emotional impact of these issues on users. This will enable software vendors to take more informed actions in improving their products, enhancing user satisfaction, and prioritizing fixes in a timely manner.
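A hedged PyTorch sketch of the BERT-plus-LSTM architecture described above, using Hugging Face Transformers (layer sizes and the example sentence are illustrative; the paper's exact configuration is not given):

```python
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class BertLstmEmotion(nn.Module):
    """BERT token embeddings fed through an LSTM, then a 5-way emotion head."""
    def __init__(self, num_classes: int = 5):
        super().__init__()
        self.bert = AutoModel.from_pretrained("bert-base-uncased")
        self.lstm = nn.LSTM(self.bert.config.hidden_size, 128, batch_first=True)
        self.head = nn.Linear(128, num_classes)  # Angry, Sad, Fear, Surprise, Happy

    def forward(self, input_ids, attention_mask):
        # Per-token contextual embeddings from BERT
        hidden = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state
        # Final LSTM hidden state summarizes the sequence
        _, (h_n, _) = self.lstm(hidden)
        return self.head(h_n[-1])

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = tokenizer(["app crashes every time I log in"],
                  padding=True, return_tensors="pt")
logits = BertLstmEmotion()(batch["input_ids"], batch["attention_mask"])
```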
To achieve quality education, a key goal of sustainable development, it is essential to provide stakeholders with accurate and relevant information about educational institutions. Prospective students often face challenges in obtaining consistent and reliable information about universities or institutes, especially regarding unique courses and opportunities. These inconsistencies, stemming from sources such as websites, rankings, and brochures, can lead to confusion and influence decision-making. A robust solution to address this challenge is the implementation of a chatbot application on the university's official website. A chatbot, powered by artificial intelligence, can simulate human-like conversations and respond promptly to student inquiries. By leveraging Natural Language Processing (NLP) techniques, a chatbot can provide predefined, accurate, and uniform information 24/7, making it a valuable tool for the counseling process. In this research, a chatbot was developed using NLP concepts, specifically the NLTK library, and trained using neural networks to achieve exceptional performance. The system processed and structured user queries by creating an intents.json file, tokenizing and lemmatizing input text, and converting data into a bag-of-words representation. The neural network, optimized using advanced techniques, achieved an impressive accuracy of 99%. This approach demonstrated the effectiveness of sequential models, which prevent overfitting and excel in handling contextual queries. Additionally, the chatbot incorporated pattern matching and semantic analysis to enhance real-time query resolution. By integrating advanced NLP methods and neural networks, this research provides a robust and scalable chatbot solution, offering precise, consistent, and accessible information to prospective students, ultimately aiding them in making well-informed academic decisions.
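The tokenize/lemmatize/bag-of-words preprocessing described above can be sketched with NLTK as follows (the intents.json structure is assumed to follow the common patterns-per-tag convention; NLTK's punkt and wordnet resources must be downloaded first):

```python
import json
import numpy as np
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
# requires: nltk.download("punkt"); nltk.download("wordnet")

lemmatizer = WordNetLemmatizer()

with open("intents.json") as f:   # {"intents": [{"tag": ..., "patterns": [...]}, ...]}
    intents = json.load(f)

# Vocabulary of lemmatized tokens across all training patterns
vocab = sorted({
    lemmatizer.lemmatize(w.lower())
    for intent in intents["intents"]
    for pattern in intent["patterns"]
    for w in word_tokenize(pattern)
})

def bag_of_words(sentence: str) -> np.ndarray:
    """Binary bag-of-words vector over the training vocabulary."""
    tokens = {lemmatizer.lemmatize(w.lower()) for w in word_tokenize(sentence)}
    return np.array([1 if w in tokens else 0 for w in vocab], dtype=np.float32)
```

The resulting vectors feed the dense neural network that maps each query to an intent tag.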
Brain hemorrhage refers to a potentially fatal medical disorder that affects millions of individuals. The percentage of patients who survive can be significantly raised by the prompt identification of brain hemorrhages, aided by image-guided radiography, which has emerged as the predominant modality in clinical practice. Computed Tomography (CT) images have frequently been employed for the purpose of identifying and diagnosing neurological disorders. The manual identification of anomalies in the brain region from CT images demands that the radiologist devote a greater amount of time and dedication. In the most recent studies, a variety of techniques rooted in Deep Learning and traditional Machine Learning have been introduced with the purpose of promptly and reliably detecting and classifying brain hemorrhage. This overview provides a comprehensive analysis of the surveys that have been conducted utilizing Machine Learning and Deep Learning. This research focuses on the main stages of brain hemorrhage detection, which involve preprocessing, feature extraction, and classification, as well as their findings and limitations. Moreover, this in-depth analysis provides a description of the existing benchmark datasets that are utilized for the analysis of the detection process, and a detailed performance comparison is presented. Finally, this paper addresses some aspects of the above-mentioned techniques and provides insights into prospective possibilities for future research.
Impaired renal function poses a risk across all age groups, with kidney diseases often progressing without noticeable symptoms until they reach an advanced stage. Given the global shortage of nephrologists and the growing public health concern over renal failure, there is an urgent need for an AI-driven system capable of autonomously detecting kidney abnormalities. This project addresses that need by developing a Convolutional Neural Network (CNN)-based model to detect kidney diseases, specifically targeting cysts, stones, and tumors, which are common causes of renal failure. A comprehensive dataset of 12,446 CT whole-abdomen and urogram images was collected and annotated, covering four categories: cyst, tumor, stone, and normal. The CNN model achieved a notable accuracy of 97%, outperforming previous YOLOv8-based models. Key evaluation metrics, including precision, recall, F1 score, and specificity, demonstrate the model’s ability to reliably differentiate among kidney abnormalities. This system provides a customizable and effective platform for the clinical diagnosis of renal conditions, potentially enhancing diagnostic accuracy and accessibility in healthcare settings, particularly in regions with limited access to specialists. The results underscore the potential of deep learning in medical imaging, offering a promising solution to aid in early detection and treatment of kidney diseases.
This project develops a robust and efficient system for identifying medicinal plants using a combination of deep learning (DL) and traditional machine learning (ML) techniques. Medicinal plants have been integral to healthcare for centuries, providing essential ingredients for drug development and medical treatments. While over 25% of medicines in developed countries are derived from these plants, approximately 80% of individuals in developing countries rely on them for primary healthcare. Traditionally, the identification of these plants is performed manually by experts, a process that is tedious, time-consuming, and often subjective, heavily reliant on the availability of trained personnel. Incorrect identification can lead to severe health consequences, emphasizing the need for a more reliable and efficient identification method. In light of this challenge, this project presents an innovative solution that automates the identification of medicinal plants using images captured by smartphones in their natural environments. The proposed system leverages a cascaded architecture combining deep learning and traditional machine learning techniques. The core of the system utilizes a pre-trained Xception model for feature extraction, capturing intricate features from plant images and benefiting from knowledge gained from extensive training on large datasets. These extracted features are then optimized, enhancing their quality for classification. A Random Forest classifier is employed to classify the medicinal plants based on the optimized features, excelling in high-dimensional data scenarios. This rapid identification capability, combined with high accuracy and robustness, underscores the system’s practical applicability for users who need reliable plant identification on the go. By integrating deep learning for feature extraction and traditional machine learning for classification, this project addresses the critical need for efficient and accurate medicinal plant identification, enhancing the reliability of plant identification while providing a practical solution that can be easily used in real-world scenarios, particularly in settings where expert knowledge is limited.
Deep learning has become one of remote sensing scientists’ most efficient computer vision tools in recent years. However, the lack of training labels for the remote sensing datasets means that scientists need to solve the domain adaptation (DA) problem to narrow the discrepancy between satellite image datasets. As a result, image segmentation models that are then trained could better generalize and use an existing set of labels instead of requiring new ones. This work proposes an unsupervised DA model that preserves semantic consistency and per-pixel quality for the images during the style-transferring phase. This article’s major contribution is proposing the improved architecture of the SemI2I model, which significantly boosts the proposed model’s performance and makes it competitive with the state-of-the-art CyCADA model. A second contribution is testing the CyCADA model on remote sensing multiband datasets, such as WorldView-2 and SPOT-6. Thus, the semantic segmentation model, trained on the adapted images, shows substantial performance gain compared to the SemI2I model and reaches similar results as the state-of-the-art CyCADA model. The future development of the proposed method could include ecological domain transfer, a priori evaluation of dataset quality in terms of data distribution, or exploration of the inner architecture of the DA model.
Cerebrovascular diseases such as stroke are among the most common causes of death and disability worldwide and are preventable and treatable. Early detection of strokes and their rapid intervention play an important role in reducing the burden of disease and improving clinical outcomes. In recent years, machine learning methods have attracted a lot of attention as they can be used to detect strokes. The aim of this study is to identify reliable methods, algorithms, and features that help medical professionals make informed decisions about stroke treatment and prevention. To achieve this goal, we have developed an early stroke detection system based on CT images of the brain, utilizing a ResNet (Residual Network) model to detect strokes at a very early stage. For image classification, the ResNet model is employed to extract the most relevant features for classification. Cross-validation was used to evaluate the system's effectiveness, employing metrics such as precision, recall, F1 score, ROC (Receiver Operating Characteristic Curve), and AUC (Area Under the Curve). The proposed diagnostic system allows physicians to make informed decisions about stroke treatment.
We aim to advance deep learning-based smart agriculture by implementing semantic segmentation on crop images captured in real field environments. Our primary objective is to accurately detect diseases, thereby facilitating the automation of agricultural management processes. A significant challenge in this task is the small size of disease regions, which serve as the Regions of Interest (RoI) and make precise prediction difficult. Previously, the RoI-Attention Network (RA-Net) was used to address this challenge by utilizing an RoI-attentive image, focusing on regions predicted as diseased and their surroundings to enhance the network's ability to detect these small regions. However, we now propose using the U-Net architecture as an improvement over RA-Net. U-Net, with its symmetric design and skip connections, is expected to better capture the context and details needed for accurately segmenting small disease regions. By leveraging U-Net's ability to integrate both high-level and low-level features, we aim to refine the precision of disease detection, enhancing the overall effectiveness of automated agricultural management systems.
Down syndrome is a chromosomal condition resulting from an additional copy of chromosome 21, leading to various developmental challenges and distinctive physical characteristics. Children with Down syndrome typically exhibit unique craniofacial features, including a shorter midface, broader facial width, flat nasal bridge, almond-shaped eyes, and a smaller, somewhat flattened head. These identifiable traits can significantly aid in early diagnosis and intervention. This study focuses on the early diagnosis of Down syndrome using an advanced neural network approach based on ResNet50. We utilized a dataset of 3,009 facial images of children aged 0 to 15, comprising both those with Down syndrome and healthy children, to conduct our experiments. Our proposed method, ResNet50-DNSNet, leverages the deep residual learning framework of ResNet50 for robust feature extraction, capturing intricate spatial features from the input images. By fine-tuning the pre-trained ResNet50 model, we extracted high-level features that enhance the model's ability to distinguish between facial characteristics of children with Down syndrome and those without. We evaluated the performance of our approach using several artificial intelligence techniques, including logistic regression, support vector machines, and gradient boosting methods. Extensive experimental results demonstrated that our ResNet50-DNSNet achieved an impressive accuracy of 0.99, surpassing state-of-the-art methods. The model's performance was rigorously validated through k-fold cross-validation, ensuring reliability and robustness. Additionally, we assessed the runtime computational complexity of our proposed approach. This innovative research has the potential to transform the early diagnosis of Down syndrome in children through the analysis of facial images, facilitating timely intervention and support.
"A Large Dataset to Enhance Skin Cancer Classification with Transformer-Based Deep Neural Networks" reflects a research approach aimed at improving the accuracy of skin cancer diagnosis by utilizing cutting-edge deep learning techniques, specifically Transformer models, on a large dataset. Using a substantial dataset related to skin cancer plays a crucial role in this research, as larger datasets provide more diverse examples that help the model generalize better to unseen data. This enhances the model’s ability to recognize patterns across various types of skin conditions, including Melanoma, Melanocytic Nevi, Basal Cell Carcinoma, Actinic Keratoses, Benign Keratosis-like Lesions, Dermatofibroma, and Vascular Lesions. Transformer-based deep neural networks are applied here, leveraging self-attention mechanisms that allow the model to analyse and capture complex relationships within image data. This self-attention mechanism works by weighing the importance of each part of an image in relation to others, effectively capturing spatial dependencies across image regions. This allows the model to focus on relevant features in diverse and detailed skin images, making Transformer-based architectures a promising technique for medical image analysis.
With the rapid growth of urbanization and industrialization, the challenge of managing increasing volumes of waste has become critical. Effective waste sorting and recycling are essential to reducing environmental impact and promoting sustainability. Deep learning has proven to be an effective tool in automating complex tasks such as image classification, making it ideal for waste categorization applications. This study introduces a deep learning model based on the NASNet architecture for the classification of recyclable waste into six distinct categories: cardboard, glass, metal, paper, plastic, and litter. By utilizing the NASNetLarge base model, the proposed model leverages a pre-trained, highly efficient architecture capable of extracting complex and hierarchical features from waste images. Custom layers, including global average pooling, dropout for regularization, and fully connected dense layers, are added to enhance the model’s ability to learn discriminative features and prevent overfitting. The model’s performance is optimized using the Adam optimizer with categorical cross-entropy loss, ensuring reliable and robust classification even in challenging conditions such as varying image quality and background noise. To improve model interpretability, we employ Score-CAM saliency maps, which provide visual explanations for the model’s decision-making process, allowing users to understand which parts of the image influenced its predictions. This interpretability aspect is vital for building trust in automated systems, particularly in real-world applications like waste sorting. The proposed NASNet-based model demonstrates superior performance compared to existing waste classification methods and offers a promising solution for automating waste management processes. It has the potential to significantly enhance the efficiency of recycling systems, reduce human error, and contribute to sustainable environmental practices.
Recent advances in synthetic image generation, particularly through artificial intelligence, have led to the creation of images so realistic that they are virtually indistinguishable from real photographs. This presents significant challenges for data authenticity and reliability, especially in areas such as journalism, social media, and scientific research, where the integrity of images is critical. This study proposes an approach to effectively distinguish between real and AI-generated images using a deep learning model based on ResNet50. The classification task is framed as a binary problem, where images are categorized as either "real" or "AI-generated." While synthetic images can replicate complex visual details such as lighting, reflections, and textures, subtle visual imperfections often differentiate them from genuine photographs. The study investigates these differences, focusing on minor artifacts and inconsistencies that are typically present in AI-generated content, such as background distortions, lighting anomalies, and unnatural textures. These artifacts are not always perceptible to the human eye, but can be reliably detected by machine learning models. The ResNet50 model is employed to learn and classify these visual cues, enabling the system to achieve high accuracy in distinguishing real images from synthetic ones. By training on a large dataset of both real and AI-generated images, the model identifies key image features that serve as indicators of authenticity. The study also explores the interpretability of the model's decisions, shedding light on which aspects of the images are most informative for classification.
Paddy leaf diseases are a significant concern for rice farmers globally, as they can lead to severe reductions in crop yield and quality, threatening food security and economic stability in rice-dependent regions. Traditional methods of disease detection rely on manual inspection by experts, which is not only time-consuming and costly but also infeasible for large-scale farms. Moreover, such methods are prone to human error and inconsistencies, particularly in early stages when visual symptoms may be subtle. Addressing this critical issue, our study proposes an automated, high-accuracy classification approach for paddy leaf disease detection using an Xception-based deep learning model. The Xception model, known for its depthwise separable convolution architecture, is fine-tuned and adapted to capture the nuanced patterns of various paddy diseases, such as bacterial blight, brown spot, and leaf smut. Initialized with ImageNet weights, the model is further refined with custom layers to enhance its specificity and robustness in recognizing complex disease features specific to paddy leaves. This customization ensures that the model is highly sensitive to slight variations in leaf texture, color, and shape caused by different pathogens, allowing it to classify multiple disease types effectively. To optimize model performance, we employ the Adam optimizer and categorical cross-entropy loss function, which helps achieve smooth and reliable convergence. The proposed model not only serves as a tool for accurate disease classification but also supports early detection, which is crucial for timely intervention and preventing disease spread. By enabling real-time disease monitoring, this approach aids farmers in applying precise treatment, minimizing the excessive use of pesticides, and reducing environmental impact. Furthermore, this method can be integrated into mobile or IoT-based solutions, making it accessible for rural and resource-limited settings. Ultimately, our work contributes to sustainable agriculture by empowering farmers with a scalable, low-cost solution for disease management, safeguarding crop yields, and promoting food security.
We research how deep learning convolutional neural networks (CNN) can be used to automatically classify a unique collection of naval ship images. We investigate the impact of data preprocessing and externally obtained images on model performance and propose the Xception algorithm as an enhancement to our existing CNN approach. Additionally, we explore how the models can be made transparent using visually appealing interpretability techniques. Our findings demonstrate that the Xception algorithm significantly improves classification performance compared to the traditional CNN approach. The results highlight the importance of appropriate image preprocessing, with preprocessing combined with soft augmentation contributing notably to model performance. This research is original in several aspects, notably the uniqueness of the acquired dataset and the analytical modeling pipeline, which includes comprehensive data preprocessing steps and the use of deep learning techniques. Furthermore, the research pairs the Xception model with visual interpretability tools to enhance model transparency and usability. We believe the proposed methodology offers significant potential for documenting historic image collections.
The demand for high-quality tomatoes to meet consumer and market standards, combined with large-scale production, has necessitated the development of an inline quality grading system. Manual grading is time-consuming, costly, and labor-intensive. This study introduces a novel approach for tomato quality sorting and grading, leveraging pre-trained convolutional neural networks (CNNs) for feature extraction and traditional machine learning algorithms for classification in a hybrid model. Image preprocessing and fine-tuning techniques were applied to enable deep layers to learn and concentrate on complex and significant features. In our existing approach, features extracted by CNNs were classified using support vector machines (SVM), achieving notable accuracy rates. Specifically, the CNN-SVM model attained the best accuracy in the binary classification of tomatoes as healthy or rejected, and in the multiclass classification of tomatoes as ripe, unripe, or rejected. In this study, we propose an enhanced algorithm using the Xception model for feature extraction. The Xception-based approach aims to further improve classification accuracy and performance metrics. The performance of the proposed Xception model was evaluated using metrics such as accuracy, recall, precision, specificity, and F1-score, demonstrating its potential to outperform the existing CNN-SVM ensemble model. This methodology offers significant advancements in the automated grading and sorting of tomatoes, ensuring higher efficiency and consistency in quality assessment.
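The hybrid CNN-feature-plus-classical-classifier idea can be sketched as follows, with a frozen Xception extractor feeding an SVM (the training arrays are random stand-ins for the tomato images and grade labels; the study's fine-tuning details are omitted):

```python
import numpy as np
from sklearn.svm import SVC
from tensorflow.keras.applications import Xception
from tensorflow.keras.applications.xception import preprocess_input

# Frozen Xception as a feature extractor; the SVM does the final grading
extractor = Xception(weights="imagenet", include_top=False, pooling="avg")

def extract(images: np.ndarray) -> np.ndarray:
    """images: float array of shape (n, 299, 299, 3) with values in [0, 255]."""
    return extractor.predict(preprocess_input(images.copy()), verbose=0)

X_train_imgs = np.random.rand(8, 299, 299, 3) * 255  # stand-in tomato images
y_train = np.array([0, 1, 2, 0, 1, 2, 0, 1])          # hypothetical grade labels

features = extract(X_train_imgs)        # (8, 2048) pooled feature vectors
clf = SVC(kernel="rbf").fit(features, y_train)
```

Swapping the SVM for another classical classifier only changes the last line, which is what makes the hybrid design convenient to benchmark.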
Accurately classifying white blood cell subtypes is essential for diagnosing various blood diseases. Traditional methods in computer vision often require manually engineered features, which are time-consuming and can limit performance. In contrast, machine learning approaches offer improved accuracy but typically demand extensive labeled datasets, which are challenging and costly to obtain. This study introduces a semi-supervised learning approach tailored for white blood cell classification. By leveraging a combination of a small amount of labeled data and a larger set of unlabeled data, the model learns to identify and categorize different white blood cell subtypes directly from microscopic images. This methodology capitalizes on the inherent structure and patterns present in the data, enhancing classification performance without relying solely on predefined features. The proposed approach was evaluated using a dataset comprising synthetic images representing various white blood cell subtypes. Results demonstrate promising accuracy in distinguishing between different cell types, showcasing potential applications in clinical diagnostics. By minimizing the reliance on manually labeled data while maintaining high classification accuracy, this approach offers a scalable solution for automating and improving the efficiency of white blood cell analysis in medical settings.
Otoscopy is a diagnostic procedure to visualize the external ear canal and eardrum, facilitating the detection of various ear pathologies and conditions. Timely otoscopy image classification offers significant advantages, including early detection, reduced patient anxiety, and personalized treatment plans. This paper introduces a novel framework specifically tailored for otoscopy image classification. It leverages octave 3D convolution and a combination of feature- and region-focus modules to create an accurate and robust classification system capable of distinguishing between various otoscopic conditions. This architecture is designed to efficiently capture and process the spatial and feature information present in otoscopy images. Using a public otoscopy dataset, the proposed framework achieved high classification accuracy and F1 scores across 11 classes of ear conditions. A comparative analysis demonstrates that it surpasses other established machine learning models, including the Xception model, across various evaluation metrics. The research contributes to improved diagnostic accuracy, reduced human error, and expedited diagnostics, and holds potential for telemedicine applications.
In recent years, jute has become a crucial natural fiber crop, facing increasing threats from insect pests that can significantly undermine agricultural productivity. The challenge of accurately identifying these pests is further complicated by factors such as complex backgrounds, fuzzy features, and the presence of multiple small targets, which make detection difficult for conventional methods. A notable barrier has been the scarcity of datasets specifically tailored for jute pests, which limits the effectiveness and generalization capabilities of traditional pest identification models. To combat this issue, we undertook the construction of a comprehensive image dataset that encompasses nine distinct types of jute pests, meticulously designed to enhance model training and evaluation. In this study, we developed a robust deep convolutional neural network (CNN) specifically for jute pest detection, employing OpenCV for effective image preprocessing and augmentation techniques to improve data diversity. Our model achieved an impressive accuracy of 98%, demonstrating its capability to enhance pest recognition even in challenging environmental conditions. The experimental results reflect significant improvements across various performance metrics, including Precision, Recall, and F1 score, alongside a noteworthy increase in mean Average Precision (mAP). This research not only addresses the critical need for effective pest detection solutions in jute cultivation but also offers a high-performance, lightweight model that optimizes both recognition accuracy and computational efficiency, paving the way for more sustainable agricultural practices and effective pest management strategies. Through these advancements, we aim to contribute to the resilience and productivity of jute farming, ultimately supporting the livelihoods of farmers reliant on this vital crop.
After the coronavirus disease 2019 (COVID-19) outbreak, the viral infection known as monkeypox gained significant attention, and the World Health Organization (WHO) classified it as a global public health emergency. Due to the similarities between monkeypox and other pox viruses, traditional classification methods face challenges in accurately identifying the disease. Moreover, the sharing of sensitive medical data raises concerns about privacy and security. Integrating deep neural networks with federated learning (FL) offers a promising approach to overcome these challenges in medical data categorization. In this context, we propose an FL-based framework leveraging the Xception deep learning model to securely classify monkeypox and other pox viruses. The proposed framework utilizes the Xception model for classification and a federated learning environment to ensure data security. This approach allows the model to be trained on distributed data sources without transferring sensitive data, thus enhancing privacy protection. The federated learning environment also enables collaboration across institutions while maintaining the confidentiality of patient data. The experiments are conducted using publicly available datasets, demonstrating the effectiveness of the proposed framework in providing secure and accurate classification of monkeypox disease. Additionally, the framework shows promise in other medical classification tasks, highlighting its potential for widespread application in the healthcare sector.
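The server-side aggregation step of federated learning can be sketched as a FedAvg weight average (a generic sketch, not the paper's implementation; each client contributes its Keras-style weight list and local dataset size, and no raw images ever leave the client):

```python
import numpy as np

def federated_average(client_weight_lists, client_sizes):
    """FedAvg: size-weighted mean of per-client model weights.

    Each element of client_weight_lists is a list of numpy arrays,
    e.g. the result of model.get_weights() on one client.
    """
    total = float(sum(client_sizes))
    averaged = []
    for layer_weights in zip(*client_weight_lists):
        averaged.append(
            sum(w * (n / total) for w, n in zip(layer_weights, client_sizes))
        )
    return averaged

# On the server, per round:
#   global_model.set_weights(federated_average(client_updates, client_sizes))
# then broadcast the new global weights back to the clients.
```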
The simultaneous classification and grading of fruits are essential yet underexplored facets of computer vision in agricultural automation. This study proposes the application of same-domain transfer learning using the NASNetMobile architecture to facilitate multi-fruit classification and grading. Our dual-model framework initially employs NASNetMobile to distinguish between six fruit types—bananas, apples, oranges, pomegranates, limes, and guavas—within the FruitNet dataset. Subsequently, the learned parameters are transferred to a second model, which focuses on grading the quality of the fruits. To address the class imbalance in the dataset, we incorporate a combination of AugMix, CutMix, and MixUp, significantly improving model generalization. These findings affirm the utility of same-domain transfer learning in enhancing grading accuracy using knowledge gained from classification tasks. The study demonstrates the potential for integrating this approach into machine vision systems to advance agricultural automation. Moving forward, this approach could be scaled to address broader cultivation challenges through the continued development of fine-grained visual analysis capabilities.
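Of the three augmentations named, MixUp is the simplest to sketch; the function below forms convex combinations of image pairs and their one-hot labels (alpha is a typical default, not a value from the study):

```python
import numpy as np

def mixup(x_batch: np.ndarray, y_batch: np.ndarray, alpha: float = 0.2):
    """MixUp: convex combinations of image pairs and their one-hot labels."""
    lam = np.random.beta(alpha, alpha)          # mixing coefficient
    idx = np.random.permutation(len(x_batch))   # random partner for each sample
    x_mixed = lam * x_batch + (1.0 - lam) * x_batch[idx]
    y_mixed = lam * y_batch + (1.0 - lam) * y_batch[idx]
    return x_mixed, y_mixed

# Demo on a synthetic batch: 32 images, 6 fruit classes
x = np.random.rand(32, 224, 224, 3)
y = np.eye(6)[np.random.randint(0, 6, 32)]
x_m, y_m = mixup(x, y)
```

CutMix and AugMix follow the same batch-level pattern but mix image patches and augmentation chains, respectively.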
Cancer remains one of the leading causes of death globally, with nearly 10 million deaths reported in 2020. Among all types, oral cancer ranks as the sixth most prevalent worldwide. Its lethality is mainly attributed to late-stage diagnoses, where treatment becomes more challenging. However, early detection, particularly during pre-cancerous stages, can significantly reduce mortality rates, underscoring the need for efficient screening and diagnostic methods. In this study, a method is proposed to distinguish between benign and malignant oral cavity lesions while classifying their pre-cancerous stages. This approach explores five different color spaces to extract color and texture-based features from oral cavity images, which are critical for identifying varying lesion stages. These features are then classified using the MobileNetV2 architecture of Convolutional Neural Networks (CNN), chosen for its efficiency in resource-constrained environments. MobileNetV2's lightweight design makes it particularly suitable for mobile and real-time applications, where computational resources may be limited. The proposed method stands out for its combination of handcrafted feature extraction and deep learning classification, utilizing MobileNetV2 for improved accuracy and speed. By capturing complex patterns in both color and texture from the images, the model offers a powerful tool for oral cancer detection, outperforming traditional methods in both time and computational efficiency. The model's ability to work with limited resources makes it an excellent choice for low-cost, mobile-based diagnostic tools. The method demonstrates promising results in both binary and multi-class classifications, successfully differentiating benign, malignant, and pre-cancerous lesions. This technique could greatly enhance early oral cancer detection, especially in settings with limited access to advanced medical facilities. By using a resource-efficient architecture, the method is accessible, scalable, and effective for widespread use in real-world applications, contributing to the reduction of oral cancer mortality through early and accurate diagnosis.
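As a hedged illustration of multi-color-space feature extraction, the sketch below computes per-channel mean and standard deviation in five color spaces. The specific spaces and statistics are assumptions, and the study's texture features are omitted for brevity.

```python
import cv2
import numpy as np

# Hypothetical choice of five color spaces; the paper does not list them.
CONVERSIONS = {
    "RGB":   cv2.COLOR_BGR2RGB,
    "HSV":   cv2.COLOR_BGR2HSV,
    "HLS":   cv2.COLOR_BGR2HLS,
    "YCrCb": cv2.COLOR_BGR2YCrCb,
    "XYZ":   cv2.COLOR_BGR2XYZ,
}

def color_features(bgr_image):
    """Per-channel mean/std in each color space as a simple color descriptor.

    bgr_image: uint8 BGR image, e.g. from cv2.imread().
    """
    feats = []
    for code in CONVERSIONS.values():
        converted = cv2.cvtColor(bgr_image, code).astype(np.float32)
        for channel in cv2.split(converted):
            feats.extend([channel.mean(), channel.std()])
    return np.array(feats)      # 5 spaces x 3 channels x 2 stats = 30 values
```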
Yoga, a holistic practice blending physical and mental well-being, has gained significant global recognition in recent years. Accurate classification of yogic postures, however, remains a complex challenge due to variations in posture alignment, occluded body parts, and diverse background conditions. This study introduces YogaPoseNet, an advanced approach leveraging the power of the NASNet architecture to overcome these challenges in yogic posture classification. NASNet’s adaptive feature extraction capabilities enable it to effectively capture intricate details of yoga postures, eliminating the need for manual feature fusion from multiple architectures. The proposed model focuses on extracting high-level semantic features to handle posture complexities while maintaining robustness across varied scenarios. By employing this technique, YogaPoseNet offers a precise and efficient solution, paving the way for innovative applications in yoga analysis, personal fitness, and health monitoring systems. This work highlights the potential of cutting-edge neural architectures in redefining traditional practices through technology.
Sleep disorders, such as Insomnia, Sleep Apnea, and other conditions, significantly impact individuals' health and well-being. Accurate and efficient classification of these disorders can aid in early diagnosis and effective treatment, enhancing the quality of life for affected individuals. Existing systems predominantly rely on Artificial Neural Networks (ANN) for classification, which, while effective, can be computationally intensive and less interpretable. This study proposes a Random Forest-based approach for classifying sleep disorders, utilizing a dataset consisting of 400 samples with 13 relevant features. The Random Forest model was selected for its robustness, interpretability, and superior ability to handle complex, non-linear relationships within the data. By employing this algorithm, the study aims to classify sleep disorders into three classes: Insomnia, None, and Sleep Apnea, demonstrating improved performance compared to traditional ANN-based systems. The evaluation of the Random Forest model is conducted using standard performance metrics, including accuracy, precision, recall, and F1-score, which show that the proposed approach outperforms existing models, offering enhanced accuracy and reliability in the classification of sleep disorders.
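A minimal sklearn sketch of the described classifier follows, using synthetic stand-in data with the stated 400-sample, 13-feature, three-class shape; the estimator count and split are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Synthetic stand-in for the 400-sample, 13-feature sleep dataset.
X, y = make_classification(n_samples=400, n_features=13, n_informative=6,
                           n_classes=3, random_state=42)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)
clf = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_tr, y_tr)

# Per-class precision/recall/F1 for the three target labels.
print(classification_report(y_te, clf.predict(X_te),
                            target_names=["Insomnia", "None", "Sleep Apnea"]))
```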
Cardiovascular diseases (CVD) are a leading cause of death worldwide, highlighting the need for effective early detection methods. This study presents a machine learning-based approach for detecting cardiovascular diseases using optimal feature selection techniques and the Random Forest algorithm. The model is designed to enhance predictive accuracy by identifying the most relevant features from patient data, such as age, gender, chest pain type, and other critical health indicators. The Random Forest algorithm was chosen due to its robustness and ability to handle complex data interactions. Through extensive testing, the model demonstrated high accuracy in predicting CVD, outperforming traditional methods. This approach can potentially serve as a reliable tool for early diagnosis and preventive care in clinical settings, ultimately aiding in the reduction of CVD-related fatalities.
Cervical cancer, the second most prevalent cancer among women worldwide, is predominantly caused by the human papillomavirus (HPV). Despite advancements in medical technology, cervical cancer remains a significant contributor to female mortality, especially in low-resource regions. Early detection is crucial, as survival rates exceed 50% when the disease is identified at an early stage. To address this critical need, we present WFC2DS (Web Framework for Cervical Cancer Detection System), an innovative expert web system designed to transform cervical cancer diagnosis. WFC2DS leverages a powerful ensemble of machine learning algorithms, including the highly efficient XGBoost classifier, to analyze a large dataset of 858 patients with 36 attributes, using the Biopsy attribute as the primary target variable. The implementation of the XGBoost algorithm within WFC2DS achieves outstanding performance, including an accuracy rate of 99.9% and superior results in F1 score, precision, and sensitivity, surpassing existing diagnostic systems. This groundbreaking approach demonstrates the potential of advanced machine learning techniques to significantly reduce the global burden of cervical cancer, offering a transformative advancement in women’s healthcare. WFC2DS not only enhances the accuracy and reliability of cervical cancer detection but also represents a critical step forward in the development of web-based diagnostic tools that are accessible, efficient, and capable of improving healthcare outcomes for women worldwide.
Healthcare fraud detection is a critical task that faces significant challenges due to imbalanced datasets, which often result in suboptimal model performance. Previous studies have primarily relied on traditional machine learning (ML) techniques, which struggle with issues like overfitting caused by Random Oversampling (ROS), noise introduced by the Synthetic Minority Oversampling Technique (SMOTE), and crucial information loss due to Random Undersampling (RUS). In this study, we propose a novel approach to address the imbalanced data problem in healthcare fraud detection, with a focus on the Medicare Part B dataset. Our approach begins with the careful extraction of the categorical feature "Provider Type," which allows for the generation of new, synthetic instances by replicating existing types to enhance diversity within the minority class. To further balance the dataset, we employ a hybrid resampling technique, SMOTE-ENN, which integrates the Synthetic Minority Oversampling Technique (SMOTE) with Edited Nearest Neighbors (ENN) to generate synthetic data points while removing noisy, irrelevant instances. This combined technique not only balances the dataset but also helps in mitigating the potential adverse effects of imbalanced data. We evaluate the performance of the logistic regression model on the resampled dataset using common evaluation metrics such as accuracy, F1 score, recall, precision, and the AUC-ROC curve. Additionally, we emphasize the importance of the Area Under the Precision-Recall Curve (AUPRC) as a critical metric for evaluating model performance in imbalanced scenarios. The experimental results demonstrate that logistic regression achieves an impressive 98% accuracy, outperforming other methods and validating the efficacy of our proposed approach for detecting healthcare fraud in imbalanced datasets.
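A minimal sketch of the resampling-plus-logistic-regression pipeline follows, using imblearn's SMOTEENN on synthetic stand-in data; the real Medicare Part B features are not shown, and the imbalance ratio is illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, average_precision_score
from imblearn.combine import SMOTEENN

# Synthetic stand-in for the imbalanced fraud data (~5% positives).
X, y = make_classification(n_samples=5000, weights=[0.95], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Hybrid resampling: SMOTE synthesizes minority samples, then ENN prunes
# noisy instances near the class boundary. Only the training set is resampled.
X_res, y_res = SMOTEENN(random_state=0).fit_resample(X_tr, y_tr)

clf = LogisticRegression(max_iter=1000).fit(X_res, y_res)
probs = clf.predict_proba(X_te)[:, 1]
print("AUC-ROC:", roc_auc_score(y_te, probs))
print("AUPRC: ", average_precision_score(y_te, probs))  # key metric under imbalance
```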
Cyber-attacks are growing with the rapid development and widespread use of internet technology, with botnet attacks emerging as one of the most harmful threats. Identifying these botnets is increasingly challenging due to numerous attack vectors and the ongoing evolution of malware. As Internet of Things (IoT) technology advances, many network devices have become vulnerable to botnet attacks, leading to substantial losses across various sectors. These botnets pose serious risks to network security, and deep learning models have shown promise in efficiently identifying botnet activity from network traffic data. In this research, we propose a botnet identification system based on the stacking of artificial neural networks (ANN), convolutional neural networks (CNN), long short-term memory (LSTM), and recurrent neural networks (RNN), referred to as the ACLR model. Additionally, we evaluate an XGBoost classifier, which achieves an accuracy of 81%, as a complementary baseline for classification performance. Experiments were conducted using both the individual models and the proposed ACLR model for performance comparison. Utilizing the UNSW-NB15 dataset, which includes normal traffic alongside nine attack types (‘Generic’, ‘Exploits’, ‘Fuzzers’, ‘DoS’, ‘Reconnaissance’, ‘Analysis’, ‘Backdoor’, ‘Shellcode’, and ‘Worms’), we found that the ACLR model achieved a testing accuracy of 0.9698, effectively capturing the intricate patterns and characteristics of botnet attacks. The model's robustness and generalizability were further demonstrated through k-fold cross-validation with k values of 3, 5, 7, and 10, where k = 5 yielded an accuracy score of 0.9749. Moreover, the proposed model detects botnets with a high receiver operating characteristic area under the curve (ROC-AUC) of 0.9934 and a precision-recall area under the curve (PR-AUC) of 0.9950. Performance comparisons with existing state-of-the-art models further corroborate the superior effectiveness of the proposed approach, highlighting the potential of these findings to significantly aid in combating evolving threats and enhancing cybersecurity procedures.
Early detection of stroke warning symptoms can significantly reduce the severity of ischemic stroke, which is the leading cause of mortality and disability worldwide. This study aims to develop a predictive model for ischemic stroke by leveraging advanced machine learning techniques. In this approach, an artificial neural network (ANN) was employed as a feature extractor to capture the complex relationships within the healthcare dataset comprising 5110 patient health profiles. To enhance the predictive capability, a stacking ensemble model was constructed by integrating three powerful classifiers: Light Gradient Boosting Machine (LGBM), Extreme Gradient Boosting (XGBoost), and Random Forest (RF). The synthetic minority oversampling technique (SMOTE) was utilized to address class imbalance in the dataset, ensuring that the model effectively learns from underrepresented classes. The models were evaluated using performance metrics such as accuracy, precision, recall, F1-score, area under the curve (AUC), and confusion matrix. The stacking ensemble model demonstrated superior performance, achieving 95.9% accuracy and outperforming individual classifiers and prior studies on the same dataset, while providing interpretable predictions and facilitating personalized decision-making in resource-constrained settings. This proposed model offers a robust framework for early stroke detection, with potential applications in clinical decision support systems.
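A hedged sketch of the stacking ensemble with SMOTE follows, omitting the ANN feature-extraction stage for brevity; the data is a synthetic stand-in and all hyperparameters are library defaults, not the study's.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from lightgbm import LGBMClassifier
from xgboost import XGBClassifier
from imblearn.over_sampling import SMOTE

# Synthetic stand-in for the 5110-profile stroke dataset (rare positive class).
X, y = make_classification(n_samples=5110, weights=[0.95], random_state=42)
X_bal, y_bal = SMOTE(random_state=42).fit_resample(X, y)   # balance classes

# Stack LGBM, XGBoost, and RF; a logistic-regression meta-learner combines
# their out-of-fold predictions (cv=5).
stack = StackingClassifier(
    estimators=[("lgbm", LGBMClassifier()),
                ("xgb", XGBClassifier(eval_metric="logloss")),
                ("rf", RandomForestClassifier())],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5)
stack.fit(X_bal, y_bal)
```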
The persistence of SMS spam continues to pose significant challenges, necessitating the development of effective detection systems that can handle increasingly sophisticated evasion techniques employed by spammers. This study addresses these challenges by presenting a comprehensive SMS spam filtering system utilizing machine learning models, specifically focusing on Long Short-Term Memory (LSTM) networks. We introduce a new SMS message dataset, comprising 61% legitimate (ham) messages and 39% spam, which represents the largest publicly available SMS spam dataset to date. A longitudinal analysis of spam evolution was performed, followed by the extraction of semantic and syntactic features for evaluation. We then conducted a comparative analysis of various machine learning approaches, ranging from shallow models to advanced deep neural networks. Our findings reveal that shallow models and traditional anti-spam services are vulnerable to evasion techniques, resulting in poor performance. In contrast, the LSTM model demonstrated superior performance, achieving 98% accuracy in classifying SMS messages. Despite this high accuracy, some evasion strategies still challenge the detection process, highlighting areas for further research. This study advocates for continued development of robust SMS spam filtering systems and provides valuable insights into the effectiveness of deep learning models in combating SMS spam.
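A minimal Keras sketch of an LSTM spam classifier of the kind described follows; the vocabulary size, sequence length, and layer widths are illustrative assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers

VOCAB_SIZE, MAX_LEN = 20000, 100                 # illustrative sizes

# Token-id sequences -> embedding -> LSTM -> spam probability.
model = keras.Sequential([
    keras.Input(shape=(MAX_LEN,)),               # padded token-id sequences
    layers.Embedding(VOCAB_SIZE, 64),
    layers.LSTM(64),
    layers.Dense(1, activation="sigmoid"),       # P(spam)
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
# model.fit(x_train, y_train, validation_split=0.1, epochs=5)  # padded ids
```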
Employee turnover presents a significant challenge for organizations worldwide. While advanced machine learning algorithms hold promise for predicting turnover, their real-world application often falls short due to the limitations in fully leveraging the relational structure within employee tabular data. To bridge this gap, this study proposes a novel framework that transforms traditional tabular employee data into a knowledge graph structure, enabling the use of Graph Convolutional Networks (GCNs) for more refined feature extraction. This approach goes beyond mere prediction, incorporating explainable artificial intelligence (XAI) techniques to uncover the key factors that influence an employee's decision to stay with or leave an organization. Our empirical analysis, conducted on a comprehensive dataset of 1,470 IBM employees, demonstrated the effectiveness of our methodology. When benchmarked against five widely-used machine learning models, our enhanced Linear Support Vector Machine (L-SVM) model, augmented with knowledge-graph-based features, achieved remarkable accuracy. The use of knowledge graphs further enriches this analysis by modeling the complex relationships within an organization, offering a holistic view of the dynamics that influence turnover. Additionally, the integration of explainable AI allows HR professionals to understand the key factors driving turnover predictions, enabling proactive interventions.
Pregnancy complications pose significant risks to maternal and fetal health, necessitating early detection for timely interventions. Manual analysis of cardiotocography (CTG) tests, the conventional practice among obstetricians, is labor-intensive and prone to variability. This study addresses the critical need for accurate fetal health classification using advanced machine learning (ML) techniques, focusing on the application of XGBoost, a powerful gradient boosting algorithm. Although the publicly available dataset used here is modest in size, this research leverages its rich features to develop and analyze ML models. The objective is to explore and demonstrate the efficacy of ML models in classifying fetal health based on CTG data. Our proposed system applies the XGBoost algorithm and achieves an exceptional accuracy of 96%, surpassing previous methods. This highlights the algorithm's robustness in enhancing diagnostic precision and facilitating timely interventions. The study underscores the potential of integrating ML models into routine clinical practices to streamline fetal health assessments. By optimizing resource allocation and improving time efficiency, these models contribute to early complication detection and enhanced prenatal care. Further research is encouraged to refine ML applications, promising continued advancements in fetal health assessment and maternal care.
Numerous research studies underscore the importance of accurately detecting head impacts and implementing effective safety measures. This study addresses this critical need by leveraging machine learning algorithms applied to data obtained from piezoelectric sensors mounted on a simulated head model. Using a systematic approach, we employ Random Forest (RF) and Extreme Gradient Boosting (XGBoost) models to analyze normalized sensor data, with the goal of precisely identifying impact locations. Through rigorous k-fold cross-validation and comprehensive performance evaluation, we demonstrate that the XGBoost model slightly outperforms the RF model, achieving a Root Mean Square Error (RMSE) of 0.4764 and a coefficient of determination (R²) of 0.9085. Feature importance assessments indicate an optimal sensor placement strategy, which may facilitate a reduction in model complexity while maintaining predictive accuracy. The superior performance of the XGBoost model, along with strategic sensor placement, underscores the study's contribution to enhancing head impact safety measures in both sports and industrial contexts. These findings pave the way for future research aimed at deploying intelligent safety systems that harness the synergy between wearable technology and machine learning.
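A hedged sketch of the XGBoost regression and its two reported metrics follows, on synthetic stand-in sensor data; the sensor count, target encoding, and hyperparameters are assumptions.

```python
import numpy as np
from xgboost import XGBRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Synthetic stand-in for normalized piezoelectric readings: 8 sensors,
# target = a continuous encoding of the impact location (hypothetical).
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(1000, 8))
y = X @ rng.uniform(-1, 1, size=8) + rng.normal(0, 0.1, size=1000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = XGBRegressor(n_estimators=300, learning_rate=0.05).fit(X_tr, y_tr)
pred = model.predict(X_te)
print("RMSE:", np.sqrt(mean_squared_error(y_te, pred)))  # paper reports 0.4764
print("R^2: ", r2_score(y_te, pred))                     # paper reports 0.9085
```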
Heart disease (HD), including heart attacks, is a leading cause of death worldwide, making accurate determination of a patient's risk a significant challenge in medical data analysis. Early detection and continuous monitoring by physicians can significantly reduce mortality rates, but heart disease is not always easily detectable, and physicians cannot monitor patients around the clock. Machine learning (ML) offers a promising solution to enhance diagnostics through more accurate predictions based on data from healthcare sectors globally. This study aims to employ various feature selection methods to develop an effective ML technique for early-stage heart disease prediction. The feature selection process utilized three distinct methods: chi-square, analysis of variance (ANOVA), and mutual information (MI), leading to three selected feature groups designated as SF-1, SF-2, and SF-3. We then evaluated ten different ML classifiers, including Naive Bayes, support vector machine (SVM), voting, XGBoost, AdaBoost, bagging, decision tree (DT), K-nearest neighbor (KNN), random forest (RF), and logistic regression (LR), to identify the best approach and feature subset. The proposed prediction method was validated using a private dataset, a publicly available dataset, and multiple cross-validation techniques. To address the challenge of unbalanced data, the Synthetic Minority Oversampling Technique (SMOTE) was applied. Experimental results showed that the AdaBoost classifier achieved optimal performance with the combined datasets and the SF-2 feature subset, yielding rates of 96.84% for accuracy, 95.32% for sensitivity, 91.12% for specificity, 94.67% for precision, 92.36% for F1 score, and 98.50% for AUC. Additionally, an explainable artificial intelligence approach utilizing SHAP methodologies is being developed to provide insights into the system's prediction process. The proposed technique demonstrates significant promise for the healthcare sector, facilitating early-stage heart disease prediction with reduced costs and minimal time. Ultimately, the best-performing ML method has been integrated into a mobile app, enabling users to input HD symptoms and receive rapid heart disease predictions.
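The three feature-selection methods map directly onto sklearn's SelectKBest scorers; a minimal sketch mirroring the SF-1/SF-2/SF-3 groups follows (k and the synthetic data are illustrative, not the paper's settings).

```python
from sklearn.datasets import make_classification
from sklearn.preprocessing import MinMaxScaler
from sklearn.feature_selection import (SelectKBest, chi2, f_classif,
                                       mutual_info_classif)

# Synthetic stand-in data; chi-square requires non-negative features.
X, y = make_classification(n_samples=500, n_features=20, random_state=1)
X = MinMaxScaler().fit_transform(X)

# Three selectors yielding three candidate feature groups.
selectors = {
    "SF-1 (chi-square)":  SelectKBest(chi2, k=10),
    "SF-2 (ANOVA F)":     SelectKBest(f_classif, k=10),
    "SF-3 (mutual info)": SelectKBest(mutual_info_classif, k=10),
}
for name, sel in selectors.items():
    print(name, sel.fit_transform(X, y).shape)
```

Each reduced feature group would then be fed to the ten candidate classifiers for comparison, as the abstract describes.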
Addressing the intricate challenge of fake news detection, traditionally reliant on the expertise of professional fact-checkers due to the inherent uncertainty in fact-checking processes, this research leverages advancements in language models to propose a novel Long Short-Term Memory (LSTM)-based network. The proposed model is specifically tailored to navigate the uncertainty inherent in the fake news detection task, utilizing LSTM's capability to capture long-range dependencies in textual data. The evaluation is conducted on the well-established LIAR dataset, a prominent benchmark for fake news detection research, yielding an impressive accuracy of 99%. Moreover, recognizing the limitations of the LIAR dataset, we introduce LIAR2 as a new benchmark, incorporating valuable insights from the academic community. Our study presents detailed comparisons and ablation experiments on both LIAR and LIAR2 datasets, establishing our results as the baseline for LIAR2. The proposed approach aims to enhance our understanding of dataset characteristics, contributing to refining and improving fake news detection methodologies by effectively leveraging the strengths of LSTM architecture.
This study investigates the prevalence and impact of social anxiety among high school students at Little Scholars Matriculation Hr. Sec. School in Thanjavur, Tamil Nadu, India. A dataset was created by surveying students with a 17-item Social Phobia Inventory (SPIN) questionnaire, which includes questions related to their experiences with social interactions, fear of judgment, and discomfort in various social situations. Using this dataset, the research applies a Random Forest machine learning approach to analyze student responses and assess the severity of social anxiety. The model aims to predict the levels of social anxiety by identifying significant features that contribute to higher distress levels. Through feature selection and correlation analysis, the study uncovers complex relationships between various aspects of social interactions that influence social anxiety. The performance of the Random Forest model is evaluated based on its accuracy and predictive power, demonstrating its ability to predict social anxiety in high school students effectively. The study highlights the potential of Random Forest for accurately identifying key factors associated with social phobia and recommends further research to refine predictive models, offering valuable insights for enhancing mental health support systems for high school students.
Efficient bed management is essential for minimizing hospital costs, improving efficiency, and enhancing patient outcomes. This study introduces a predictive framework for forecasting hospital ICU length of stay (LOS) at admission, leveraging hospital EHR data. Unlike prior work, which heavily relied on advanced tree-based models, this research proposes a K-Nearest Neighbors (KNN) model with hyperparameter optimization using GridSearchCV for predicting ICU patients’ LOS. The KNN model effectively classifies patients into short and long LOS categories by learning patterns in clinical information systems (CIS). To ensure robustness, we evaluated the model using various performance metrics, including Accuracy, AUC, Sensitivity, Specificity, F1-score, Precision, and Recall. The optimized KNN model demonstrated competitive predictive performance with improved interpretability compared to traditional complex models. Additionally, explainable artificial intelligence (XAI) techniques were incorporated to provide transparent insights into the decision-making process, further enhancing the trustworthiness of the predictions. This work highlights the potential of using machine learning models like KNN for reliable, interpretable, and efficient ICU LOS prediction, aiding hospitals in improving resource allocation and patient care outcomes.
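A minimal sketch of the KNN-with-GridSearchCV setup follows, assuming standard scaling and an illustrative search grid (the study's actual search space is not given).

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV

# Scaling matters for distance-based models such as KNN.
pipe = Pipeline([("scale", StandardScaler()),
                 ("knn", KNeighborsClassifier())])
grid = {"knn__n_neighbors": [3, 5, 7, 11, 15],   # illustrative values
        "knn__weights": ["uniform", "distance"],
        "knn__p": [1, 2]}                        # Manhattan vs Euclidean
search = GridSearchCV(pipe, grid, cv=5, scoring="roc_auc")
# search.fit(X_train, y_train)    # EHR features, short/long LOS labels
# print(search.best_params_, search.best_score_)
```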
Dyslexia is a specific learning disability that makes reading and writing difficult, often leading to academic struggles if not detected early. Identifying dyslexia, particularly in languages with complex orthographies, remains a significant challenge. Traditional methods for diagnosis are typically reliant on manual feature extraction and expert intervention, which can be time-consuming and error-prone. To address these limitations, we propose an optimal ensemble learning model for dyslexia prediction, incorporating machine learning techniques with an adaptive genetic algorithm for weight optimization. Our approach first involves extracting relevant features associated with dyslexia. These features are then processed using a hybrid ensemble model, combining three powerful machine learning algorithms: Gradient Boosting, Support Vector Classifier (SVC), and AdaBoost, through a soft voting mechanism. The ensemble model is further enhanced by employing an adaptive genetic algorithm to fine-tune the weights assigned to each individual model, optimizing their contributions to the final prediction. We demonstrate the effectiveness of this method by conducting experiments on a dyslexia dataset, where our proposed model significantly outperforms traditional approaches in terms of classification accuracy and robustness. This results in a more precise and scalable solution for dyslexia detection, with the potential to aid in early diagnosis and intervention.
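A rough sketch of GA-style weight optimization for the soft-voting ensemble follows, with a deliberately tiny mutation-only loop standing in for the paper's adaptive genetic algorithm; the data, population size, and generation count are all illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, AdaBoostClassifier,
                              VotingClassifier)
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=600, random_state=0)  # stand-in data

def fitness(w):
    """Cross-validated accuracy of the soft-voting ensemble under weights w."""
    vote = VotingClassifier(
        estimators=[("gb", GradientBoostingClassifier()),
                    ("svc", SVC(probability=True)),
                    ("ada", AdaBoostClassifier())],
        voting="soft", weights=list(w))
    return cross_val_score(vote, X, y, cv=3).mean()

# Tiny genetic loop: rank by fitness, keep the best half, mutate to refill.
rng = np.random.default_rng(0)
pop = rng.uniform(0.1, 1.0, size=(6, 3))         # 6 candidate weight vectors
for _ in range(5):                               # 5 generations
    pop = pop[np.argsort([-fitness(w) for w in pop])]    # best first
    pop[3:] = pop[:3] + rng.normal(0, 0.1, size=(3, 3))  # mutate the elites
    pop = np.clip(pop, 0.05, None)               # keep weights positive
print("best weights:", pop[0])
```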
Breast cancer is one of the leading causes of cancer-related deaths among women, making early and precise detection a critical challenge. Traditional methods for breast cancer classification often rely on static models that may not fully capture the complexities of the data. In this study, we propose an enhanced approach for breast cancer detection using a Support Vector Classifier (SVC), coupled with a GridSearchCV procedure for hyperparameter optimization. The goal of this model is to achieve better classification performance by systematically exploring different parameter configurations, including the regularization parameter (C), kernel type, and gamma. The model was trained and tested on the Wisconsin Breast Cancer Diagnostic (WBCD) dataset, with GridSearchCV employed to select the optimal hyperparameters through a 5-fold cross-validation process. The resulting SVC model achieved significant improvements in accuracy, precision, recall, F1 score, and AUC when compared to other standard machine learning models. This approach demonstrates the potential of SVC with grid search as a robust solution for breast cancer classification, offering better diagnostic capabilities and aiding in more accurate predictions of malignant and benign tumors. Our findings highlight the importance of hyperparameter optimization in machine learning for healthcare applications and suggest that the proposed model could play a vital role in the early diagnosis and treatment of breast cancer.
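Because the Wisconsin diagnostic data ships with sklearn, the described grid search is easy to sketch end to end; the grid values below are illustrative, and the scaling step is an added assumption rather than something the abstract states.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)       # WBCD data bundled with sklearn
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

pipe = Pipeline([("scale", StandardScaler()), ("svc", SVC())])
param_grid = {"svc__C": [0.1, 1, 10, 100],       # regularization strength
              "svc__kernel": ["linear", "rbf"],
              "svc__gamma": ["scale", 0.01, 0.001]}
search = GridSearchCV(pipe, param_grid, cv=5).fit(X_tr, y_tr)  # 5-fold CV
print(search.best_params_, "test accuracy:", search.score(X_te, y_te))
```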
Insurance fraud, particularly within the automobile insurance sector, is a significant challenge faced by insurers, leading to financial losses and influencing pricing strategies. Fraud detection models are often impacted by class imbalance, where fraudulent claims are much rarer than legitimate claims, and missing data further complicates the process. This research tackles these issues by utilizing two car insurance datasets—an Egyptian real-life dataset and a standard dataset. The proposed methodology includes addressing missing data and class imbalance, and it incorporates the AdaBoost Classifier to enhance the model’s accuracy and predictive power. The results demonstrate that addressing class imbalance plays a crucial role in improving model performance, while handling missing data also contributes to more reliable predictions. The AdaBoost Classifier significantly outperforms existing techniques, improving prediction accuracy and reducing overfitting, which is often a challenge in fraud detection models. This study presents valuable insights into how improving data quality and using advanced algorithms like AdaBoost can enhance fraud detection systems, ultimately leading to more effective identification of fraudulent claims. These enhancements can significantly aid insurance companies in reducing financial losses, improving decision-making, and refining pricing models.
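One possible reading of the described pipeline is sketched below: median imputation for missing values, oversampling for class imbalance, then AdaBoost. The abstract does not name its imputation or resampling methods, so SimpleImputer and SMOTE are assumptions.

```python
from sklearn.impute import SimpleImputer
from sklearn.ensemble import AdaBoostClassifier
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline   # imblearn's Pipeline allows resamplers

# Impute missing claim attributes, oversample the rare fraud class during
# training only, then fit AdaBoost (all choices hedged, see lead-in).
model = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("smote", SMOTE(random_state=0)),
    ("ada", AdaBoostClassifier(n_estimators=200, random_state=0)),
])
# model.fit(X_train, y_train)                       # claim features / fraud labels
# fraud_prob = model.predict_proba(X_test)[:, 1]
```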
The agricultural sector in many South Asian countries, including Bangladesh and India, plays a pivotal role in the economy, with a significant portion of the population relying on it for their livelihood. However, farmers face challenges like unpredictable weather, soil variability, and natural disasters such as floods and erosion, leading to crop losses and financial difficulties. This often results in a decline in interest in agriculture despite government support. Our study focuses on predicting the classification of various crops, such as rice, jute, and maize, using a combination of soil and weather features. The predictive model leverages soil parameters like Nitrogen, Phosphorus, Potassium, and pH levels, alongside weather variables such as Temperature, Humidity, and Rainfall. We propose a hybrid approach that integrates machine learning with genetic algorithms, where a Random Forest Classifier is used for crop classification across 22 different crop types. The Genetic Algorithm is utilized to optimize hyperparameters, enhancing model performance and robustness. Additionally, we applied Explainable AI (XAI) techniques, including Local Interpretable Model-agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP), to interpret and validate the model’s predictions. By improving feature selection and model parameters, our approach addresses limitations associated with existing models, providing more reliable and accurate predictions. This system has the potential to reduce crop losses, improve agricultural productivity, and contribute to the sustainability and prosperity of the agricultural sector.
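A minimal SHAP sketch for a fitted Random Forest of the kind described follows; the data is a synthetic stand-in for the soil/weather features with a reduced class count, and the LIME side of the analysis is omitted.

```python
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the soil/weather features (N, P, K, pH, temperature,
# humidity, rainfall); 3 classes stand in for the 22 crop types.
X, y = make_classification(n_samples=300, n_features=7, n_informative=5,
                           n_classes=3, random_state=0)
rf = RandomForestClassifier(random_state=0).fit(X, y)

explainer = shap.TreeExplainer(rf)       # tree-model-specific explainer
shap_values = explainer.shap_values(X)   # per-class feature attributions
shap.summary_plot(shap_values, X)        # global feature-importance view
```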
Rolling bearing faults frequently cause rotating equipment failure, leading to costly downtime and maintenance expenses. As a result, researchers have focused on developing effective methods for diagnosing these faults. In this paper, we explore the potential of Machine Learning (ML) techniques for classifying the health status of bearings. Our approach involves decomposing the signal, extracting statistical features, and selecting a reduced feature subset using Binary Grey Wolf Optimization. We propose an ensemble method using voting classifiers to diagnose faults based on the reduced set of features. To evaluate the performance of our methods, we utilize several performance indicators. Our results demonstrate that the proposed voting classifiers method achieves superior fault classification, highlighting its potential for use in predictive maintenance applications.
This study delves into the application of machine learning techniques for predicting rice production in Indonesia, a country where rice is not just a staple food but also a key component of the agricultural sector. Utilizing data from 2018 to 2023, sourced from the Central Bureau of Statistics of Indonesia and the Meteorology, Climatology, and Geophysics Agency of Indonesia, this research presents a comprehensive approach to agricultural forecasting. The study begins with an Exploratory Data Analysis (EDA) to understand the variability and distribution of variables such as harvested area, production, rainfall, humidity, and temperature. Significant regional disparities in rice production are identified, highlighting the complexity of agricultural forecasting in Indonesia. Five machine learning models—Random Forest, Gradient Boosting, Decision Tree, Support Vector Machine, and XGBRegressor—are trained and tested. The XGBRegressor model stands out for its superior performance, demonstrating its high predictive accuracy and reliability. Hyperparameter tuning using the GridSearchCV technique was conducted on all five models, resulting in performance improvements across the board. This research not only underscores the effectiveness of machine learning in improving rice production predictions in Indonesia but also sets the stage for future research. It highlights the potential of advanced analytical techniques in enhancing agricultural productivity and decision-making, paving the way for further explorations into more sophisticated models and a broader range of data, ultimately contributing to the resilience and sustainability of Indonesia’s agricultural sector.