Solar powered integrated multi sensors to monitor inland lake water quality using statistical data fusion technique with Kalman filter

Scientific Reports volume 14, Article number: 25202 (2024)

This study proposes a data-driven statistical model using multi-sensor fusion and Kalman filtering for real-time water quality assessment in lakes. The Kalman filter (KF), a recursive estimation technique, is employed to handle uncertainties and enhance computational efficiency. The fusion process integrates data from sensors monitoring parameters like chlorophyll concentration, surface water elevation, temperature, and precipitation, producing Markov features to capture temporal transitions and environmental dynamics. Data synchronization and fusion are achieved through recursive KF methods, enabling real-time adaptive management in response to environmental fluctuations such as seasonal changes, precipitation (6–18%), and evaporation rates (1.2–11.9 mm/day). Over a 30-day evaluation period, the model accurately predicted chlorophyll concentrations, reaching 128 \(\hbox {mg}/\hbox {m}^{3}\) in mid-level inflow regions (3.6 m water elevation) compared to 86 \(\hbox {mg}/\hbox {m}^{3}\) in extreme inflow areas (5.5 m). The integration of Markov feature extraction and eigenvalue estimation enhanced prediction stability and sensitivity, with the KF maintaining computational efficiency at 7.8 ms per computation cycle. The model’s accuracy was validated by achieving a residual error of less than 0.05 with minimal noise interference. Overall, the system provides a resilient and precise framework for real-time lake water quality assessment, capable of handling multi-parameter uncertainties and dynamic environmental changes, thereby supporting informed decision-making for aquatic ecosystem management.

Characterization of lake water: All forms of lakes share several biological and biogeochemical processes, and the study of these processes is known as “limnology.” The interactions between physical, biological, and chemical processes are typically quantitatively or qualitatively different from those on land or in the air, making lakes an excellent setting for the study of ecosystem dynamics1. The ecosystem components interact closely because of the distinctive boundaries between water and land and between water and air. Isolated lakes are often saline because of groundwater inflows or evaporation. Lakes are normally divided into four main zones: littoral, profundal, limnetic, and benthic2. Further, by trophic state, lakes are categorized as oligotrophic, mesotrophic, eutrophic, or hypereutrophic. Hypereutrophic lakes have low transparency and frequently have significant nuisance algal blooms due to their high nutrient content. Less than 0.88 m of visibility depth, more than 39 µg/l of total chlorophyll, and more than 99 µg/l of phosphorus are characteristics of a hypereutrophic lake3. Large algal blooms cause dead zones below the surface because they sharply reduce oxygen levels and make it impossible for life to persist at lower depths. Although lakes retain 48.99% of the water on Earth’s surface, they also house 34.78% of the freshwater surface liquid4.

Source of contamination of lake water: Global eutrophication has escalated due to human activities that have increased nutrients in water bodies. Monitoring water quality parameters such as turbidity, salinity, chlorophyll, sediments, plastic debris, and oxygen depletion is necessary to assess the health and trends of a lake accurately and to regulate its water quality5. As a consequence, several measures have been proposed as potential metrics of the ecological state of Indian lakes, including oxygen concentration, dichromate chemical oxygen demand (COD-Cr), total phosphorus (TP), and biological oxygen demand (BOD). Total nitrogen (TN) and chlorophyll a concentrations are both considered significant, but there is not currently enough information to integrate these two factors6. Further, among the world’s freshwater resources, inland lakes stand out in particular, holding 85.63% of the freshwater source for global water needs. Although they stabilize the water supply, they are disturbed by wastewater disposal, plastic debris, and untreated effluents7. Fig. 1 shows that India’s urbanization and population growth are polluting its waterways.

In examining Fig. 1, which shows substantial differences in the “water degradation estimate in percent” between consecutive months like December and January, two possible patterns emerge: semi-fluctuating behavior and an upward trend. The observed fluctuations between close months could be explained by seasonal variability, where changes in environmental conditions such as temperature, precipitation, and solar radiation affect water quality. This semi-fluctuating pattern suggests that water degradation is influenced by short-term, event-driven factors such as stormwater runoff, algal blooms, or pollution pulses, leading to rapid changes in water quality. Alternatively, if the data shows a long-term increase in degradation estimates, it may point to a broader upward trend in water degradation. This would indicate that, despite month-to-month variability, the overall health of the aquatic system is deteriorating over time due to cumulative influences such as persistent pollution, land-use changes, or climate change. Substantial differences between months like December and January would then represent temporary events superimposed on an overall trend of increasing water degradation. Understanding whether the chart reflects a primarily semi-fluctuating behavior or an upward trend is crucial for interpreting the health of the water system and planning appropriate management or mitigation strategies.

Fig. 1 Inland lake degradation trends obtained during the year 2022.

In India, an estimated 80% of the surface water is unsafe for human consumption. Only a small portion of the over 39 million liters of wastewater that enter rivers and other water bodies each day is properly treated. Untreated sewage is the main cause of water contamination in India. Agricultural runoff and unregulated small-scale industries are other pollutant sources8. Factors that contribute to the deterioration of lakes include agricultural wastes, sewage and drainage water from metropolitan areas, and the absence of nearby wetlands and riparian vegetation. Human leisure activities can occasionally add to this degradation9.

Fig. 2 shows that numerous factors involved in shore water quality, including temperature, dissolved oxygen, phosphorus, BOD, electrical conductivity, nitrate, and potassium, can be used to evaluate the quality of water10. Metals like Hg, Cr, Pb, and Fe are particularly dangerous since they can induce chronic poisoning in aquatic species. Around the world, harmful algal blooms are growing more frequent in freshwater environments. Plastic debris pollution is a growing environmental hazard in bodies of water because it affects the open lake, the coastline, and marine ecology. Plastic density levels in surface water are comparable to those found in locations where trash accumulates within marine gyres. Various techniques, including the hazard quotient and water quality index, have been employed to evaluate lake water quality11. Inland lakes also provide a crucial habitat for a wide variety of creatures. Because of their essential significance, it is necessary to enhance the management of lake pollution and clean up polluted lakes to quickly restore their ecosystem services and contribute to the advancement of ecological civilization.

Fig. 2 Inland water quality influencing parameters contribution rate.

Recent lake water quality issues: The growth of human society has had a significant influence on the environment, and sustained efforts have been made to improve its preservation. Environmental monitoring is one such important initiative: it allows numerous physical characteristics to be tracked so that further environmental deterioration can be managed or prevented. Lake water offers baseline data for the study of several physicochemical parameters that indicate the quality of the pond water and support the conservation and monitoring of the pond12. Wu et al. found that pond water contamination is mainly due to the organic materials and fertilizers used in nearby crop fields. The presence of plankton in a body of water reflects the typical biological conditions and serves as a potential water quality indicator. The physicochemical characteristics of this pond were investigated to understand the effects of pollution caused by domestic wastewater13, agricultural chemicals, and fertilizers.

The distribution of plants and enzymatic activity are both influenced by pH, which is necessary for the normal existence of all organisms. Due to phytoplankton photosynthesis and respiration, the dynamic oxygen cycle of ponds changes throughout the day. It is ironic that, as a result of several anthropogenic activities, these water bodies have degraded in modern times. These activities include encroachment, eutrophication, illicit mining, unrestricted tourism, and cultural appropriation. Encroachment is a particularly significant hazard to water bodies in metropolitan areas14. The principal features of the upper catchment of the lake include low productivity, inadequate vegetative cover, heavy runoff, soil erosion, and a deep and extensive network of gullies. Together, these variables are responsible for the significant siltation. Sandstone and shale are two separate parent materials that coexist. Their differing hydrothermal characteristics and erosion patterns result in a lack of cohesiveness, which causes significant runoff from the bare hill slopes. In general, sedimentary rocks, through their physical properties, contribute to poor vegetation growth and make the land susceptible to erosion and runoff. The resulting soil and nutrient loss degrades terrestrial ecosystems and increases the sedimentation and pollutants entering water bodies. This series of events ultimately influences lake water quality, causing problems like eutrophication, reduced oxygen levels, and habitat disruption. Addressing soil erosion and managing runoff, especially in areas with sedimentary rocks, is critical to preserving both the terrestrial environment and the health of aquatic ecosystems like lakes.

Traditional monitoring techniques have several drawbacks, including long sampling cycles, gaps in temporal and spatial coverage, and high costs. Remote sensing technology has been widely deployed to monitor water quality because it offers rapid, multi-data, and multi-spectral analysis. Real-time data analytics are still lacking in recent platforms such as online data loggers, which were established to increase the spatial and multi-nodal resolution of marine water monitoring15. Environmental monitoring can now be done remotely and in real time, thanks to the development of low-power wireless sensor network (WSN) technology. This strategy has enabled a proactive response to environmental pollution. Monitoring water quality consists of finding the distinctive properties of the water and comparing them to predetermined norms and recommendations. The degree to which the water complies with the requirements of its intended use, within threshold bounds, often indicates how acceptable it is.

The need to accumulate multiple samples to confirm precise and reliable analysis is one of the main issues in water quality monitoring16. In this respect, manual approaches are seen as ineffective and add a great deal of complication. Early warning capabilities made possible by real-time monitoring enable a prompt reaction to water pollution. WSNs have substantial potential for many monitoring applications since they have comparatively low initial and maintenance costs17. Table 1 lists the parameters that contribute to water quality assessment, with their numerical ranges for human sustainability.

Table 1 shows the important parameters associated with lake water quality monitoring, with the threshold ranges needed to analyze a huge volume of data in an unstructured format. Existing monitoring systems based on remote sensing, GIS, or WSN platforms lack an efficient data-driven modeling approach that supports multi-sensor parameter estimation and processing and that informs maintenance actions on the control system18. Previous researchers have used WSN platforms with various communication configurations for different assessments of water quality, as given in Table 2. Among these configurations, SigFox and LoRaWAN offer better performance in real-time data fetching and communication than other technical standards. SigFox is the first mobile network in the world specifically designed for low-bandwidth Machine-to-Machine and Internet of Things applications19. It can reach up to five years of autonomy because of its low power characteristics. Using unlicensed frequency channels, its Ultra Narrow Band (UNB) technology occupies a small spectrum for open-space fast data communication over a range of up to 40 kilometers.

Both point and non-point sources of pollution, such as urban runoff, industrial discharge, runoff from agricultural areas, and sewage discharge, influence water quality. Floods, droughts, user ignorance, and lack of education are further causes of water pollution. To preserve the quality of water resources, users must be involved and must consider other factors, including hygiene, environmental sanitation, storage, and disposal. The description of the state machine is emphasized when discussing the WSN concept in lake water quality monitoring20.

The presented research on solar-powered integrated multi-sensors data fusion technique with the Kalman filter introduces several novel elements and addresses key challenges that distinguish it from the existing literature.

Solar-powered sensor integration:

Novelty: A primary distinction of the present research is the use of solar power to operate multi-sensor networks. This innovation ensures that monitoring systems can be deployed in remote and hard-to-access regions without the need for external power sources. The ability to sustain monitoring systems using renewable energy makes the solution scalable and environmentally friendly, a critical factor for large or isolated water bodies.

Distinction: Papers such as Ref. 21 and Ref. 22 focus on multi-source data assimilation and numerical modeling, but they do not address the practical challenge of powering long-term, remote monitoring systems. The integration of solar power introduces a novel aspect to sensor deployment, ensuring both sustainability and extended operational capabilities. In contrast, Liu (2020) discusses sensor fusion for navigation but does not emphasize solar power or its application to continuous environmental monitoring.

Statistical data fusion with Kalman filter:

Novelty: The present research employs the Kalman filter for data fusion, which allows for real-time integration of multi-sensor data, accounting for uncertainties and dynamically updating predictions based on new measurements. This approach enhances the accuracy of water quality assessments by filtering noise and providing optimized estimates of critical parameters such as chlorophyll concentration, turbidity, and dissolved oxygen.

Distinction: The Kalman filter has been used in other fields, but its use here in conjunction with solar-powered sensors for inland water quality monitoring presents a novel application. While Ref. 21 uses the Ensemble Kalman Filter (EnKF) for hydro-ecological dynamics modeling, that work does not integrate real-time sensor data from a solar-powered system or focus on inland lakes. Similarly, Ref. 22 applies data assimilation techniques but does not utilize the Kalman filter for real-time water quality monitoring with a renewable energy component.

Real-time, continuous monitoring with multi-sensors:

Novelty: The continuous real-time monitoring enabled by the fusion of multi-sensor data distinguishes the present approach from traditional methods that rely on periodic sampling or satellite remote sensing. By integrating data from multiple sensors (e.g., pH, temperature, turbidity, dissolved oxygen) with statistical fusion, the system provides a more granular, real-time understanding of water quality changes.

Distinction: In comparison, Ref. 23 reviews applications of the Geostationary Ocean Color Imager (GOCI) for water quality monitoring but focuses on remote sensing and satellite imagery, which lack the real-time granularity of on-site sensor networks. Remote sensing provides valuable data for broad spatial analysis, but it cannot match the temporal resolution or immediate response capabilities offered by solar-powered sensors combined with the Kalman filter.

Practical, scalable water quality management:

Novelty: The present research targets a practical, scalable solution for monitoring inland water bodies by leveraging solar power and multi-sensor networks. The system is designed to function in a wide range of environmental conditions, allowing continuous water quality monitoring without substantial operational costs or the need for manual intervention.

Distinction: Papers such as Ref. 24 and Ref. 25 emphasize the use of satellite or Earth observation data for monitoring water bodies but do not address the practical challenge of deploying cost-effective, ground-based sensors. The present approach offers a more localized, actionable tool for environmental management, especially in regions where satellite data may be limited by cloud cover, low resolution, or other constraints.

Addressing real-world challenges:

Tackling key challenges: The present research addresses a serious and pressing challenge: ensuring continuous, reliable, and sustainable monitoring of water quality in inland lakes, especially in remote regions. By integrating renewable energy (solar power) with advanced data fusion techniques, it provides a solution that not only solves the problem of data availability but also enhances the robustness of predictions and management interventions. This addresses uncertainties in water quality assessments, particularly in environments where rapid changes in conditions (e.g., weather events, inflows) require immediate attention.

Distinction: Unlike Ref. 26, which focuses on unmanned surface vehicles (USVs) and their navigation using sensor fusion, the present research is specifically oriented toward environmental monitoring with a focus on sustainability and cost-effectiveness. It provides a solution that can be scaled and adapted to various environments, making it highly applicable for long-term water quality management and early-warning systems.

Real-time data integration for management decision-making:

Novelty: By incorporating real-time data fusion and the ability to integrate predictive models using the Kalman filter, the present research provides a dynamic decision-making tool for water resource managers. This capability is critical for addressing rapid environmental changes, such as sudden inflows, pollution events, or seasonal variations.

Distinction: Ref. 25 focuses on the dynamics of water mass variations using multi-sensor Earth observation data but without real-time integration or solar-powered systems. The present research closes this gap by providing a tool that can dynamically adjust based on real-time data, making it more responsive and adaptable to changing environmental conditions.

The presented research offers a novel solution for real-time, sustainable, and high-resolution monitoring of inland lake water quality through the integration of solar-powered multi-sensors and a Kalman filter-based statistical data fusion technique. This research addresses substantial real-world challenges, including the need for continuous monitoring in remote areas, the importance of accurate data fusion for water quality predictions, and the use of renewable energy to support long-term operations. Compared to existing studies, it not only poses a problem but also provides a comprehensive, scalable solution that enhances both the spatial and temporal resolution of water quality monitoring, while addressing the uncertainties and complexities of dynamic aquatic ecosystems. The proposed approach comprises (1) WSN monitoring tasks (chemical, physical, and biological observation); (2) a statistical data-driven model with multi-sensor attributes; (3) a Kalman filter analyzer for multiple environmental conditions (water quality and availability); (4) decision states (observation and networking conditions); and (5) the water area. The suggested solution will develop a self-configuring WSN based on any type of pre-processed data in the multi-sensor fusion model, to provide reliable water quality indices with predicted sets and to maintain ecological stability in lake regions.

The goal of the multisensor fusion system is to select the data that characterize the water quality of the lake and its variability, and to create a database for tracking water quality. This is key to understanding the environmental problems of lake water and to promoting a protective strategy using solar energy. The present research can serve as a technical trial for examining the water environment and sustainability issues of lake water27, for assessing and addressing significant issues with the lake water environment, and for promoting an efficient decision-making architecture for environmental development and protection. Depending on the purpose of monitoring, the inherent variability of metrics can in some cases be used as a signal to find specific impacts on the system or relationships within it; in other cases it acts as noise that complicates detection of the signal of interest. Incorporating renewable energy sources such as solar allows the sensor system to operate continuously.

Fig. 3 Proposed data-driven multi-sensor fusion database of Lake water quality estimator.

Optimal Measurement Point Selection Strategy: In this research, a targeted screening process is used to select measurement points that allow accurate estimation of outputs such as chlorophyll concentration, with solar power treated as a key factor influencing phytoplankton dynamics. The screening process was based on identifying spatial and temporal zones that exhibit substantial variability in water inflow and solar exposure. Specifically, the current work focused on:

Hydrological and solar variability: Measurement points were selected to capture a range of inflow patterns, from high to low, and areas with varying solar exposure. This spatial heterogeneity was important for understanding the interactions between nutrient loading, retention time, and light availability.

Seasonal and diurnal sampling: To account for temporal variation in solar power, we incorporated sampling across different seasons and times of day, ensuring that key periods of phytoplankton activity were captured.

Vertical and horizontal coverage: We selected points that span both vertical profiles and horizontal gradients within the water body, capturing the complexity of light and nutrient availability at different depths and locations.

Data-driven insights: Historical data and preliminary analysis helped guide the selection of points with the highest sensitivity to inflow and solar radiation changes, optimizing our ability to model and predict chlorophyll concentrations.

This approach ensures that the collected data reflects the dynamic nature of the aquatic system, leading to more accurate predictions of phytoplankton activity and chlorophyll concentration in response to both hydrological and solar power influences.
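The screening idea described above can be illustrated with a minimal sketch. It assumes that historical inflow and solar exposure readings are available per candidate site, and it ranks sites by the sum of their coefficients of variation. The function name `rank_candidate_points`, the equal weighting of the two quantities, and the sample figures are illustrative assumptions, not the screening procedure used in the study.

```python
import numpy as np

def rank_candidate_points(inflow, solar, top_k=3):
    """Rank candidate sites by combined variability of inflow and solar exposure.

    inflow, solar: arrays of shape (n_sites, n_samples) with historical readings.
    Returns the indices of the top_k most variable sites, most variable first.
    """
    inflow = np.asarray(inflow, dtype=float)
    solar = np.asarray(solar, dtype=float)
    # The coefficient of variation (std / mean) is unitless, so inflow and
    # solar exposure can be added even though they are measured in different units.
    cv_inflow = inflow.std(axis=1) / inflow.mean(axis=1)
    cv_solar = solar.std(axis=1) / solar.mean(axis=1)
    score = cv_inflow + cv_solar  # equal weighting is an assumption
    order = np.argsort(score)[::-1]
    return order[:top_k].tolist()

# Hypothetical history for three candidate sites (rows) over four samples.
inflow_history = [[1, 1, 1, 1], [1, 2, 3, 4], [2, 2, 2, 3]]
solar_history = [[5, 5, 5, 5], [5, 6, 5, 6], [1, 9, 1, 9]]
selected = rank_candidate_points(inflow_history, solar_history, top_k=2)
```

In practice the weighting between inflow and solar variability, and the use of seasonal or diurnal subsets, would have to be chosen from the site data.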

(a) Multi-sensors fusion model measurement: There are two primary approaches to measurement fusion. The initial method involves the amalgamation of measurements into an augmented observation vector. Meanwhile, the second method uses minimum mean square estimates to combine measurements. In the first technique of measurement fusion, the measurement vectors X1 and X2 from multiple sensors are combined to create a novel augmented measurement vector. This integration assumes the statistical independence of the two sensors and generates a covariance matrix \(W_k\) to represent the merged measurement noise.
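A minimal sketch of the augmented approach follows, assuming a linear measurement model. The names `z1`, `z2` (the two sensor measurement vectors, X1 and X2 in the text), `H1`, `H2` (measurement matrices), and `R1`, `R2` (noise covariances) are illustrative and not taken from the paper. The block-diagonal matrix `R` plays the role of the merged noise covariance \(W_k\) for independent sensors.

```python
import numpy as np

def augmented_update(x, P, z1, z2, H1, H2, R1, R2):
    """One Kalman measurement update using the stacked (augmented) observation."""
    z = np.concatenate([z1, z2])   # augmented measurement vector
    H = np.vstack([H1, H2])        # augmented measurement matrix
    m1, m2 = R1.shape[0], R2.shape[0]
    # Independent sensors: the merged noise covariance is block diagonal.
    R = np.block([[R1, np.zeros((m1, m2))],
                  [np.zeros((m2, m1)), R2]])
    S = H @ P @ H.T + R                  # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
    x_new = x + K @ (z - H @ x)
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new

# Hypothetical scalar example: two sensors observe the same state directly.
x0, P0 = np.array([0.0]), np.array([[1.0]])
H = np.array([[1.0]])
R = np.array([[1.0]])
x_aug, P_aug = augmented_update(x0, P0, np.array([1.0]), np.array([3.0]), H, H, R, R)
```

Because the stacked vector grows with every added sensor, the matrix inverted in the update grows as well, which is where the extra computation time of this method comes from.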

From Eq. (1), the model structure is unchanged, but the augmented vector increases computation time. An alternative strategy for measurement fusion involves assigning weights to individual measurements from each sensor. Subsequently, these weighted measurements are tracked using a Kalman filter to derive an estimate of the state vector. Given the independence of measurement noise for sensors 1 and 2, the recursive formulation for fusing the measurement vectors U1 and U2 in a minimum mean square estimate is expressed through the covariance matrix of the measurement vector for sensor m-i.
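The weighted strategy can be sketched as follows, under the assumption that both sensors observe the same quantity through the same measurement matrix and have independent noise. The sketch uses the standard minimum mean square combination, in which each measurement is weighted by the inverse of its noise covariance, followed by an ordinary Kalman update on the fused measurement. The names `weighted_fusion`, `kalman_update`, `z1`, `z2`, `R1`, and `R2` are illustrative assumptions rather than the paper's notation.

```python
import numpy as np

def weighted_fusion(z1, z2, R1, R2):
    """Fuse two measurements of the same quantity by minimum mean square weighting."""
    W = R1 @ np.linalg.inv(R1 + R2)      # weight placed on the second sensor
    z = z1 + W @ (z2 - z1)               # fused measurement
    R = np.linalg.inv(np.linalg.inv(R1) + np.linalg.inv(R2))  # fused noise covariance
    return z, R

def kalman_update(x, P, z, H, R):
    """Standard Kalman measurement update applied to the fused measurement."""
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    return x + K @ (z - H @ x), (np.eye(len(x)) - K @ H) @ P

# Hypothetical readings: sensor 1 is three times less noisy than sensor 2.
z_fused, R_fused = weighted_fusion(np.array([10.0]), np.array([14.0]),
                                   np.array([[1.0]]), np.array([[3.0]]))

# Equal-noise case followed by a Kalman update from a prior of 0 with variance 1.
z_eq, R_eq = weighted_fusion(np.array([1.0]), np.array([3.0]),
                             np.array([[1.0]]), np.array([[1.0]]))
x_post, P_post = kalman_update(np.array([0.0]), np.array([[1.0]]),
                               z_eq, np.array([[1.0]]), R_eq)
```

The fused measurement keeps the dimension of a single sensor, so the filter inverts a smaller matrix than it would with a stacked observation vector.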

Hence, the second model expressed in Eq. (2) is more practical for combining data from multiple sensors in a single computation model for real-time experimentation. Monitoring of water bodies has two conflicting goals: detecting human influences while reducing the effect of natural variability, and detecting the consequences of climate change, for which anthropogenic changes constitute confounding factors; these are classic scenarios that illustrate the uniqueness of the system for data mining28. Figure 3 represents the proposed integrated multisensor data fusion architecture, with database characterization and sampling points, for any type of preprocessed (unstructured) data input.

(b) Site localization: After the choice of appropriate indicators, the next important factors that determine the effectiveness of monitoring are the thoughtful and focused choice of monitoring locations and the timing of sampling. For trend monitoring, the sites should be positioned so that they can indicate changes in the overall water quality of the basin29. The proposed methodology seeks to achieve large-scale, continuous, and dynamic monitoring of inland lake water quality by fully leveraging wireless sensor node data transmission technology. The objective is to develop a multi-sensor fusion model application platform for the Inland Lakes Monitoring Module, which is expected to fulfill the following functional requirements: (1) automated assessment of water quality by combining multi-sensor node observation data and threshold data with Kalman filter data analysis; (2) dynamic tracking of changes in water quality, including monitoring and analysis; and (3) mapping and visualization of thematic products related to water quality. These functional requirements serve as the foundation for designing the system framework and its functionalities. The substantial volume of multi-sensor lake water parameter observation data and environmental data involved in the system is accessed through the data management subsystem. This subsystem is responsible for integrating data into a comprehensive data resource model that supports analysis in a missing-sensor mode with higher accuracy. This involves implementing hierarchical storage and unified management based on specific data classification and statistical model standards. The data management subsystem also provides hardware-portable data interfaces with minimal computational overhead within the overall system architecture.

Figure 4 shows the site utilization for data retrieval and processing in the multisensor fusion architecture for the Bay of Bengal basin under study. It includes remotely monitored GIS and satellite images, topography, and spatial-spectral signal data from on-site high-stream and mid-level lake areas. It also includes real-time multi-sensor data, such as turbidity, salinity, oxygen depletion, and chlorophyll distribution on the surface of the lake sites, and in-site data, such as data recorded in previous research in any format (numerical, images, text, frequency signals, etc.). For impact assessment monitoring, on the other hand, sites should be placed close to the shore next to the development whose influence is being investigated, or at the mouths of tributaries and the outflow if the impact of sub-catchments is being addressed. In this instance, it is also advised that one station be situated at the lake’s deepest point to monitor general conditions. Whether a station can be deemed representative of a lake depends on a variety of variables, including the goal of sampling, morphometry, stratification, lake size, flow regime, and tributaries. Together, these variables dictate the number of monitoring sites required per lake.

Fig. 4 Site utilization of various datatypes undertaken for data modeling.

(c) Wireless Sensor Network (WSN): Upon node deployment, the system initializes the SD card and real-time clock (RTC) and configures the GSM module for data transfer, which can operate on solar energy. After a brief delay, data collection begins through the onboard electrodes measuring temperature, pH, and dissolved oxygen (DO). The gathered data is transmitted to a pre-designated mobile phone number or the base station via a GSM transceiver powered by a solar accumulator. Additionally, a copy of the data, in the datatype prescribed for the base station, is stored in the RTC for backup in case of a GSM signal outage. During a GSM signal interruption, the system automatically stores data in the RTC, ready for transmission to the base station’s database once the signal returns. In case of an emergency, the administrator has the flexibility to adjust the 30-minute data collection period. Alternatively, the administrator can initiate data collection during this period by sending a server command via the administrator interface. This flexibility enhances sensor and sink node longevity and reduces maintenance costs associated with battery replacement, as the system can draw on a secondary battery charged by the solar panel. Data from the various floating sensor nodes are transmitted remotely at high speed to the server or a sink node. Long-distance communication relies on the GSM module, while short-distance communication uses off-the-shelf wireless transceiver modules with the ZigBee protocol for cluster or node-to-node communication. ZigBee was selected primarily for efficient short-range communication, and its low power consumption is central to this study. To streamline installation and operation, the study adopts a basic ZigBee configuration, connecting sensor nodes to a sink node. An Arduino GSM module (GSM Shield 2 portable mobile antenna) is employed for the sink node to establish a wireless connection with the base station server.

LoRaWAN complements laboratory methods by providing continuous monitoring and early detection of trends or anomalies in water quality. However, laboratory methods provide higher precision and can be used to validate the sensor data. LoRaWAN is a more cost-efficient option for continuous or large-scale monitoring, while laboratory methods are reserved for the high-accuracy measurements that are periodically necessary to ensure the sensors are calibrated correctly and provide accurate data. LoRaWAN technology extends the spatial and temporal resolution of water quality monitoring, complementing traditional methods that offer high precision at specific points but cannot cover wide areas or collect data continuously. LoRaWAN provides real-time data that can enhance computational models by supplying continuous updates. This integration improves the accuracy and responsiveness of predictive models, making them more useful for managing dynamic systems, particularly during events such as heavy rains or inflow surges.
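The backup behaviour during a GSM outage described in this subsection amounts to a store-and-forward buffer. The following is a minimal sketch of that logic only, written in Python for illustration rather than as node firmware. The class name `StoreAndForwardNode`, the `send` callback (assumed to return True on a successful transmission), and the buffer capacity are hypothetical and do not describe the deployed system.

```python
from collections import deque

class StoreAndForwardNode:
    """Buffers readings locally and forwards them, oldest first, when the link is up."""

    def __init__(self, send, capacity=1000):
        self._send = send                        # callable: reading -> bool (True if delivered)
        self._backlog = deque(maxlen=capacity)   # oldest readings are dropped when full

    def report(self, reading):
        """Queue a new reading, then try to deliver everything that is pending."""
        self._backlog.append(reading)
        return self.flush()

    def flush(self):
        """Send pending readings in order; stop at the first failure. Returns count sent."""
        sent = 0
        while self._backlog:
            if not self._send(self._backlog[0]):
                break                            # link is down: keep the reading for later
            self._backlog.popleft()
            sent += 1
        return sent

    @property
    def pending(self):
        return len(self._backlog)

# Hypothetical link that can be switched on and off to imitate a GSM outage.
received = []
link = {"up": False}

def send(reading):
    if link["up"]:
        received.append(reading)
        return True
    return False

node = StoreAndForwardNode(send)
first = node.report({"t": 0, "pH": 7.1})   # outage: stays in the backlog
second = node.report({"t": 30, "pH": 7.2})
pending_during_outage = node.pending
link["up"] = True
third = node.report({"t": 60, "pH": 7.0})  # link restored: backlog is drained in order
```

A bounded buffer is used here because on-node storage is limited; which readings to discard when the buffer fills is a design choice that depends on the deployment.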

(d) Database preparation: Numerous Lake databases have been established, and database technology has been extensively used in the field of Lake Management, Lake research, and Lake exploitation in India. The data, content, and study findings have not, however, been developed into an information platform.

Table 3 shows the various datasets associated with possible data houses of effective Lake water quality monitoring configured by the previous researchers. In particular, the Lake Water quality monitoring database seems to be still a challenging task that has impeded the growth of the Coastal Ecological Economic Zone and influenced researchers’ and decision-maker’s assessments of its expansion trend and difference tendency. Security of the water environment is a key element of the Bay of Bengal Lake’s current ecological plan. The management of such a huge amount of monitoring data is challenging. A key step in protecting the water environment is flexible and efficient management. The purpose of the Lake Water quality monitoring database is to gather, manage, analyze, and provide data related to maintenance and management sequences as per available datasets, inflow pollution investigation data, area pollution investigation data, and oncomelania infection monitoring data. Since all of the data are related to particular locations, geographical visualization of the monitoring data makes it easier statistically to create the Lake Water ecological zone.

System design and development are based on real-time sensor technology, frequency spectrum signals, GIS theory, and satellite photos31. Because the processing platform holds a large volume of historical data gathered over a prolonged period, a data objective correlation function has been added to enhance data administration and analysis. As a result, the system offers functionalities including spatiotemporal query, data decoding, forecasting analysis, feature extraction, and theme graph display. The proposed design, with its self-adaptive strategy, is built to operate in a modern environment provided by an online server engine. The efficient and adaptable Hadoop framework greatly enhances both the effectiveness of the system and the efficiency of development.

(e) Sampling point finalization: A grid distribution approach is utilized to build up the monitoring stations to fully depict the condition and features of the water quality in the Lake. There are regularly spaced monitoring locations across the high-stream and mid-level Lakes. Each grid has an east-south length of around 4.87 km and a north-to-west width of over 2.89 km. From Fig. 5, the proposed architecture receives data from the available sampling nodes of three sampling stations: (S1), (S2), and (S3). It self-tunes its sampling rate and localization of sampling points based on the analysis of the framework rank-based features with spontaneous increasing and decreasing distribution. The grid spans around 12.4 \(km^2\). In the interim, further sample sites are installed in the main channel and border zones to assess the degraded state of water quality. This approach places 4056 sampling nodes on the Lake Water planar graph, indexed from north to south (62 in the High-stream Lake, 49 in the Mid-level Lake). To assess the pollution impacts of Lake Water caused by the influx of rivers and to filter substantial pollution levels and significant complex pollutants, 36 sample locations of river estuaries are set.

Surface plot of the sampling point localization and processing rate.

The process of inverting subsurface irradiance reflectance (R-) to obtain the coefficients xi is executed through a separate Grid distribution module. This module employs a downhill simplex algorithm to adjust the spectra modeled and input image spectra after atmospheric correction. The algorithm initiates with a predefined set of initial concentrations. The spectrum modeled for these concentrations is linearly adjusted to align with the input spectrum, resulting in an initial estimation of concentrations. Subsequently, this estimation undergoes optimization through two iterations of Q-factor correction and retrieval of water constituents.

(f) Sampling point addition and screening: On-site inspections are planned at each sampling location because of the extraordinarily varied geography and land use conditions of the Bay of Bengal Lake. Some spots were found to lie in settlements, farming platforms, or fishponds, while at others on-site data are difficult to acquire32. All of these spots have been eliminated since they do not fall within the range of the transferred water. Figure 5 shows the surface plot of the sampling point distribution against the sampling rate of the architecture per cycle. To better assess the pollution state of the Bay of Bengal Lake’s water quality, four more locations were added to the sampling plan for the first trial based on water quality evaluation findings. Meanwhile, the in-Lake sample point set was expanded by two. Then, for the third and fourth trials, three additional estuarine sampling sites were added, bringing the number of in-Lake sampling nodes to 89 and the accumulated estuary river count to 42.

(g) Database management and security: The database’s primary uses are gathering, maintaining, analyzing, and displaying water quality monitoring data, as well as surveying in-Lake pollution and less-impact source pollution in the Lake Water area. The elements that must be taken into account while monitoring the geographical distribution of Lake water quality are made clear, and a planning approach for this monitoring is presented from the perspectives of identifying monitoring locations and then tracking monitoring progress33. The proposed architecture also offers a collection of useful technological methods for planning spatial distribution monitoring in other comparable Lakes. Data security is enhanced by advanced encryption and data decoding with maximum bit processing at precise resolution. Only administrative users can log in to the system to examine the water quality level, choosing among the available datatypes (such as text, image, numbers, signals, frequency spectrum, and position map data points).

The proposed model categorization is determined by the kind of input data that has been processed and the information that has been produced as a result of the system. In a nutshell, the proposed data-driven statistical model entails the categorization of extracted features at the medium level, the low level configures the nature of the input signal, and the high level correlates the decision-making based on symbols and sub-decisions34. The multisensory data must be synchronized and adjusted before the fusion process due to the sample characteristics of various types of sensors, such as sampling rate. At this level, data fusion has been effectively accomplished using statistical estimating approaches.

Multisensor data fusion model with Kalman filter for Lake water quality assessment.

The weighted average approach and the least squares method are examples of non-recursive estimating techniques that are often used only to combine redundant data35. Recursive estimating techniques such as the Kalman filter (KF) support additional fusion-related uses. Creating multi-variable data sampling nodes with greater quality and more characteristics allows better computation on the available features. For image datatypes, this is known as “pixel-level fusion”: image processing is performed directly on the architecture, pixel pattern by pixel pattern, from each sensor’s processing state vectors. At the feature level of data fusion, features from signals and pictures are combined. Accumulating data sets of several parameters, in various categories and from many sources, enables the fusion and produces a feature with more discrimination36. Techniques for data categorization are very well suited to feature fusion. Once statistical features are processed, data is handled as symbolic representations, similar to human descriptions, at the descriptive level of the weightage matrix u*v. High-level fusion is involved when symbols on a single state process, each with an accompanying measure of uncertainty, are combined into a final forecasting-level decision. Decision fusion is another name for symbol fusion. Figure 6 shows the feature extraction implemented with the Kalman filter for the multi-sensor fusion architecture. In the proposed method, using various basin sampling nodes in the Lake Water region, a single parameter, the chlorophyll rate, is examined first along with the influence of surface water elevation on it. Later, the model is fed with multiple parameters such as precipitation, temperature, and evaporation rates to examine the performance of Markov feature extraction. Popular engineering applications of the Kalman Filter (KF) include target tracking, extracting missing datasets, mismatching data detection, and computer vision. 
Given that the Kalman filter is mainly involved during the state update and predictor rate estimating phase and the fusion model will be multiplexed with system and sensor error along with white Gaussian noises, it can offer an optimum estimation in terms of statistics37.

Data integration and synchronization:

Multisensor data fusion: Data from different sensors (e.g., pH, turbidity) are combined to create a comprehensive picture of water quality. Sensors may operate at different frequencies or time intervals, so synchronization ensures that the data is aligned before fusion.

Categorization: Data is categorized into different levels, such as raw sensor readings, pre-processed values, and fused outputs. This hierarchy helps manage data complexity and improve accuracy.

Adaptive management: The model can adapt to changing conditions, such as variations in sensor performance or environmental factors, making it a flexible tool for water quality assessment.
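The synchronization step described above can be illustrated with a minimal sketch that resamples two sensor streams onto a shared time grid by linear interpolation. The function name `resample_linear`, the sampling intervals, and all readings are hypothetical and chosen only for illustration; the article does not specify which alignment method it uses.

```python
from bisect import bisect_left

def resample_linear(times, values, grid):
    """Linearly interpolate (times, values) onto the timestamps in grid.

    times must be strictly increasing; grid points outside the sampled
    range are clamped to the nearest end value.
    """
    out = []
    for t in grid:
        i = bisect_left(times, t)
        if i == 0:
            out.append(values[0])
        elif i == len(times):
            out.append(values[-1])
        else:
            t0, t1 = times[i - 1], times[i]
            w = (t - t0) / (t1 - t0)
            out.append(values[i - 1] + w * (values[i] - values[i - 1]))
    return out

# Hypothetical readings (minutes since start): pH every 30 min, turbidity every 20 min.
ph_t, ph_v = [0, 30, 60], [7.0, 7.2, 7.4]
tb_t, tb_v = [0, 20, 40, 60], [4.0, 5.0, 6.0, 5.0]

grid = [0, 15, 30, 45, 60]  # common 15-minute grid
synced = list(zip(grid,
                  resample_linear(ph_t, ph_v, grid),
                  resample_linear(tb_t, tb_v, grid)))
```

Each row of `synced` then holds one timestamp with a value from every sensor, which is the aligned form a fusion step needs as input.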

Computational efficiency and uncertainty management:

Recursive estimation: The Kalman Filter updates estimates iteratively, which makes it computationally efficient and well-suited for real-time applications.

Handling uncertainty: The filter uses statistical techniques to account for noise and uncertainties in sensor data, improving the reliability of water quality estimates.

Limitations:

Linear assumptions & mathematical complexity: The Kalman Filter assumes linear relationships between variables and Gaussian distributions for measurement errors. This can limit the model’s applicability in complex scenarios where relationships are non-linear or sensor data is inconsistent.
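The recursive predict/update cycle summarized above can be sketched for a single scalar state under the linear-Gaussian assumptions just noted. This is an illustrative textbook filter, not the article's implementation; the noise variances `q` and `r`, the initial variance, and the chlorophyll readings are assumed values.

```python
def kalman_step(x, p, z, q, r, a=1.0, h=1.0):
    """One predict/update cycle of a scalar linear Kalman filter.

    x, p : previous state estimate and its variance
    z    : new measurement
    q, r : process and measurement noise variances (assumed known)
    a, h : state-transition and observation coefficients
    """
    # Predict: propagate the estimate and inflate its variance.
    x_pred = a * x
    p_pred = a * p * a + q
    # Update: blend prediction and measurement through the Kalman gain.
    k = p_pred * h / (h * p_pred * h + r)
    x_new = x_pred + k * (z - h * x_pred)
    p_new = (1.0 - k * h) * p_pred
    return x_new, p_new, k

# Hypothetical noisy chlorophyll readings (mg/m^3); illustrative only.
readings = [101.0, 98.5, 103.2, 99.7, 100.9]
x, p = readings[0], 25.0
history = []
for z in readings[1:]:
    x, p, k = kalman_step(x, p, z, q=0.5, r=9.0)
    history.append((x, p, k))
```

Only the previous estimate and its variance are carried between steps, which is why the per-cycle cost stays constant however long the record grows.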

Consider the state-space model of the following linear dynamic system with several sensors:

In Eqs. (3) & (4), k denotes the sampling time index, f(x) denotes the previous state vector, and g(x) represents the measurement vector, with U(x) as the state transition model. Since a control strategy is incorporated, V(x) serves as the control input model, and W(x) is the primary observation model that transforms the true state space matrix into the objective score function38. The noise terms n(x) and u(x) are taken as zero-mean Gaussian with their respective covariance matrices, and the prediction phase generates a projected state at each time step with the expected estimate covariance provided by Eqs. (5) & (6).

From Eqs. (5) & (6), the Kalman filter starts to extract the Markov features of the covariance matrix along with the predictor rate index, denoted by S(x), for each sampling sequence. During the examination phase, the projected Markov function \(\hat{k}(x|x-2)\), with a delay of two discrete steps, holds the state vector of V(x)u(x) with the higher-probability feature range. Because the Kalman filter gain matrix is central to Markov feature selection, M(x) reduces unwanted data by averaging the near-mean score of each sampling set over a discrete time interval to form the high-variance matrix group, as represented in Eqs. (7), (8) and (9).
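For reference, the textbook linear-Gaussian state-space model and Kalman recursion can be written as follows. The notation (F, B, H, Q, R, K) is the conventional one and is an assumption here, not the article's own: F, B, and H would play the roles of the state transition, control input, and observation models that the text calls U(x), V(x), and W(x), and K that of the gain matrix M(x). The article's numbered equations may differ in detail, for example in the two-step delay mentioned above.

```latex
\begin{align}
  % State and measurement models
  x_k &= F_k x_{k-1} + B_k u_k + w_k, & w_k &\sim \mathcal{N}(0, Q_k) \\
  z_k &= H_k x_k + v_k,               & v_k &\sim \mathcal{N}(0, R_k) \\
  % Prediction
  \hat{x}_{k|k-1} &= F_k \hat{x}_{k-1|k-1} + B_k u_k \\
  P_{k|k-1} &= F_k P_{k-1|k-1} F_k^{\top} + Q_k \\
  % Gain and update
  K_k &= P_{k|k-1} H_k^{\top} \left( H_k P_{k|k-1} H_k^{\top} + R_k \right)^{-1} \\
  \hat{x}_{k|k} &= \hat{x}_{k|k-1} + K_k \left( z_k - H_k \hat{x}_{k|k-1} \right) \\
  P_{k|k} &= \left( I - K_k H_k \right) P_{k|k-1}
\end{align}
```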

Owing to its efficient multi-variable matrix operations for estimation, the fundamental benefit of the KF is its computational efficiency. The fusion model dynamics are assumed to be linear, and the initial uncertainty is assumed to follow a Gaussian distribution. The method uses the Kalman filter to linearize the framework, reducing misalignment in the datasets by expanding a Taylor series around an unstable processing sampling nodal point39. KFs are infeasible for some non-distributed tracking fusion problems because measurements from many sensory sources may be inconsistent, and the dependence assumption of KFs is fed forward with the attained probability covariance level. The final output feature is obtained as an easy-to-implement linear combination of the state vectors from the overall Markov feature extraction, in which all weights are nonnegative and sum to 1, with an affordable gain matrix, as expressed in Eq. (10)40.

Recently, the efficiency of adaptive filters for data preprocessing and mining has been enhanced using covariance combination methods. Covariance intersection is the covariance-based fusion technique that draws the most attention from researchers interested in decentralized forms of data fusion. It resembles the Kalman filter in its fusion state and improves on it when the sensory inputs collected from several sensors are not independent, as related by Eqs. (8) and (9). Any choice of weight is guaranteed to yield a consistent estimate, one whose covariance P is larger than or equal to its true squared error, represented by \((1-x)Q^{-1}b\). No matter how closely the two estimates are correlated, the fused estimate P is certain to remain consistent. Data management, multi-parameter uncertainty, and dynamic equilibrium system modeling are all challenges in multisensor fusion. They give rise to innate uncertainties in the in-site localization of sensory data, which are brought on by device roughness as well as by internal system noise sources and the sensor itself. Multisensor fusion solutions should be able to handle these uncertainties effectively and produce a consistent impression. In practice, data modeling through fusion requires a multidisciplinary understanding of signal processing, artificial intelligence, probability, and statistics.
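A scalar sketch of covariance intersection may make the idea concrete. It uses the standard form of the rule, in which the fused inverse variance is a convex combination of the two input inverse variances. The mixing weight `w`, the function name, and the temperature values are assumptions for illustration, and the article's own Eqs. (8) and (9) may be parameterized differently.

```python
def covariance_intersection(x1, p1, x2, p2, w):
    """Fuse two scalar estimates whose cross-correlation is unknown.

    x1, x2 : the two estimates
    p1, p2 : their variances (must be > 0)
    w      : mixing weight in [0, 1]
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    info = w / p1 + (1.0 - w) / p2  # fused inverse variance
    p = 1.0 / info
    x = p * (w * x1 / p1 + (1.0 - w) * x2 / p2)
    return x, p

# Hypothetical water-temperature estimates (deg C) from two sensor nodes.
x_fused, p_fused = covariance_intersection(18.2, 0.4, 18.9, 0.9, w=0.5)
```

In practice the weight is usually chosen to minimize a measure of the fused covariance such as its trace or determinant; a fixed `w = 0.5` is used here only to keep the sketch short.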

Proposed statistical multisensor fusion model objective score on the analysis.

In water quality monitoring, each environmental factor (e.g., chlorophyll concentration \(C_t\), surface water elevation \(H_t\)) can be modeled as a state in a Markov process. The transition probabilities between states help estimate future values based on current measurements, capturing temporal patterns and dynamics. The state transition matrix \(T_t\) is essential in Markov models, representing the probabilities of moving from one state to another. For a discrete set of states, the entries of \(T_t\) are the probabilities \(T(S_i \rightarrow S_j)\); for example, \(T(S_1 \rightarrow S_2)\) is the probability of transitioning from state \(S_1\) to state \(S_2\). This matrix helps predict future water quality states by identifying dominant transitions or patterns. Let \(X_t\) represent the state of chlorophyll concentration at time t, which is influenced by factors like nutrient inflow, temperature, and light availability. The Markov model can be used to predict the transition \(X_t \rightarrow X_{t+1}\), that is, the chlorophyll concentration at the next time step, using the transition probabilities from the current state \(X_t\). Extracting these Markov features makes it possible to model how different states of water quality transition over time, capturing underlying ecological dynamics that affect factors like eutrophication or algal blooms.
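As a minimal sketch of how such a transition matrix might be estimated, the snippet below discretizes a chlorophyll series into three states and counts the observed transitions. The bin edges, the readings, and the helper names are hypothetical; the article does not state how its states are defined.

```python
def discretize(value, edges):
    """Index of the bin that value falls into, given ascending bin edges."""
    return sum(value >= e for e in edges)

def transition_matrix(states, n_states):
    """Row-stochastic matrix of empirical transition frequencies."""
    counts = [[0] * n_states for _ in range(n_states)]
    for s, s_next in zip(states, states[1:]):
        counts[s][s_next] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # States never visited get a uniform row so every row sums to 1.
        matrix.append([c / total for c in row] if total
                      else [1.0 / n_states] * n_states)
    return matrix

# Hypothetical daily chlorophyll readings (mg/m^3); states: low / medium / high.
chl = [62, 70, 85, 96, 110, 118, 104, 92, 88, 97, 121, 126]
edges = [80, 100]
states = [discretize(c, edges) for c in chl]
T = transition_matrix(states, n_states=3)

# Most likely next state given the last observed state.
next_state = max(range(3), key=lambda j: T[states[-1]][j])
```

Row i of `T` is the estimated distribution of the next state given the current state i, so the one-step prediction is a lookup in the row of the last observed state.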

In water quality monitoring, eigenvalues can provide insights into how quickly a system responds to changes and how different environmental factors interact. Eigenvalue estimation involves solving the characteristic equation \(\det(A - \lambda I)=0\). The eigenvalues \(\lambda\) provide key insights:

Magnitude of \(\lambda\): If |\(\lambda\)|<1, the system is stable and tends to return to equilibrium after a perturbation. If |\(\lambda\)| >1, the system is unstable, potentially leading to rapid changes in water quality.

Direction of eigenvectors: Eigenvectors correspond to the directions in which the system evolves. For example, in water quality, different eigenvectors might represent dominant patterns of nutrient cycling or temperature gradients.

In the Kalman filter framework, the state estimation matrix can be analyzed through its eigenvalues to determine how sensitive the system is to measurement errors or noise. This is particularly useful for evaluating the robustness of predictions. Consider the matrix A representing the interactions between water quality parameters (e.g., chlorophyll concentration, temperature, dissolved oxygen). Eigenvalue estimation can help determine:

Whether the system is stable under small perturbations (e.g., sudden changes in nutrient levels).

How quickly the water body will respond to external factors such as precipitation or pollution events.

If one eigenvalue dominates (i.e., is substantially larger than others), it could indicate that a particular environmental factor (e.g., temperature) is driving most of the changes in the water system.
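The stability test above can be sketched for a 2x2 interaction matrix, where the characteristic equation \(\det(A - \lambda I)=0\) reduces to a quadratic in \(\lambda\). The matrix entries below are hypothetical, and the criterion \(|\lambda|<1\) applies to a discrete-time system of the form \(x_{t+1} = A x_t\), which is an assumption about how A is used.

```python
import cmath

def eigenvalues_2x2(a):
    """Roots of det(A - lambda*I) = 0 for a 2x2 matrix A."""
    (a11, a12), (a21, a22) = a
    trace = a11 + a22
    det = a11 * a22 - a12 * a21
    # lambda^2 - trace*lambda + det = 0; cmath handles complex roots.
    disc = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace + disc) / 2.0, (trace - disc) / 2.0

def is_stable(a):
    """Discrete-time stability: every eigenvalue lies inside the unit circle."""
    return all(abs(lam) < 1.0 for lam in eigenvalues_2x2(a))

# Hypothetical coupling between chlorophyll and temperature anomalies.
A = [[0.8, 0.1],
     [0.2, 0.5]]
lam1, lam2 = eigenvalues_2x2(A)
dominant = max((lam1, lam2), key=abs)
```

The eigenvalue of largest magnitude (`dominant`) sets how slowly a perturbation decays, which is the sense in which one eigenvalue can dominate the response.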

Combining Markov feature extraction and Eigenvalue estimation: When the Kalman filter is used for multisensor data fusion, combining Markov feature extraction with eigenvalue estimation helps improve the accuracy of predictions and the robustness of the system. Markov feature extraction models the temporal evolution of water quality parameters, capturing how they transition between states, and so provides the dynamics model that feeds into the Kalman filter as its prediction model. Eigenvalue estimation evaluates the stability and sensitivity of the system, ensuring that the Kalman filter’s state updates remain stable and its estimates reliable, particularly when new sensor data (which could be noisy or uncertain) is incorporated.

When multisensory fusion and integration are used, more sensible information is made accessible through the parallel pipeline process states of the sensory input. The objective score attained through the fusion model is represented in Fig. 7. This merging function outputs vector \(G_i\) for each element as follows: if \(Q_i\) and \(M_i\) coincide, then \(P_i\) takes the same value; otherwise, it chooses \(Q_i\) with probability p or \(M_i\) with probability \((1-p)\). Features of the solution that produce excellent accuracy are considered to be significant.
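The element-wise merging rule just described can be sketched directly. The function name `merge`, the binary feature masks, and the seeded random generator are illustrative assumptions; only the rule itself (keep agreeing entries, otherwise choose \(Q_i\) with probability p and \(M_i\) with probability \(1-p\)) comes from the text.

```python
import random

def merge(q, m, p, rng=random):
    """Element-wise merge of two equally long vectors.

    Entries that agree are kept; where they differ, q[i] is chosen with
    probability p and m[i] with probability 1 - p.
    """
    if len(q) != len(m):
        raise ValueError("q and m must have the same length")
    return [qi if (qi == mi or rng.random() < p) else mi
            for qi, mi in zip(q, m)]

# Hypothetical binary feature masks (1 = feature selected).
q = [1, 0, 1, 1]
m = [1, 1, 0, 1]
merged = merge(q, m, p=0.7, rng=random.Random(0))  # seeded for repeatability
```

Setting `p` closer to 1 biases the result toward `q`, while values near 0.5 keep more randomness in the positions where the two vectors disagree.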

The positions of the sampling datasets are chosen at random. Equations (13), (14) and (15) depict how the fusion model’s properties vary based on the database sampling node’s availability represented as

In contrast, a feature is considered noisy if it produces accurate predictions while not being a component of the solution. Using this technique, a new solution can make use of more important aspects while ignoring the distracting ones. The influence of randomness may be reduced, though, by adjusting the p-value, preventing the algorithm from settling on local Markov features shown in Fig. 8.

Assigning weights to sensor measurements: Weights determine how much influence each sensor’s measurement has on the final estimate. They are crucial for combining sensor data in a way that accounts for varying levels of reliability and accuracy. In the Kalman Filter framework, weights are implicitly derived from the covariance matrices of the state estimate and measurement noise. The filter updates the estimate based on the relative uncertainties of predictions and measurements. The Kalman Gain K determines how much weight is given to the new measurement compared to the prediction. Sensors with lower measurement variance (higher reliability) receive higher weights. This approach assumes that the variance of each sensor’s measurements is known or can be estimated. The accuracy of weighted measurements is influenced by sensor calibration, precision, environmental conditions, and the assumptions made during weight derivation. Accurate weight assignment ensures that the fused measurements reflect the true state of the parameter being monitored, leading to better decision-making and system performance.
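A minimal sketch of variance-based weighting: each sensor receives a weight proportional to the inverse of its measurement variance, so more reliable sensors count for more. This is closely related to what the Kalman gain does for a static quantity when sensor errors are independent. The sensor variances and dissolved-oxygen readings are assumed values, and independence of the sensor errors is an assumption of the sketch.

```python
def inverse_variance_weights(variances):
    """Weights proportional to 1/variance, normalized to sum to 1."""
    inv = [1.0 / v for v in variances]
    total = sum(inv)
    return [i / total for i in inv]

def fuse(measurements, variances):
    """Weighted mean of the measurements and the variance of the fused value."""
    weights = inverse_variance_weights(variances)
    estimate = sum(w * z for w, z in zip(weights, measurements))
    fused_var = 1.0 / sum(1.0 / v for v in variances)
    return estimate, fused_var, weights

# Hypothetical dissolved-oxygen readings (mg/L) from three sensors.
z = [6.1, 5.8, 6.4]
var = [0.04, 0.09, 0.25]
estimate, fused_var, weights = fuse(z, var)
```

The fused variance is smaller than that of the best single sensor, which is the quantitative benefit of combining redundant measurements.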

Markov feature function average estimated rate on the quality analysis.

The normalized fusion model data predictor and estimator were correlated with each other to estimate the distribution of the fitted values by ignoring Gaussian noise as given in Fig. 9. The correlated Markov features get the best fit with the residual of minimum statistics without maximum deviation on the model estimator paths. The full feature vectors are ordered in decreasing order of accuracy on each iteration. The feature’s initial component serves as the next solutions that finally become local Markov features. The feature space’s remaining vectors are all classified as feature extractors. Integer variables l and n have preset values. The Markov features adjust their location or position about the positions of the sampling datasets to enhance the statistical model precision and robustness.

Markov feature extracted residual distribution on the fusion model predictor.

The Solar-Powered Integrated Multi-Sensor System for lake water quality monitoring uses a combination of statistical data fusion and Kalman Filter techniques to select and process relevant data. The system incorporates diverse water quality sensors, including those for chlorophyll, dissolved oxygen, pH, and nutrients, to monitor different aspects of water health. Sensor accuracy is maintained through calibration, cross-validation, and the continuous adjustment of sensor weights by the Kalman Filter, ensuring reliable estimates of water quality parameters. The system typically uses a variety of water quality sensors, each designed to measure specific parameters:

Optical sensors for chlorophyll and turbidity.

Electrochemical sensors for pH, dissolved oxygen, and conductivity.

Temperature sensors for water temperature.

Nutrient sensors for measuring levels of nitrogen or phosphorus.

The accuracy of sensors is critical for reliable water quality assessment. Chlorophyll sensors may have an accuracy of ±5% for concentration measurements. Dissolved oxygen sensors may have an accuracy of ±1% saturation. The system adapts based on environmental changes such as seasonal variations, precipitation, or temperature shifts. For example, during the summer, when water temperatures rise, the system may place more weight on dissolved oxygen and nutrient levels as they are closely linked to the risk of eutrophication. The system knows which parameters are critical for water quality based on predefined environmental models and scientific knowledge of aquatic ecosystems. It uses historical data, real-time sensor inputs, and domain-specific thresholds (e.g., safe limits for DO or pH) to decide which variables are most relevant in assessing water quality.

Present research has been done on the adsorption and release rates relationship configuring water and sediment pollutants with the aim of better monitoring and predicting the water quality pollution of Lake water classifying the deposition, migration, and transformation scales of pollutants. It will assist in determining the origins and causes of the sediment contaminants in Lake Water as well as the likelihood that they may contaminate the water. Sediment samples are collected as closely as possible to the water quality monitoring locations, with the least amount of vessel disturbance to the sediment41. Evaporation point, Precipitation, temperature, surface water elevation, and chlorophyll existence rates can be detected in sediment. The homogeneous distribution of those 32 in-Lake sample stations may reflect the variations in regional sediment.

All water quality monitoring data from boundary sampling nodes that passed quality checks were subjected to Eq. (16). Regions of interest with an increasing Markov feature extraction ratio were used to obtain the spatially correlated vector states of chlorophyll surrounding all in-site sampling nodal points. The median and a statistical dispersion measure of the interquartile type, taken as the difference between the 5th and 95th percentiles, were computed for a selection of dataset samples to formulate the final weighted output on the multi-variable inputs.

To create composite maps that automatically associate the feature nodes provided by the Kalman filter with chlorophyll concentrations frequently near the median score, the category assignments of each feature were tallied over all scenarios to sustain maximum objective score limits. To validate the proposed model, the chlorophyll datasets are fed into the multisensor fusion model architecture to estimate the average and to forecast the concentration trends, up to their maximum deviation, for the upcoming months, as shown in Fig. 10. Although the sampling node data come from different locations in the Lake regions and in different data types, once they enter the main frame structure they are preprocessed with local Markov features by the Kalman filter, which centralizes the data for the feature predictor phase. Samples are taken at a discrete interval, at mid-noon each day, over 30 days to examine the model efficiency. Since the chlorophyll analysis is based on the elevation level of the water, the proposed model fetches the month-wise historical water elevation datasets, in various category bands, from the Lake water quality data warehouse.

Predicted chlorophyll concentration on the mid-level Lake region using a fusion data model.

Water shortage: Extreme low inflows can lead to stress on aquatic ecosystems, reducing biodiversity and possibly leading to conditions that favor harmful algal blooms (HABs), which may drastically increase chlorophyll concentrations.

Short retention time (High inflow): During periods of high inflow, the retention time decreases, and water may flush through the system more quickly. This can limit the time phytoplankton have to grow and reduce chlorophyll concentrations, despite high nutrient loads.

Low water elevation: During low water periods, stratification is more likely to break down, promoting vertical mixing of water layers. This mixing brings nutrients from deeper layers to the surface, where they are accessible to phytoplankton, promoting higher chlorophyll concentrations. It is concluded that both minimal and maximal points of water elevation create conditions that affect nutrient availability, water mixing, and ecological balance, making them important in predicting chlorophyll concentration.

The surface water elevation was used to calculate the quantity of rainfall received, runoff discharge, sediment accumulation, and nutrient discharge from the Lake Water basin as average estimates, along with Lake evapotranspiration. The estimated inflows were then applied to the stage-volume relationship of temperature effects on the water surface, which has a significant impact on the chlorophyll content, using regression analysis to determine the volume of the Lake’s outflows. Fig. 11 shows that the extreme inflow band reaches the maximum surface elevation of 5.5 m, the middle inflow band reaches 3.6 m, and the water shortage band only 1.98 m. The chlorophyll concentration peaks at 128 \(\textrm{mg}/\textrm{m}^3\) in the middle inflow region, compared with only 86 \(\textrm{mg}/\textrm{m}^3\) in the extreme inflow band. In the initial, single-parameter examination of the proposed statistical model, chlorophyll concentration is predicted after the surface water elevation is extracted automatically from the data warehouse, without any human decision-making, for accurate forecasting in the different bands. The second trial, on the other hand, supplies multiple data streams to the system within a single time interval to validate the integrated multi-sensor fusion model computation, using multisensor parameters such as evaporation rate, precipitation, temperature, and dissolved oxygen consecutively. The fusion model with Kalman filter yields the most efficient Markov features, shown as a box and whisker layout of the multisensor data samples in a single feature frame, with a computation time of 7.8 ms, as given in Fig. 12. An adaptation of the box and whisker plot places a constraint on the whisker length, limiting it to a maximum of 1.5 times the interquartile range. 
In other words, the whisker extends to the furthest value from the center while remaining within a distance of 1.5 times the interquartile range from the lower or upper quartile. Any data points outside this range are depicted as individual points on the graph and are considered potential outliers. In the distribution, denoted as C, the center of the data (median of 0.75) is the highest among the three distributions. Distribution C exhibits negative skewness, evident from the longer whisker and half-box on the left side of the median compared to the right side.
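The whisker rule described above can be sketched with Python's standard `statistics` module. The feature scores are hypothetical, and the quartile convention (`method="inclusive"`) is an assumption, since plotting packages differ slightly in how they compute quartiles.

```python
import statistics

def box_stats(data, k=1.5):
    """Quartiles, whisker ends, and outliers for a box-and-whisker plot."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - k * iqr, q3 + k * iqr
    # Whiskers stop at the most extreme data points still inside the fences.
    inside = [x for x in data if lo_fence <= x <= hi_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": sorted(x for x in data if x < lo_fence or x > hi_fence),
    }

# Hypothetical normalized feature scores; 0.05 lies far below the bulk.
scores = [0.05, 0.55, 0.62, 0.70, 0.74, 0.75, 0.78, 0.80, 0.83]
stats = box_stats(scores)
```

A longer lower whisker and a wider lower half of the box, as in this sample, correspond to the negative skew discussed for distribution C.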

Self-adaptive estimation of surface water elevation to predict the chlorophyll concentration.

Box and whisker representation of extracted Markov features of fusion model involving multiple sensors data.

More evaporation will occur as a result of higher temperatures, which will probably result in lower water levels in the Bay of Bengal Lake. Winter evaporation is boosted by reduced Lake ice, which adds to the reduction. Water levels in the Lake are predicted to drop by the end of the century by no more than 0.9 feet under a lower emissions scenario, but by between 1.2 and 1.9 feet under a higher emissions scenario. The possibility of a larger decline in Lake levels increases with temperature rise. When regular variations are added, even a 0.9-foot drop can cause a substantial lengthening of the distance to the shoreline in many locations. Other possible effects might involve beaches, coastal habitats, and dredging infrastructure needs, as represented in Fig. 12. The evaporation rate lies in the range of 1.2–11.9 mm/day, precipitation varies from 6 to 18%, dissolved oxygen lies between 3 and 8 mg/L, and the temperature ranges from 8 °C to 27 °C. Increased evaporation due to higher temperatures will likely result in a decrease in Indian Precipitation, which is expected to increase in the winter and spring and intensify throughout the year. It is anticipated that this trend will cause greater floods, more infrastructure damage, and effects on human health. Such intense downpours increase the danger of waterborne infections by overtaxing drainage systems and water treatment facilities. Based on the Markov feature extraction weightage, temperature is the most prevalent factor associated with the other three subgroups; the autocorrelation statistics produced by the model are therefore given in Table 4.

Non-parametric tests necessitate the absence of internal correlation within the data. This requirement is essential as internal correlations can impact data analysis and the significance of trend statistics. Hence, the influence of auto-correlation was taken into account in the trend analysis of the entire time series data. Despite accounting for seasonal variations, numerous noteworthy alterations in water quality persist, underscoring the contribution of additional factors to changes in water quality. Additionally, in the second test of the proposed strategy, equal weight is given to the indications for extremes in temperature and precipitation as estimated in Table 4. Due to the significance of cryospheric processes in the hydrological regimes of these upstream basins, greater weight has been given to the model’s ability to simulate the entire annual cycle of precipitation and temperature along with equal weight for air temperature and precipitation extremes. Changes in temperature and precipitation have a synergistic impact that drives changes in these cryospheric processes. Depending on which climatic variables are more significant in each specific situation, the weighting might be changed for each of these concerns to account for findings from other research. Markov features with eigenvalue estimation are given in Table 5.
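The autocorrelation check mentioned above can be illustrated with a lag-1 sample autocorrelation coefficient. The temperature series is hypothetical, and the estimator shown is the common biased form (lagged covariance divided by total variance); the article does not state which estimator it used.

```python
def autocorrelation(series, lag=1):
    """Sample autocorrelation coefficient of a series at the given lag."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    if var == 0:
        raise ValueError("series is constant")
    cov = sum((series[i] - mean) * (series[i + lag] - mean)
              for i in range(n - lag))
    return cov / var

# Hypothetical daily water temperatures (deg C) with a slow upward drift.
temps = [8.0, 9.1, 10.3, 11.0, 12.4, 13.1, 14.6, 15.2, 16.8, 17.5]
r1 = autocorrelation(temps)
```

A coefficient well away from zero, as a drifting series like this one produces, signals that successive samples are not independent and that the serial correlation needs to be accounted for before a non-parametric trend test is applied.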

Once the model identifies the highest-weighted features, it ranks the remaining features by variance using principal component analysis: the cumulative sum of squared loadings is 54.383 for temperature and 47.12 for evaporation, as shown in Table 5. Table 6 lists the parameters considered and the posterior distribution rates for pairwise correlation.
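As a rough illustration of how an eigenvalue-based ranking of this kind can be obtained, the sketch below builds a correlation matrix from three series and estimates its dominant eigenvalue by power iteration. The data values are hypothetical (chosen only to fall within the ranges reported above), the function names are assumptions, and power iteration stands in for whatever eigen-solver the authors actually used.

```python
import math
import random


def correlation_matrix(columns):
    """Pearson correlation matrix of equally long data columns."""
    n = len(columns[0])
    standardized = []
    for col in columns:
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        standardized.append([(v - mean) / sd for v in col])
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in standardized]
            for ci in standardized]


def mat_vec(matrix, vec):
    """Matrix-vector product for a square list-of-lists matrix."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def dominant_eigenpair(matrix, iterations=500, seed=0):
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix."""
    rng = random.Random(seed)  # random start avoids an accidental orthogonal start
    vec = [rng.uniform(-1.0, 1.0) for _ in matrix]
    for _ in range(iterations):
        prod = mat_vec(matrix, vec)
        norm = math.sqrt(sum(v * v for v in prod))
        vec = [v / norm for v in prod]
    eigenvalue = sum(v * p for v, p in zip(vec, mat_vec(matrix, vec)))  # Rayleigh quotient
    return eigenvalue, vec


# Hypothetical observations, for illustration only.
temperature = [8.0, 12.0, 17.0, 21.0, 24.0, 27.0]      # deg C
evaporation = [1.2, 3.0, 5.5, 8.1, 10.0, 11.9]         # mm/day
precipitation = [18.0, 15.0, 12.0, 10.0, 8.0, 6.0]     # %

corr = correlation_matrix([temperature, evaporation, precipitation])
eigenvalue, loading = dominant_eigenpair(corr)
# The trace of a correlation matrix equals the number of variables.
explained = 100.0 * eigenvalue / len(corr)
print(f"first component explains {explained:.1f}% of the variance")
```

The share of variance explained by each component is its eigenvalue divided by the trace of the correlation matrix; accumulating these shares gives a cumulative percentage of the kind tabulated for principal component analysis.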

A certain amount of subjectivity is introduced by the weighting of parameters such as evaporation, precipitation, and dissolved oxygen, as represented in Fig. 13. When evaluating a model’s historical performance, the methodology gives the model’s ability to simulate the complete annual cycle of precipitation and temperature greater weight than its ability to simulate climatic variations. In addition, although some models share the same model code or use the same forcing and validation data, which creates model interdependency, our methodology assumes that all model runs are independent. This emphasizes the need to exercise caution when using existing climate change forecasts to formulate climate change adaptation measures. Climate change projections of rising temperatures, quicker evaporation rates, and longer-lasting droughts would put further strain on already overworked water sources. The Great Plains’ current water consumption is unsustainable because the High Plains aquifer is being depleted faster than it is replenished. The susceptibility of natural ecosystems to pests, invading species, and the extinction of native species is anticipated to increase further through the interaction of climate-driven changes with other human-induced pressures. Changes in temperature and precipitation influence the ecologies, water and food supplies, and habitat availability of native animals and plants, which in turn alters their composition and variety.

Correlation of evaporation with dissolved oxygen and precipitation at the low, mid, and high band rate of 2, 7, and 11 mm/day.

By modeling these temporal transitions, the system can predict how increased precipitation (due to climate change) might lead to temporary spikes in water levels, which in turn can influence nutrient distribution, oxygen levels, and algal growth. This connection provides insight into how water quality might shift under future climate scenarios, based on the probabilistic state changes captured by the Markov model. As climate change induces more extreme weather events (e.g., heavy rainfall), Markov feature extraction can be used to predict how these changes will influence surface water elevation and the resultant effects on chlorophyll concentration. Understanding these transitions helps in assessing future ecological risks. If eigenvalue estimation reveals that the system’s sensitivity to changes in surface water temperature is increasing, this could signal a greater risk of harmful algal blooms as a result of climate-induced warming. This understanding links the broader climate change influence directly to the core goal of predicting water quality parameters like chlorophyll concentration. The dynamic nature of the Kalman filter allows the model to handle the increasing variability caused by climate change. As new data (e.g., temperature spikes, water level changes) is fed into the system, the Kalman filter can adjust its estimates of water quality parameters, ensuring that climate-induced fluctuations are accounted for in real-time predictions.
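The recursive adjustment described above can be sketched with a scalar Kalman filter that fuses several sensor readings per time step. This is a minimal illustration under stated assumptions, not the filter implemented in this study: the random-walk state model, the class and function names, and all noise variances and readings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalman:
    x: float  # state estimate, e.g. chlorophyll concentration (mg/m^3)
    p: float  # variance of the state estimate
    q: float  # process-noise variance (hypothetical value)

    def predict(self) -> None:
        # Random-walk model: the estimate is carried forward unchanged,
        # while its uncertainty grows by the process noise.
        self.p += self.q

    def update(self, z: float, r: float) -> None:
        # z is a measurement with noise variance r.
        gain = self.p / (self.p + r)      # Kalman gain
        self.x += gain * (z - self.x)     # correct the estimate with the innovation
        self.p *= 1.0 - gain              # uncertainty shrinks after the update


def fuse(steps, x0, p0, q):
    """Run the filter over time steps, each a list of (reading, variance) pairs."""
    kf = ScalarKalman(x0, p0, q)
    estimates = []
    for readings in steps:
        kf.predict()
        for z, r in readings:             # sequential update fuses all sensors
            kf.update(z, r)
        estimates.append(kf.x)
    return kf, estimates


# Hypothetical readings from two sensors over three time steps.
steps = [
    [(100.0, 4.0), (104.0, 4.0)],
    [(110.0, 4.0), (108.0, 9.0)],
    [(127.0, 4.0), (129.0, 9.0)],
]
kf, estimates = fuse(steps, x0=0.0, p0=1e6, q=25.0)
print([round(e, 1) for e in estimates])
```

A larger process-noise variance `q` makes the filter follow sudden changes (for example a temperature spike) more quickly, at the cost of passing more sensor noise into the estimate.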

Handling uncertainties during natural disasters: Natural disasters such as heavy rains and floods, particularly common in regions like India, introduce substantial uncertainties into dynamic aquatic systems. These events can drastically alter water inflow rates, turbidity, nutrient availability, and solar radiation, all of which are key factors in predicting chlorophyll concentrations. To ensure that system outputs remain usable even under such extreme conditions, we have implemented several strategies:

Real-time monitoring and adaptive models: By incorporating real-time data on water inflow, turbidity, and solar radiation, we can dynamically adjust our models to reflect sudden changes in environmental conditions. This approach allows for more accurate predictions during extreme events such as floods.

Scenario-based modeling: We have developed scenario-based models that simulate the influence of extreme events. These models adjust system outputs based on predefined thresholds for inflow rates, solar radiation levels, and water quality metrics, ensuring that the predictions remain reliable even during natural disasters.

Probabilistic and risk assessment approaches: Given the uncertainty associated with natural disasters, we have adopted probabilistic modeling techniques to account for a range of potential outcomes. Confidence intervals and risk assessments provide decision-makers with a clearer understanding of the system’s behavior during and after such events.

By incorporating these strategies, the system’s outputs will not only remain usable during extreme weather events but will also provide actionable insights into the resilience and recovery of aquatic ecosystems following natural disasters. This ensures that the research remains relevant in the context of climate variability and extreme hydrological events.
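One simple way to express the probabilistic outputs mentioned above is an empirical percentile interval together with a threshold-exceedance probability computed from an ensemble of predictions. The sketch below assumes such an ensemble is available; the simulated ensemble, the alert threshold, and the function names are hypothetical and are not taken from the study.

```python
import random


def quantile(sorted_values, q):
    """Linearly interpolated quantile of an already sorted list."""
    pos = q * (len(sorted_values) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1.0 - frac) + sorted_values[upper] * frac


def percentile_interval(samples, level=0.95):
    """Central interval that contains the given share of the samples."""
    ordered = sorted(samples)
    tail = (1.0 - level) / 2.0
    return quantile(ordered, tail), quantile(ordered, 1.0 - tail)


def exceedance_probability(samples, threshold):
    """Fraction of samples strictly above the threshold."""
    return sum(1 for s in samples if s > threshold) / len(samples)


# Hypothetical ensemble of predicted chlorophyll concentrations (mg/m^3).
rng = random.Random(42)
ensemble = [rng.gauss(100.0, 10.0) for _ in range(1000)]

low, high = percentile_interval(ensemble)
risk = exceedance_probability(ensemble, threshold=115.0)  # hypothetical alert level
print(f"95% interval: {low:.1f} to {high:.1f} mg/m^3, exceedance risk: {risk:.2%}")
```

Reporting the interval and the exceedance risk alongside the point prediction lets a decision-maker see how much the estimate can be trusted during an extreme event.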

By using a lake-specific parameterization, the present study validates the broad applicability of multi-sensor integrated data fusion modelling for autonomous, operational processing. The present work infers that chlorophyll concentration exhibits a closely inverse relationship with the transparency of lake water: low chlorophyll concentrations coincide with high transparency, and vice versa. This observation indicates that the segments of lake water monitored within the test area are predominantly influenced by algae as the primary optical water constituent affecting water clarity. The absence of oxygen at the lake bed hinders the biodegradation of organic sediment, emphasizing the importance of maintaining oxygen levels for beneficial bacteria and insects. The current review highlights that nutrient loading has surpassed the eutrophic condition, resulting in a hyper-eutrophic status. The hydraulic residence time of a lake, representing the time required to entirely replace all water through river, groundwater, and rainfall inputs, plays a significant role in shaping the chemical composition of lake waters. It influences the duration available for biogeochemical and photochemical processes, the accumulation and loss of dissolved and particulate materials, and the duration of biogeochemical interactions with lake sediments and the littoral zone. Given the growing operational demands of water quality monitoring, future enhancements to the system are imperative: (1) incorporating additional satellite remote sensing data; (2) implementing warning and forecasting capabilities for water quality; and (3) introducing advancements in data assimilation technology.

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Pu, F., Ding, C., Chao, Z., Yu, Y. & Xu, X. Water-quality classification of inland lakes using Landsat8 images by convolutional neural networks. Remote Sens. 11, 1674 (2019).

Budiarti, R. P. N., Tjahjono, A., Hariadi, M. & Purnomo, M. H. Development of IoT for automated water quality monitoring system. In 2019 International Conference on Computer Science, Information Technology, and Electrical Engineering (ICOMITEE), 211–216 (IEEE, 2019).

Jiang, J. et al. A comprehensive review on the design and optimization of surface water quality monitoring networks. Environ. Modell. Softw. 132, 104792 (2020).

Jan, F., Min-Allah, N. & Düştegör, D. IoT based smart water quality monitoring: Recent techniques, trends and challenges for domestic applications. Water 13, 1729 (2021).

Wang, X. & Yang, W. Water quality monitoring and evaluation using remote sensing techniques in China: A systematic review. Ecosyst. Health Sustain. 5, 47–56 (2019).

Thiemann, S. & Kaufmann, H. Lake water quality monitoring using hyperspectral airborne data—a semiempirical multisensor and multitemporal approach for the Mecklenburg Lake District, Germany. Remote Sens. Environ. 81, 228–237 (2002).

Rand, J. M. et al. The human factor: Weather bias in manual lake water quality monitoring. Limnol. Oceanogr. Methods 20, 288–303 (2022).

Devane, M. L., Moriarty, E., Weaver, L., Cookson, A. & Gilpin, B. Fecal indicator bacteria from environmental sources; strategies for identification to improve water quality monitoring. Water Res. 185, 116204 (2020).

Vasistha, P. & Ganguly, R. Water quality assessment of natural lakes and its importance: An overview. Mater. Today Proc. 32, 544–552 (2020).

Sun, X. et al. Monitoring water quality using proximal remote sensing technology. Sci. Total Environ. 803, 149805 (2022).

Alam, A. U., Clyne, D., Jin, H., Hu, N.-X. & Deen, M. J. Fully integrated, simple, and low-cost electrochemical sensor array for in situ water quality monitoring. ACS Sens. 5, 412–422 (2020).

Axler, R. et al. Water quality issues associated with aquaculture: A case study in mine pit lakes. Water Environ. Res. 68, 995–1011 (1996).

Wu, Y., Dai, R., Xu, Y., Han, J. & Li, P. Statistical assessment of water quality issues in Hongze lake, China, related to the operation of a water diversion project. Sustainability 10, 1885 (2018).

Huang, G. H. & Xia, J. Barriers to sustainable water-quality management. J. Environ. Manag. 61, 1–23 (2001).

Sagan, V. et al. Monitoring inland water quality using remote sensing: Potential and limitations of spectral indices, bio-optical simulations, machine learning, and cloud computing. Earth-Sci. Rev. 205, 103187 (2020).

Madeo, D., Pozzebon, A., Mocenni, C. & Bertoni, D. A low-cost unmanned surface vehicle for pervasive water quality monitoring. IEEE Trans. Instrum. Meas. 69, 1433–1444 (2020).

Yigit Avdan, Z., Kaplan, G., Goncu, S. & Avdan, U. Monitoring the water quality of small water bodies using high-resolution remote sensing data. ISPRS Int. J. Geo-Inf. 8, 553 (2019).

Simitha, K. & Raj, S. IoT and WSN based water quality monitoring system. In 2019 3rd International Conference on Electronics, Communication and Aerospace Technology (ICECA), 205–210 (IEEE, 2019).

Olatinwo, S. O. & Joubert, T.-H. Enabling communication networks for water quality monitoring applications: A survey. IEEE Access 7, 100332–100362 (2019).

Imran, L. B., Latif, R. M. A., Farhan, M. & Aldabbas, H. Smart city based autonomous water quality monitoring system using WSN. Wirel. Person. Commun. 115, 1805–1820 (2020).

Chen, C. et al. Assimilating multi-source data into a three-dimensional hydro-ecological dynamics model using ensemble Kalman filter. Environ. Modell. Softw. 117, 188–199 (2019).

Javaheri, A. Assimilation of Multi-Sensor Data into Numerical Hydrodynamic Models of Inland Water Bodies. Ph.D. thesis, UCL (University College London) (2016).

Shao, S., Wang, Y., Liu, G. & Song, K. A systematic review of the application of the geostationary ocean color imager to the water quality monitoring of inland and coastal waters. Remote Sens. 16, 1623 (2024).

Mahlasi, C. Remote sensing of water quality in inland water bodies (University of Johannesburg (South Africa), 2017).

Singh, A. Dynamics of water mass variations in lake/reservoir dominated regions from multi-sensor Earth observation data and hydrological model outputs. Ph.D. thesis, Technische Universität München (2017).

Liu, W. Robust Multi-sensor Data Fusion for Practical Unmanned Surface Vehicles (USVs) Navigation. Ph.D. thesis, UCL (University College London) (2020).

Demetillo, A. T., Japitana, M. V. & Taboada, E. B. A system for monitoring water quality in a large aquatic area using wireless sensor network technology. Sustain. Environ. Res. 29, 1–9 (2019).

Shareef, Z. & Reddy, S. Design and wireless sensor network analysis of water quality monitoring system for aquaculture. In 2019 3rd International Conference on Computing Methodologies and Communication (ICCMC), 405–408 (IEEE, 2019).

Shaukat, N., Ali, A., Javed Iqbal, M., Moinuddin, M. & Otero, P. Multi-sensor fusion for underwater vehicle localization by augmentation of RBF neural network and error-state Kalman filter. Sensors 21, 1149 (2021).

Zhouhu, W., Jian, Z., Jie, Z., Jie, R. & Shan, C. A monitoring project planning technique of the water quality spatial distribution in Nansi lake. Proc. Environ. Sci. 10, 2320–2328 (2011).

Zhang, Z., Fu, K., Sun, X. & Ren, W. Multiple target tracking based on multiple hypotheses tracking and modified ensemble Kalman filter in multi-sensor fusion. Sensors 19, 3118 (2019).

Sun, M., Davies, M. E., Hopgood, J. R. & Proudler, I. Adaptive kernel Kalman filter multi-sensor fusion. In 2021 IEEE 24th International Conference on Information Fusion (FUSION), 1–8 (IEEE, 2021).

Kordestani, M., Dehghani, M., Moshiri, B. & Saif, M. A new fusion estimation method for multi-rate multi-sensor systems with missing measurements. IEEE Access 8, 47522–47532 (2020).

Duan, X. et al. Multi-sensor fusion detection method for vehicle target based on Kalman filter and data association filter. In Advances in Artificial Intelligence and Security: 7th International Conference, ICAIS 2021, Dublin, Ireland, July 19-23, 2021, Proceedings, Part II 7, 441–448 (Springer, 2021).

Priyanka, E. & Thangavel, S. Multi-type feature extraction and classification of leakage in oil pipeline network using digital twin technology. J. Ambient Intell. Human. Comput. 13, 5885–5901 (2022).

Priyanka, E., Thangavel, S. & Prabhakaran, P. Rank-based risk target data analysis using digital twin on oil pipeline network based on manifold learning. Proc. Inst. Mech. Eng. Part E J. Process Mech. Eng. 236, 1637–1651 (2022).

Osman, M. et al. Multisensor fusion localization using extended H\(\infty\) filter using pre-filtered sensors measurements. In 2019 IEEE Intelligent Vehicles Symposium (IV), 1139–1144 (IEEE, 2019).

Hu, F. & Wu, G. Distributed error correction of EKF algorithm in multi-sensor fusion localization model. IEEE Access 8, 93211–93218 (2020).

Hao, G. & Sun, S. Distributed fusion cubature Kalman filters for nonlinear systems. Int. J. Robust Nonlinear Control 29, 5979–5991 (2019).

Liu, Q. Intelligent water quality monitoring system based on multi-sensor data fusion technology. Int. J. Ambient Comput. Intell. (IJACI) 12, 43–63 (2021).

Murugesan, E., Shanmugamoorthy, S., Veerasamy, S. & Velusamy, S. Groundwater hydrochemistry and its appropriateness for consumption and irrigation: Geographic and temporal variation: Integrated approach. Urban Climate 49, 101482 (2023).

The APC is funded by Amrita Vishwa Vidyapeetham, Bangalore.

This research was funded by Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Bangalore, India.

These authors contributed equally: E. B. Priyanka and R. Mohanasundaram.

Department of Mechatronics Engineering, Kongu Engineering College, Perundurai, Tamilnadu, 638060, India

E. B. Priyanka & S. Thangavel

School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India

R. Mohanasundaram

Department of Electrical and Electronics Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Bengaluru, India

R. Anand

Conceptualization, E.B.P., S.T., M.R., A.R.; methodology, E.B.P., S.T., M.R., A.R.; Software, E.B.P., S.T., M.R., A.R.; validation, E.B.P., A.R.; formal analysis, E.B.P., S.T., M.R., A.R.; investigation, E.B.P., S.T.; Resources, E.B.P., S.T., M.R., A.R.; data curation, E.B.P., S.T., M.R., A.R.; writing—original draft, E.B.P., S.T., M.R., A.R.; writing—review & editing, E.B.P., S.T., M.R., A.R.; visualization, E.B.P., S.T., M.R., A.R.; supervision, S.T., A.R.; funding acquisition, A.R.

Correspondence to R. Anand.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Priyanka, E.B., Thangavel, S., Mohanasundaram, R. et al. Solar powered integrated multi sensors to monitor inland lake water quality using statistical data fusion technique with Kalman filter. Sci Rep 14, 25202 (2024). https://doi.org/10.1038/s41598-024-76068-8

Received: 27 July 2024

Accepted: 10 October 2024

Published: 24 October 2024

DOI: https://doi.org/10.1038/s41598-024-76068-8
