Risk Scenarios & Failure Modes

Table of Contents

  1. Overview
  2. Scenario 1: Low Solar Conditions
  3. Scenario 2: WiFi Retry Overhead
  4. Scenario 3: Battery Aging
  5. Scenario 4: Extreme Heat
  6. Scenario 5: MT3608 Instability
  7. Scenario 6: Complete Solar Failure
  8. Combined Worst Case
  9. Mitigation Strategies
  10. Risk Matrix

Overview

No design survives contact with real-world conditions. The power model assumes ideal solar harvest, reliable WiFi, and stable component efficiency. But farms are dusty, storms can cut solar harvest to near zero for days, and Rajasthan heat is merciless. This section catalogs the failure modes that arise when reality deviates from those assumptions, quantifies the impact on autonomy, and prescribes mitigations.

Every scenario is modelled, not measured. Field validation is required before production deployment. The scenarios range from common (dust on solar panel, poor WiFi signal) to rare (battery thermal runaway, MT3608 voltage instability). For each, we show the impact on the energy budget, the symptoms you would observe, and the firmware or hardware fixes.


Scenario 1: Low Solar Conditions

Cause

Dust, pollen, or salt spray accumulates on the solar panel. Panel orientation is slightly non-optimal. Monsoon season clouds persist for days. Panel is aging (degradation is ~0.5%/year but can spike during heat stress).

Modelled Impact

Base assumption: 350 mAh/day solar harvest (clear dry season)

Dust buildup (10-14 days dust accumulation):
  → 70-80% of peak harvest remains
  → Effective harvest: 245–280 mAh/day
  → Energy margin: (245–280 mAh / 77.6 mAh) = 3.2–3.6× ✓ Still sustainable

Moderate clouds / partial monsoon:
  → 50% of peak harvest
  → Effective harvest: 175 mAh/day
  → Energy margin: (175 / 77.6) = 2.3× ✓ Still sustainable

Heavy clouds / thick monsoon:
  → 20% of peak harvest
  → Effective harvest: 70 mAh/day
  → Energy margin: (70 / 77.6) = 0.9× ⚠️ NEGATIVE MARGIN (battery drains)

Heavy dust + cloud combined:
  → 10% of peak harvest
  → Effective harvest: 35 mAh/day
  → Daily balance: 35 - 77.6 = -42.6 mAh/day (deficit)
  → Autonomy: 2720 mAh / 42.6 mAh/day = 64 days until critical ⚠️ Still survives
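
The arithmetic in the block above can be reproduced with a few lines of Python. This is a minimal sketch using only the model's own figures (350 mAh/day peak harvest, 77.6 mAh/day consumption, 2720 mAh usable capacity); the function name `solar_budget` and the scenario labels are illustrative, not part of any firmware.

```python
DAILY_CONSUMPTION_MAH = 77.6   # Profile B daily consumption from the model
PEAK_HARVEST_MAH = 350.0       # clear dry-season harvest assumed by the model
USABLE_CAPACITY_MAH = 2720.0   # 80% of the 3400 mAh cell


def solar_budget(harvest_fraction):
    """Return (harvest, margin, days_to_empty) for a fraction of peak harvest.

    days_to_empty is None when harvest covers consumption.
    """
    harvest = PEAK_HARVEST_MAH * harvest_fraction
    margin = harvest / DAILY_CONSUMPTION_MAH
    net = harvest - DAILY_CONSUMPTION_MAH
    days_to_empty = None if net >= 0 else USABLE_CAPACITY_MAH / -net
    return harvest, margin, days_to_empty


SCENARIOS = [("dust", 0.70), ("moderate cloud", 0.50),
             ("heavy cloud", 0.20), ("dust + cloud", 0.10)]

for label, fraction in SCENARIOS:
    harvest, margin, days = solar_budget(fraction)
    outlook = "sustainable" if days is None else f"~{days:.0f} days to empty"
    print(f"{label}: {harvest:.0f} mAh/day, margin {margin:.1f}x, {outlook}")
```

Replacing the fractions with values measured during commissioning gives a site-specific version of the same table.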

Observed Symptoms

Early warning (days 1–3):

  - Battery voltage drifts downward by 0.05–0.1V per day instead of staying flat
  - Solar input voltage (at charger input) drops below 5.0V (was ~6V on clear days)

Mid-term (days 5–10):

  - Battery voltage at end-of-day is lower than start-of-day
  - Status LED is dimmer (lower supply voltage to LED circuit)
  - WiFi TX sometimes requires retries (lower supply voltage causes marginal transients)

Critical (days 14+):

  - Battery voltage drops below 3.6V (down from nominal 4.0V)
  - WiFi TX fails frequently (BOOST voltage droops below 4.5V, supervisor pulls reset)
  - System reboots repeatedly every 1–2 hours

Mitigation

  1. Monthly panel cleaning: Wipe dust/pollen from panel surface with soft cloth or soft brush. In the model above, 10–14 days of dry-season dust is enough to cut harvest by 20–30%.

  2. Seasonal profile switch: If entering monsoon season (June–September), switch from Profile B to Profile C (30-min TX interval, lower consumption). Firmware parameter can be changed remotely via WiFi config message.

  3. Enclosure ventilation: Reduce internal heat (see Scenario 4) to improve efficiency. A cooler enclosure allows battery and MT3608 to operate at higher efficiency.

  4. Panel angle adjustment: In winter (Jan–Mar), increase tilt angle by 5–10° to catch lower-angle sun. In summer (Apr–May), reduce tilt to avoid excessive heating of panel.

  5. Firmware alert: Log solar harvest voltage hourly. If daily average drops below 5V for 3 consecutive days, send alert to farm server. Operator can then manually check panel.
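
The alert rule in item 5 (daily average below 5V for 3 consecutive days) is simple enough to sketch. The version below is written in Python for readability and could equally run on the farm server against logged data; the function name and argument layout are assumptions, and only the 5V threshold and 3-day run come from the text.

```python
def low_solar_alert(daily_avg_volts, threshold_v=5.0, run_days=3):
    """Return True if the most recent `run_days` daily averages are all below threshold.

    daily_avg_volts: list of daily average charger-input voltages, oldest first.
    """
    if len(daily_avg_volts) < run_days:
        return False  # not enough history to decide
    return all(v < threshold_v for v in daily_avg_volts[-run_days:])


history = [6.0, 4.8, 4.9, 4.7]          # hypothetical logged averages
if low_solar_alert(history):
    print("ALERT: low solar harvest, check panel")
```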

Recommendation: Commission each node with a solar irradiance sensor (pyranometer) co-located with the panel for the first week. This gives you a calibrated baseline for "350 mAh/day" and reveals if your site is systematically worse than the model assumes. If measured harvest is <250 mAh/day on clear days, switch to Profile C immediately.


Scenario 2: WiFi Retry Overhead

Cause

Poor WiFi signal strength (RSSI < -75 dBm), interference from other 2.4 GHz devices, or distance from the farm server's access point. WiFi association takes 3–5× longer; TCP packets are lost and retried.

Modelled Impact

Base case (Profile B, good signal):
  WiFi TX energy: ~23 mAh/day (96 TX per day, ~0.24 mAh per successful TX)

Poor signal case:
  WiFi association extends from ~500ms to ~2500ms (5× slower)
  Estimated retry overhead: 2–3 additional association attempts per TX

  New estimate: 96 TX × (0.24 mAh baseline + 0.15 mAh retry overhead) = 37 mAh/day

  New daily consumption: 18.2 (sleep) + 36 (reads) + 37 (WiFi) = 91.2 mAh/day
  New autonomy: 2720 / 91.2 = 30 days (vs. 35 days modelled baseline)
  New energy margin: (350 / 91.2) = 3.8× (was 4.5×)
  Impact: -15% energy margin (consumption up ~18%) ⚠️ Noticeable but recoverable
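
The retry estimate can be recomputed as a sketch, which makes it easy to try other per-TX overheads. All default values are the model's figures from the block above; the function name and dictionary keys are illustrative. The code keeps the unrounded WiFi figure (37.44 mAh/day), so its totals differ slightly from the rounded ones in the text.

```python
def wifi_retry_budget(tx_per_day=96, base_mah=0.24, retry_mah=0.15,
                      sleep_mah=18.2, reads_mah=36.0,
                      usable_mah=2720.0, harvest_mah=350.0):
    """Recompute daily consumption, autonomy and margin with retry overhead."""
    wifi_mah = tx_per_day * (base_mah + retry_mah)
    total_mah = sleep_mah + reads_mah + wifi_mah
    return {
        "wifi_mah": wifi_mah,
        "total_mah": total_mah,
        "autonomy_days": usable_mah / total_mah,   # no-solar autonomy
        "margin": harvest_mah / total_mah,         # harvest / consumption
    }


result = wifi_retry_budget()
print({name: round(value, 1) for name, value in result.items()})
```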

Observed Symptoms

Early warning (session 1):

  - WiFi association takes >2 seconds (visible in serial log)
  - RSSI (signal strength) is < -70 dBm, maybe -75 or worse

Mid-term (recurring sessions):

  - Some TX events fail after 3 retries, data is buffered instead of sent
  - Buffer fills up (more than 3 readings queued, should transmit every 15 min)
  - Status LED blinks erratically during WiFi retry sequences

Critical (persistent):

  - TX failure rate >50% (lose half the data)
  - Battery drains faster than model (24-hour test shows >15% higher consumption than expected)

Mitigation

  1. Site survey: Use a WiFi scanner app (WiFi Analyzer on Android) to check signal strength at the planned node location. Target RSSI > -70 dBm.

  2. Antenna repositioning: Move the node antenna higher (tip of the mast), or relocate the farm server's access point closer to the field.

  3. External antenna: If the node is indoors or shielded, use an external SMA antenna instead of the PCB trace antenna (estimated +6 dBi gain possible).

  4. Firmware retry tuning: Reduce retry limit from 3 to 2, or increase backoff time (trade: higher chance of data loss, but lower energy on poor-signal days).

  5. Automatic profile switching: If WiFi success rate drops below 90%, firmware automatically switches from Profile B to Profile C, reducing TX frequency to 30 minutes. This buys time for manual intervention.

Recommendation: In commissioning, log WiFi statistics (RSSI, association time, retry count) for 24 hours. Compute average RSSI. If <-70 dBm, do not proceed to field deployment until antenna is improved. Poor signal is a silent killer of battery life.
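
A minimal sketch of the two decision rules in this scenario: the commissioning gate (average RSSI must not be below -70 dBm) and the automatic fallback to Profile C when the WiFi success rate drops below 90%. The function names and the shape of the inputs are assumptions; the thresholds are the ones stated above.

```python
from statistics import mean


def commissioning_verdict(rssi_dbm_samples, min_avg_rssi_dbm=-70.0):
    """Return (average RSSI, ok_to_deploy) for a 24-hour RSSI log."""
    avg = mean(rssi_dbm_samples)
    return avg, avg >= min_avg_rssi_dbm


def select_profile(tx_successes, tx_attempts, min_success_rate=0.90):
    """Return 'B' while WiFi is healthy, 'C' once the success rate drops too low."""
    if tx_attempts == 0:
        return "B"  # no evidence yet, stay on the default profile
    return "B" if tx_successes / tx_attempts >= min_success_rate else "C"


avg, ok = commissioning_verdict([-65, -68, -72])   # hypothetical samples
print(f"average RSSI {avg:.1f} dBm, deploy: {ok}")
print("profile:", select_profile(tx_successes=80, tx_attempts=96))
```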


Scenario 3: Battery Aging

Cause

Li-ion batteries degrade over time. After 1–2 years of daily charge-discharge cycles in a hot environment (Rajasthan summer), the NCR18650B can lose 10–20% of its capacity. Internal resistance increases, reducing the usable capacity further.

Modelled Impact

Fresh battery (day 0):
  Nominal capacity: 3400 mAh
  Usable capacity (80% derate): 2720 mAh
  Autonomy (Profile B, no solar): 2720 / 77.6 = 35 days

After 1 year (Rajasthan conditions, ~50°C average):
  Capacity loss: ~15% (summer heat accelerates degradation)
  New nominal capacity: 3400 × 0.85 = 2890 mAh
  New usable (80% of degraded): 2890 × 0.80 = 2312 mAh
  New autonomy: 2312 / 77.6 = 30 days
  Impact: -5 days autonomy ⚠️ Acceptable

After 2 years (continued heat stress):
  Capacity loss: ~25–30%
  New nominal capacity: 3400 × 0.72 = 2448 mAh
  New usable: 2448 × 0.80 = 1958 mAh
  New autonomy: 1958 / 77.6 = 25 days
  Energy margin (350 mAh solar): (350 / 77.6) = 4.5×, unchanged because aging reduces capacity, not harvest or consumption
  Impact: -10 days autonomy vs. fresh battery (-5 days vs. year 1), but system still sustainable ✓
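
The aging figures follow one formula: usable capacity = nominal × retained fraction × derate, and autonomy = usable / daily consumption. A short sketch, with the retained fractions (1.00, 0.85, 0.72) taken from the block above and the function name chosen for illustration:

```python
def aged_autonomy(capacity_retained, nominal_mah=3400.0,
                  derate=0.80, daily_mah=77.6):
    """Return (usable mAh, no-solar autonomy in days) for an aged cell."""
    usable = nominal_mah * capacity_retained * derate
    return usable, usable / daily_mah


for label, retained in [("fresh", 1.00), ("1 year", 0.85), ("2 years", 0.72)]:
    usable, days = aged_autonomy(retained)
    print(f"{label}: {usable:.0f} mAh usable, {days:.0f} days autonomy")
```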

Observed Symptoms

Year 0–1:

  - Battery voltage at end of sunny day is 4.05V (was 4.1–4.15V)
  - No visible change in operation

Year 1–2:

  - Battery voltage drops to 3.95V end-of-day (aging sign)
  - On cloudy days, battery dips to 3.5V (was 3.6–3.7V)
  - WiFi TX occasionally fails due to lower supply voltage

Year 2+:

  - Battery voltage barely reaches 4.0V (clear sign of internal resistance increase)
  - Voltage droop during WiFi TX is larger (e.g., 0.3V drop, visible on oscilloscope)
  - On cloudy stretches (>3 consecutive cloudy days), system can't sustain itself

Mitigation

  1. Battery swap schedule: Design for 18–24 month battery lifespan in Rajasthan heat. Plan to replace batteries every 18 months during routine maintenance.

  2. Voltage trending: Log battery voltage daily. Plot over time. When voltage trend goes negative for >1 week (excluding seasonal variation), it signals aging. Flag for replacement.

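  3. Fuel gauging: Advanced mitigation. Add a fuel gauge IC to track state of charge and predict remaining life. Note that the MAX17043 estimates state of charge from cell voltage rather than by counting coulombs; true coulomb counting needs a gauge with a current-sense resistor. Too complex for Rev A; consider for future.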

  4. Temperature management: Keep enclosure internal temp < 45°C (use white coating, ventilation, shade). As a rule of thumb, every 10°C reduction in cell temperature roughly doubles battery lifespan.

  5. Derate usable capacity over time: In firmware, reduce the derate factor from 80% to 70% after 12 months. This accounts for degradation. Recalculate autonomy and adjust TX interval if needed.
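
The voltage trending in item 2 can be sketched as a least-squares slope over the daily log. This is one possible approach, not a prescribed one: the 8-day minimum window reflects the ">1 week" wording, while the -0.001 V/day noise floor is an assumed value that would need tuning against real logs, and seasonal correction is left out.

```python
def voltage_slope_per_day(daily_volts):
    """Least-squares slope (V/day) of one end-of-day reading per day, oldest first."""
    n = len(daily_volts)
    if n < 2:
        raise ValueError("need at least two readings")
    mean_x = (n - 1) / 2
    mean_y = sum(daily_volts) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(daily_volts))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def aging_suspected(daily_volts, min_days=8, max_slope_v_per_day=-0.001):
    """Flag a sustained downward trend; thresholds are assumptions."""
    if len(daily_volts) < min_days:
        return False
    return voltage_slope_per_day(daily_volts) < max_slope_v_per_day


log = [4.10 - 0.01 * day for day in range(8)]   # hypothetical declining log
print(aging_suspected(log))
```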

Recommendation: Start a battery replacement log from day 1. Record battery serial number, installation date, and rated capacity at install (from the datasheet, or a measured value if a discharge test is run). Forecast 18-month replacement cost into the project budget.


Scenario 4: Extreme Heat

Cause

Rajasthan summer temperatures exceed 50°C ambient. Direct sun can heat an unshaded enclosure to 60–65°C internal. At 60°C, Li-ion battery efficiency drops, MT3608 efficiency decreases, and quiescent draws increase.

Modelled Impact

Baseline (25°C ambient, 35°C enclosure):
  MT3608 efficiency: 85%
  LDO quiescent: 1 mA
  MT3608 quiescent: 0.6 mA
  Daily consumption: 77.6 mAh

High heat (50°C ambient, 60°C enclosure):
  MT3608 efficiency: 82% (about -0.12%/°C over the 25°C rise)
  LDO quiescent: 1.3 mA (+30% from higher leakage current at temperature)
  MT3608 quiescent: 0.65 mA

  Sleep current: 0.762 × 1.05 = 0.8 mA (5% increase)
  WiFi TX penalty: (5V × 160mA) / (3.7V × 0.82 eff) = 264 mA (vs 254 mA baseline)

  Daily consumption estimate: 18.2 × 1.05 + 36 + 25 (WiFi) = 84 mAh/day

  Impact: +7 mAh/day (+9% increase) ⚠️ Noticeable
  New margin: (350 / 84) = 4.2× (was 4.5×)
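
The WiFi TX penalty line uses the usual boost-converter power balance: battery current = (output voltage × output current) / (battery voltage × efficiency). A small sketch of that relation, with the function name and defaults chosen for illustration from the model's figures (5V, 160 mA, 3.7V, 85%):

```python
def battery_current_ma(v_out=5.0, i_out_ma=160.0, v_bat=3.7, efficiency=0.85):
    """Battery-side current (mA) needed to supply a boost-converter load."""
    return (v_out * i_out_ma) / (v_bat * efficiency)


for eff in (0.85, 0.82):
    print(f"efficiency {eff:.0%}: {battery_current_ma(efficiency=eff):.0f} mA")
```

The same function shows how sensitive the TX current is to battery voltage: a lower `v_bat` raises the battery-side current for the same load.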

But there is a deeper issue: thermal runaway.

If the enclosure internal temperature exceeds 60°C, component efficiency degrades further:

  - LDO quiescent current increases exponentially (leakage doubles every 10°C)
  - MT3608 switching frequency drifts, efficiency becomes unpredictable
  - Battery self-discharge accelerates (lose ~1% capacity per day at 60°C)

In extreme cases (70°C+ internal), the system can enter a positive feedback loop: higher temperature → lower efficiency → more heat generated → system becomes unstable.

Observed Symptoms

Warm day (40°C ambient, 50°C enclosure):

  - Battery voltage at end of day is 0.05–0.1V lower than on cool day
  - No functional impact

Hot day (50°C ambient, 60°C enclosure):

  - Battery voltage at end of day is 0.2–0.3V lower
  - WiFi TX sometimes requires retries (supply voltage marginally drops)
  - Enclosure is hot to the touch

Extreme heat (>55°C ambient, >70°C enclosure):

  - Battery voltage drops significantly (e.g., 3.8V by evening, was 4.0V)
  - MT3608 shuts down to protect itself (quiescent oscillator stops, voltage collapses)
  - System reboots repeatedly, unable to boot if any load is present

Mitigation

  1. White enclosure coating: Apply white or reflective coating to enclosure exterior. Reduces solar absorption by ~50%. Target: keep internal temp <55°C even in 50°C sun.

  2. Passive ventilation: Drill small vent holes (Ø2–3 mm) in the enclosure sides, with a desiccant plug (silica gel) to prevent water ingress while allowing air circulation. Reduces internal temp by ~5–10°C.

  3. Thermal insulation for battery: Wrap the battery cell in thin foam or cork to decouple it from the hot enclosure walls. Protects battery from direct radiant heat.

  4. Firmware temperature monitoring: (Optional) Add DHT22 or BME280 internal temperature sensor (already in spec). Log enclosure temperature daily. If >55°C consistently, alert via WiFi and recommend shade/ventilation.

  5. Adaptive power management: If enclosure temp exceeds 55°C, firmware automatically switches from Profile B to Profile C (lower consumption, less heat generation). Trade-off: less frequent data but system survives.

  6. Component derating: During PCB layout, use wider traces on 5V and battery rails to reduce copper resistance and lower voltage drop (lower heat in conductors).

Recommendation: In the first summer deployment, place a temperature data logger inside the enclosure. Record temp every 15 minutes for 4 weeks during peak summer. This gives you a real baseline for thermal design. If max is >65°C, immediately apply white coating.


Scenario 5: MT3608 Instability

Cause

The MT3608 boost converter can oscillate or become unstable under certain load conditions. If the output capacitance is too low, or if the load draw has fast transients (WiFi TX sudden current spike), the converter may struggle to regulate voltage.

Modelled Impact

Stable case (10µF output cap, soft WiFi TX ramp):
  Output voltage: 5.0V ±0.05V (excellent regulation)
  Supply to ESP32: stable, no brown-outs

Marginal stability (small cap, hard WiFi TX transient):
  Output voltage: 5.0V → 4.7V → 5.1V (oscillating)
  Ringing frequency: ~1 MHz (far above the audible range; a whine heard on some boards points to lower-frequency subharmonic oscillation)
  ESP32 sees supply sag to 4.7V, MCP100 supervisor still has margin
  No functional failure, but stresses components

Unstable case (no output cap, or component failure):
  Output voltage: 5.0V → 3.5V (complete sag under load)
  Supply to ESP32: brown-out
  MCP100 pulls reset
  System boots, WiFi TX starts, sag happens again
  System stuck in boot-loop, cannot transmit

Observed Symptoms

Early (marginal):

  - WiFi TX occasionally fails (low confidence)
  - Serial log shows erratic behavior (if serial is available)
  - Oscilloscope shows ringing on BOOST_5V node

Confirmed (unstable):

  - WiFi TX fails consistently
  - System reboots every 2–3 seconds
  - Battery voltage is normal, but system cannot operate

Root Cause Diagnosis

On a breadboard, connect an oscilloscope to the BOOST_5V node. Trigger on WiFi TX (GPIO for WiFi radio enable). Observe voltage step response:

  - Good: Clean step from 5.0V down to ~4.9V, settling in <1 µs, no ringing.
  - Marginal: Step to 4.7V with small ringing, settles in ~10 µs.
  - Bad: Severe ringing (peaks and valleys), multiple transients, takes >100 µs to settle.

Mitigation

  1. Breadboard validation (BLOCKING): Before PCB, build MT3608 on breadboard with:

     - 10µF input capacitor (ceramic)
     - 10µF output capacitor (ceramic, close to MT3608 pin)
     - Load: 160 mA transient from a FET switch (simulating WiFi TX)
     - Measure response time and peak voltage sag
     - Acceptance: peak sag < 0.2V, settling time < 10 µs

  2. Output capacitor selection: Use a high-quality ceramic capacitor (X7R or X5R dielectric, not Y5V) rated for 6.3V minimum. Mount directly at MT3608 output pins.

  3. Trace layout: Keep traces from MT3608 output to ESP32 VIN short (<2 cm). Minimize loop area (path for current return). Use wide traces (0.6 mm minimum).

  4. Feedback network: Verify MT3608 feedback divider (trimpot for 5V setting) is correctly tuned. Trimpot should be physically marked after calibration to prevent accidental adjustment.

  5. Firmware timing: Add a soft-start delay in firmware. Don't immediately enable WiFi radio on boot; wait 100 ms for boost converter to fully stabilize first.
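
If the oscilloscope capture from the breadboard test is exported as time/voltage samples, the acceptance criteria (peak sag < 0.2V, settling time < 10 µs) can be checked with a short script. This is a sketch under stated assumptions: the first sample is taken as the instant of the load step, "settled" means within ±0.05V of the 5.0V nominal (borrowed from the stable-case regulation figure), and the function names are illustrative.

```python
def step_response_metrics(times_us, volts, nominal_v=5.0, band_v=0.05):
    """Return (peak_sag_v, settling_time_us) from a load-step capture.

    Settling time is the time of the last sample outside nominal ± band,
    measured from the first sample (assumed to be the step instant).
    """
    peak_sag = nominal_v - min(volts)
    settling = 0.0
    for t, v in zip(times_us, volts):
        if abs(v - nominal_v) > band_v:
            settling = t - times_us[0]
    return peak_sag, settling


def passes_acceptance(peak_sag_v, settling_us,
                      max_sag_v=0.2, max_settling_us=10.0):
    """Apply the acceptance limits from the breadboard validation step."""
    return peak_sag_v < max_sag_v and settling_us < max_settling_us


# Hypothetical capture: 1 µs sample spacing
t = [0, 1, 2, 3, 4, 5]
v = [5.00, 4.85, 4.92, 5.03, 5.00, 5.00]
sag, settle = step_response_metrics(t, v)
print(f"sag {sag:.2f} V, settled after {settle} µs, pass: {passes_acceptance(sag, settle)}")
```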

Recommendation: MT3608 stability validation is on the critical path for PCB fabrication. Do not proceed to PCB until breadboard test passes. This is a credible failure mode that could render the system non-functional in the field.


Scenario 6: Complete Solar Failure

Cause

The solar panel is completely blocked (hail damage, physical obstruction, panel completely dusty and not cleaned), or the charger fails. System reverts to pure battery drain.

Modelled Impact

Profile B, no solar input, no charger recharge:
  Battery capacity: 2720 mAh usable
  Daily consumption: 77.6 mAh
  Autonomy: 2720 / 77.6 = 35 days

This is a design feature, not a failure. The system is designed to survive 31+ days
on pure battery. Solar is a bonus, not a requirement.

However, if solar failure is NOT detected by firmware, the operator might assume
everything is fine, and the system dies unexpectedly after 31 days.

Observed Symptoms

Day 1–7:

  - Battery voltage drifts down by ~0.1V per day (subtle)
  - Panel voltage (at charger input) is 0V (no solar input)

Day 14:

  - Battery voltage is 3.8V (down from nominal 4.0V)
  - Status LED brightness is noticeably lower

Day 28–31:

  - Battery voltage drops to 3.2V (approaching cutoff)
  - WiFi TX may fail (low supply voltage)
  - System enters critical state

Mitigation

  1. Solar fault detection: Firmware monitors charger input voltage. If <2.0V for >4 hours during expected daylight (indicating no solar input or a disconnected panel), trigger an alert via WiFi: "Solar panel disconnected or failed." Restricting the check to daylight avoids a false alarm every night.

  2. Battery capacity warning: Firmware tracks days of operation. If no charger input detected and the system has been running for >20 days on battery alone, send warning: "Battery capacity low, expect failure in ~10 days."

  3. Voltage threshold alerts: If battery voltage drops below 3.6V (leaving only 10–12 days of autonomy), alert the operator.

  4. Manual visual inspection: Recommend monthly field visits. During visits, visually check:

     - Solar panel for damage, dust, or obstruction
     - Panel connector for corrosion or loose contacts
     - Enclosure for damage or water ingress

  5. Firmware shutdown procedure: If battery voltage reaches 3.1V, firmware gracefully powers down (shuts off WiFi, saves any buffered data to flash, and pulls reset pin LOW). System stays in shutdown until battery is manually replaced or solar recharges above 3.3V. Prevents corruption.
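
The alert thresholds in this list can be collected into one decision function. The sketch below is in Python for readability (real firmware would implement the same checks in its own language); the function name, argument names, and return shape are assumptions, while the thresholds (2.0V for >4 hours, >20 days on battery, 3.6V warning, 3.1V shutdown) come from the list above. How the low-input hours are counted is left to the caller, and the 3.3V recovery threshold is not modelled.

```python
def evaluate_power_state(battery_v, hours_charger_input_below_2v,
                         days_without_charge):
    """Return (alerts, shutdown) for the current readings."""
    alerts = []
    if hours_charger_input_below_2v > 4:
        alerts.append("Solar panel disconnected or failed.")
    if days_without_charge > 20:
        alerts.append("Battery capacity low, expect failure in ~10 days.")
    if battery_v < 3.6:
        alerts.append("Battery voltage below 3.6V.")
    shutdown = battery_v <= 3.1   # graceful power-down threshold
    return alerts, shutdown


alerts, shutdown = evaluate_power_state(3.5, 5, 21)   # hypothetical readings
for message in alerts:
    print("ALERT:", message)
print("shutdown:", shutdown)
```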

Recommendation: The 31-day autonomy is a safety net, not an operating mode. Design the operational cadence (monitoring visits, maintenance schedule) so that solar is checked at least every 2 weeks. Rely on battery autonomy only for extended cloudy spells, not for forgotten deployments.


Combined Worst Case

What if multiple stressors hit simultaneously?

Scenario: Monsoon season (clouds reduce solar to 100 mAh/day), battery is 18 months old
          (aged 15%, usable capacity down to 2312 mAh), enclosure is 60°C internal
          (efficiency down 9%), and WiFi signal is poor (retries add 20% overhead).

Model:
  Solar: 100 mAh/day (down from 350)
  Battery capacity: 2312 mAh (down from 2720)
  Daily consumption: 77.6 × 1.09 (heat) × 1.20 (WiFi retries) = 101.5 mAh/day

  Daily balance: 100 - 101.5 = -1.5 mAh/day (negative, battery drains slowly)
  No-solar autonomy: 2312 / 101.5 = 23 days ⚠️ Less than design autonomy of 31 days

In this combined worst case, the system is marginally unsustainable even with solar input: with 100 mAh/day of harvest the battery drains at ~1.5 mAh/day, and if harvest stops entirely the 31-day design autonomy compresses to 23 days. This is still acceptable (>3 weeks), but the margin is gone.
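
A short sketch of the combined model, separating the two quantities it produces: the net daily balance while solar is still delivering 100 mAh/day, and the autonomy if solar input is lost entirely. The stress factors and capacities are the model's own; the function name and return order are illustrative.

```python
def combined_case(base_mah=77.6, heat_factor=1.09, wifi_factor=1.20,
                  solar_mah=100.0, usable_mah=2312.0):
    """Return (consumption, net balance, no-solar days, days with solar)."""
    consumption = base_mah * heat_factor * wifi_factor
    net = solar_mah - consumption
    no_solar_days = usable_mah / consumption
    # With solar still present, only the net deficit drains the battery.
    days_with_solar = usable_mah / -net if net < 0 else None
    return consumption, net, no_solar_days, days_with_solar


consumption, net, no_solar_days, days_with_solar = combined_case()
print(f"consumption {consumption:.1f} mAh/day, net {net:.1f} mAh/day")
print(f"no-solar autonomy {no_solar_days:.0f} days")
```

Because the net deficit is small, the result is sensitive to each factor: a slightly larger heat or retry penalty, or a slightly lower harvest, changes the outlook considerably.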

Mitigation for Combined Scenario

  1. Firmware adaptive control: Detect multiple stressors and respond:

     - Low solar (<150 mAh/day) + poor WiFi + aged battery → auto-switch to Profile C
     - This reduces consumption to ~54 mAh/day, giving positive margin again

  2. Operator response: If alerts indicate combined stress (low solar, high heat, poor WiFi, old battery), perform recovery:

     - Clean solar panel immediately
     - Replace battery (if >18 months old)
     - Reposition antenna for better WiFi signal
     - Reduce enclosure temperature (white coating, ventilation)

  3. Graceful degradation: If system predicts failure within 10 days, reduce TX frequency to hourly (not 15-min). This saves ~75% of TX energy (24 TX/day instead of 96). Sacrifice data freshness to maximize autonomy.

Critical: The combined worst case is survivable due to design margin, but it requires operator awareness and timely intervention. Monitor battery voltage and solar harvest daily. A simple Python script on the farm server can flag when multiple stressors are detected and send an alert.
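
A minimal sketch of such a server-side check is shown below. The `NodeStatus` fields and function names are hypothetical, and how the values reach the server is out of scope. The thresholds reuse figures from this section (150 mAh/day solar, 55°C enclosure, 90% WiFi success, 18-month battery age); alerting on two or more simultaneous stressors is an assumption.

```python
from dataclasses import dataclass


@dataclass
class NodeStatus:
    solar_mah_per_day: float
    enclosure_temp_c: float
    wifi_success_rate: float     # 0.0 to 1.0
    battery_age_months: float


def active_stressors(status):
    """List the stressors currently affecting a node."""
    stressors = []
    if status.solar_mah_per_day < 150:
        stressors.append("low solar")
    if status.enclosure_temp_c > 55:
        stressors.append("high heat")
    if status.wifi_success_rate < 0.90:
        stressors.append("poor WiFi")
    if status.battery_age_months > 18:
        stressors.append("old battery")
    return stressors


def needs_alert(status, min_stressors=2):
    """Alert when several stressors coincide (threshold is an assumption)."""
    return len(active_stressors(status)) >= min_stressors


node = NodeStatus(100, 60, 0.80, 20)   # hypothetical readings
if needs_alert(node):
    print("ALERT: combined stress:", ", ".join(active_stressors(node)))
```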


Risk Matrix

Visual summary of all scenarios:

graph TD
    A["Low Solar<br/>Probability: MEDIUM<br/>Severity: MEDIUM<br/>Mitigations: Clean panel,<br/>switch to Profile C"]

    B["WiFi Retries<br/>Probability: MEDIUM<br/>Severity: LOW<br/>Mitigations: Site survey,<br/>antenna reposition"]

    C["Battery Aging<br/>Probability: HIGH<br/>Severity: LOW<br/>Mitigations: Replace<br/>every 18mo"]

    D["Extreme Heat<br/>Probability: HIGH<br/>Severity: MEDIUM<br/>Mitigations: White coat,<br/>ventilation"]

    E["MT3608 Instability<br/>Probability: LOW<br/>Severity: HIGH<br/>Mitigations: Breadboard<br/>validation<br/>blocking PCB"]

    F["Solar Complete Fail<br/>Probability: LOW<br/>Severity: LOW<br/>Mitigations: Alerts,<br/>manual inspection"]

    G["Combined Worst<br/>Probability: LOW<br/>Severity: MEDIUM<br/>Mitigations: Adaptive<br/>firmware, operator<br/>response plan"]

    style A fill:#FFF3E0,stroke:#E65100,stroke-width:2px
    style B fill:#FFF3E0,stroke:#E65100,stroke-width:2px
    style C fill:#FFF9C4,stroke:#F57F17,stroke-width:2px
    style D fill:#FFF3E0,stroke:#E65100,stroke-width:2px
    style E fill:#FFCDD2,stroke:#C62828,stroke-width:2px
    style F fill:#FFF9C4,stroke:#F57F17,stroke-width:2px
    style G fill:#FFF3E0,stroke:#E65100,stroke-width:2px

| Scenario | Probability | Severity | Impact on Autonomy | Mitigation Priority |
|---|---|---|---|---|
| Low solar (dust) | MEDIUM | MEDIUM | -5 to -10 days | HIGH (easy fix) |
| WiFi retries | MEDIUM | LOW | -1 day | HIGH (commissioning) |
| Battery aging | HIGH | LOW | -1 day/year | MEDIUM (planned) |
| Extreme heat | HIGH | MEDIUM | -5 days | HIGH (design) |
| MT3608 instability | LOW | HIGH | Fatal | CRITICAL (blocking) |
| Solar complete fail | LOW | LOW | N/A (design feature) | MEDIUM (alerting) |
| Combined worst case | LOW | MEDIUM | -8 to -10 days | HIGH (monitoring) |
