About the Project (Detailed Scientific Documentation)

This page documents the scientific idea and the MATLAB-to-web conversion, and explains the computational services (ANFIS/XAI) in detail, with references.

Documentation

0) Scientific Idea of the Project

ANFIS + XAI

The project builds a road-maintenance cost estimation system using ANFIS (Adaptive Neuro-Fuzzy Inference System) based on a Sugeno FIS, then adds an explainability (XAI) layer to clarify “why” the model produced a given cost—not only “what” the cost is.

Outputs: Cost (IQD) + Gauge class + Bar Chart + Text Explanation
Research goal: Document the conversion of a MATLAB model into a usable web system

ANFIS was introduced by Jang (1993) as a framework that combines neural networks with fuzzy inference (FIS) using a hybrid learning procedure to construct an input-output mapping from data and if-then rules. Sugeno FIS is well-suited for modeling because it uses functional (often linear) consequents, supporting interpolation in the input space.

Key references
  • Original ANFIS paper (Jang, 1993). PDF
  • MathWorks Sugeno FIS documentation. MathWorks

1) Detailed Explanation of HTS Service

HTS

In this project, HTS is not a universal standard name; it is an engineering proxy (severity index). The idea is to aggregate the number of extremely hot days (above 40°C) over a time window representing pavement age prior to the maintenance date. Higher long-term thermal exposure increases the likelihood of asphalt distress (e.g., rutting and deformation).

Inputs and logic
  • pavementAgeCode: pavement age code (1/2/3).
  • maintenanceDate: maintenance date (extract maintenance year).
  • HeatDay table: yearly stored days_above_40.
Pavement age mapping
Code 1: 6 years
Code 2: 12 years
Code 3: 18 years
Equation matching the code

We compute startYear and then sum days_above_40 from (startYear + 1) to maintenanceYear:

startYear = maintenanceYear - yearsSpan

HTS = Σ_{y = startYear + 1}^{maintenanceYear} days_above_40(y)
        

This matches the Laravel logic: whereBetween(year, [startYear+1, maintenanceYear]) then sum(days_above_40).
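The window logic above can be sketched in a few lines. This is a minimal illustration, not the project's HtsService: the dictionary stands in for the HeatDay table, the yearly counts are made up, and the name compute_hts is hypothetical.

```python
# Pavement age mapping taken from this page: code -> window length in years.
AGE_CODE_TO_YEARS = {1: 6, 2: 12, 3: 18}


def compute_hts(heat_days, pavement_age_code, maintenance_year):
    """Sum days_above_40 over (startYear + 1) .. maintenanceYear, inclusive."""
    years_span = AGE_CODE_TO_YEARS[pavement_age_code]
    start_year = maintenance_year - years_span
    # Years missing from the table contribute 0, like a SQL SUM over existing rows.
    return sum(heat_days.get(y, 0) for y in range(start_year + 1, maintenance_year + 1))


# Hypothetical stand-in for HeatDay: 80 hot days in every year (illustrative only).
heat_days = {year: 80 for year in range(2000, 2025)}
print(compute_hts(heat_days, 2, 2024))  # 12 years x 80 days = 960
```

The inclusive upper bound mirrors whereBetween, which includes both endpoints.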

Why 40°C? (Scientific justification)

Research reports that when ambient temperature exceeds 40°C, pavement surface temperature can reach ~65–75°C, which significantly increases the risk of rutting, deformation, and reduced load-bearing capacity. Counting “days above 40°C” is therefore a meaningful simplified proxy for long-term thermal exposure, especially for the climate of Karbala, Iraq.

  • Source mentioning 65–75°C surface range when ambient >40°C and rutting risk. MDPI (2025)
  • evalfis and FIS evaluation concept (HTS later becomes an ANFIS input). MathWorks evalfis

Research note: HTS here is a designed proxy rather than a universal standard name. The goal is to represent climate severity (heat/time) numerically for the model.

2) Detailed Scientific Explanation of AnfisService (Cost)

ANFIS

This service is the core of the project: it replicates the original MATLAB pipeline. In essence: (1) apply input transforms (Log + normalization), (2) evaluate a Sugeno FIS (evalfis concept) to obtain yNorm, (3) reverse normalization to restore the output scale, then apply inverse log to return the final cost in IQD.

2.1) Log Transform: log10(x + 1)

We use log10(x + 1) to compress large ranges, reduce skewness, and improve numerical stability before normalization and evaluation—common when inputs span wide numeric ranges.

x_log = log10(x + 1)
        

Implementation note: log10_safe protects against unexpected non-positive values to avoid runtime failures.
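A guarded transform of this kind might look like the sketch below. The page does not say how log10_safe treats bad values, so clamping negative inputs to 0 is an assumption made here for illustration.

```python
import math


def log10_safe(x):
    """log10(x + 1), with negative inputs clamped to 0 (an assumed guard)."""
    # Assumption: clamping keeps the argument of log10 at 1 or above,
    # so the result is always defined and never negative.
    return math.log10(max(x, 0.0) + 1.0)


print(round(log10_safe(12000), 3))  # 4.079
print(log10_safe(-5))               # 0.0, because -5 is clamped to 0
```

Other guards are possible, for example raising an error instead of clamping, which would surface bad data rather than hide it.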

2.2) Normalization: mapminmax (Apply) using PS_in

After log transform, we apply the same MATLAB normalization using PS_in. In MATLAB, mapminmax can return process settings (PS) which can be reused to transform new data identically—ensuring the PHP implementation matches MATLAB exactly.

  • PS_in.gain / xoffset / ymin / yrange are used to map each feature to the target range.
  • You stored these values in config(anfis.ps_in.*) for reproducibility.
Reference

MathWorks mapminmax documentation explains normalization and returning PS settings for reuse. MathWorks mapminmax
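Applying stored settings to new data is a per-feature linear map. The sketch below assumes gain and xoffset have their usual mapminmax meaning (gain = (ymax - ymin) / (xmax - xmin), xoffset = xmin); the numeric settings are hypothetical, not the project's PS_in.

```python
def mapminmax_apply(x, ps):
    """Apply stored settings element-wise: y = (x - xoffset) * gain + ymin."""
    return [(xi - off) * g + ps["ymin"]
            for xi, off, g in zip(x, ps["xoffset"], ps["gain"])]


# Hypothetical settings for two features mapped onto [0, 1]:
# feature 1 spans 2..4 (gain 0.5), feature 2 spans 0..4 (gain 0.25).
ps_in = {"xoffset": [2.0, 0.0], "gain": [0.5, 0.25], "ymin": 0.0}
print(mapminmax_apply([3.0, 4.0], ps_in))  # [0.5, 1.0]
```

Because the map only uses stored constants, the same inputs give the same normalized values in any language, which is what makes the MATLAB vs PHP comparison meaningful.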

2.3) Sugeno FIS Evaluation (evalfis concept)

Here we evaluate the Sugeno FIS: compute membership degrees per rule, compute rule firing strengths using an AND operator (e.g., product), compute consequent outputs (often linear), then aggregate in a Sugeno manner to produce yNorm—conceptually matching MATLAB evalfis.

FIS type: Sugeno (Type-1)
Concept: evalfis
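The evaluation steps in 2.3 can be sketched as follows. This is a generic first-order Sugeno evaluation, assuming Gaussian membership functions, a product AND, and weighted-average aggregation; the two rules and all their parameters are made up, and the real ones come from the .fis file.

```python
import math


def gauss_mf(x, sigma, c):
    """Gaussian membership degree of x for a fuzzy set centred at c."""
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))


def evaluate_sugeno(x, rules):
    """First-order Sugeno: product AND, weighted average of linear consequents."""
    num = den = 0.0
    for rule in rules:
        # Firing strength: product of the membership degrees of all inputs.
        w = math.prod(gauss_mf(xi, s, c) for xi, (s, c) in zip(x, rule["mfs"]))
        # Linear consequent: z = coeffs . x + bias
        z = sum(a * xi for a, xi in zip(rule["coeffs"], x)) + rule["bias"]
        num += w * z
        den += w
    if den == 0.0:
        raise ValueError("no rule fired for this input")
    return num / den


# Two hypothetical rules over two normalized inputs; mfs holds (sigma, centre) pairs.
rules = [
    {"mfs": [(0.3, 0.0), (0.3, 0.0)], "coeffs": [0.2, 0.1], "bias": 0.1},
    {"mfs": [(0.3, 1.0), (0.3, 1.0)], "coeffs": [0.5, 0.4], "bias": 0.3},
]
y_norm = evaluate_sugeno([0.5, 0.5], rules)
print(y_norm)
```

At the midpoint both rules fire equally, so the output is the plain average of the two consequents, which shows how a Sugeno system interpolates between rules.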
2.4) Reverse Output Normalization + Inverse Log

After yNorm, we restore the original output scale using PS_out (reverse mapminmax) to get outLog. Then apply inverse log to obtain final cost:

outLog = reverse_mapminmax(yNorm, PS_out)

cost = (10 ^ outLog) - 1
        

Thus you implemented: Log → mapminmax → Sugeno eval → reverse mapminmax → inverse log, mirroring MATLAB.
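The last two steps are the algebraic inverses of the input transforms. The sketch below assumes the output normalization has the form y = (x - xoffset) * gain + ymin; the PS_out values are hypothetical.

```python
def mapminmax_reverse(y, gain, xoffset, ymin):
    """Undo y = (x - xoffset) * gain + ymin for a scalar output."""
    return (y - ymin) / gain + xoffset


def cost_from_ynorm(y_norm, ps_out):
    """Reverse the output normalization, then invert log10(x + 1)."""
    out_log = mapminmax_reverse(y_norm, ps_out["gain"], ps_out["xoffset"], ps_out["ymin"])
    return 10.0 ** out_log - 1.0


# Hypothetical PS_out: log-costs between 8 and 9 mapped onto [0, 1].
ps_out = {"gain": 1.0, "xoffset": 8.0, "ymin": 0.0}
cost = cost_from_ynorm(0.5, ps_out)  # outLog = 8.5
print(f"{cost:,.0f} IQD")
```

Since the output lives in log space, a small error in yNorm becomes a multiplicative error in cost, so the comparison with MATLAB is best done on the final cost value.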

2.5) Why this matters (Reproducibility)

Research value: you did not retrain the model on the web; you converted a trained MATLAB model into PHP while preserving the same transforms and parameters (fis + PS settings). This enables an accurate MATLAB vs PHP comparison.

Core reference: ANFIS

Jang (1993) describes ANFIS architecture, hybrid learning, and mapping construction from training data. Jang 1993 PDF

3) Detailed Scientific Explanation of CostGaugeService

Gauge

After obtaining the total cost, we need a fair classification that remains comparable across projects with different areas. Therefore, we use an index A = cost per square meter (IQD/m²). This is a common normalization/parametric indicator for meaningful comparison.

Core formula
A = TotalCost / Area   (IQD per m²)
        

The service first validates Area > 0, computes A, then matches it against configured ranges in config(cost_gauge.ranges).

Classification logic (Ranges)
  • If A is within [min, max), select that range.
  • If A is below the first range → use the first range.
  • If A exceeds the last range → use the last range.
  • Result returns label_key + label + range_used + full ranges for gauge rendering.
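The matching rules above can be sketched as a short function. The bands below are hypothetical placeholders, not the values in config(cost_gauge.ranges), and the sketch assumes the bands are sorted and contiguous.

```python
def classify_cost(total_cost, area, ranges):
    """Return (A, band) where A = total_cost / area and band is the matched range."""
    if area <= 0:
        raise ValueError("Area must be greater than 0")
    a = total_cost / area
    for r in ranges:
        if r["min"] <= a < r["max"]:  # half-open interval [min, max)
            return a, r
    # Outside every band: clamp to the first or the last range.
    return a, (ranges[0] if a < ranges[0]["min"] else ranges[-1])


# Hypothetical bands in IQD/m² (assumed sorted and contiguous).
ranges = [
    {"label_key": "low", "min": 10_000, "max": 30_000},
    {"label_key": "medium", "min": 30_000, "max": 60_000},
    {"label_key": "high", "min": 60_000, "max": 100_000},
]
a, band = classify_cost(540_000_000, 12_000, ranges)
print(a, band["label_key"])  # 45000.0 medium
```

The half-open interval means a value exactly on a boundary belongs to the higher band, so no value can match two ranges.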
Reference (Cost per m² concept)

Cost per unit area is commonly used in cost estimation as a parametric normalization metric for comparison. RAIC CHOP (Cost Planning)

4) Detailed Explanation of SensitivityService (OAT + Perturbation)

XAI

This service produces an “impact percentage” for 7 selected inputs (out of 13). The method is One-At-a-Time (OAT): change one factor at a time and observe the output change. This is a simple, local, user-friendly sensitivity approach.

Step-by-step computation (as implemented)
1) Base cost:
   C0 = f(x)

2) For each feature i:
   - If continuous: test (xi - 10%) and (xi + 10%)
   - If categorical: test all allowed values (except current)

3) Max absolute impact:
   Δi = max | f(x_test) - C0 |

4) Impact percentage:
   Impact%i = (Δi / C0) * 100

5) Sort descending -> bar chart + top influencers
        

Finally, results are sorted by impact_percent (DESC). The top_n influencers are used to support the ExplanationService.
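The procedure can be sketched as a loop over features. The toy_predict function is a made-up stand-in for predictCost, used only to exercise the loop; the function names and the feature names are hypothetical.

```python
def oat_sensitivity(predict, x, continuous, categorical, step=0.10):
    """Return [(feature, impact_percent)] sorted by impact, largest first."""
    base = predict(x)  # C0; assumed non-zero because it is used as a divisor
    results = []
    for name in list(continuous) + list(categorical):
        if name in continuous:
            candidates = [x[name] * (1 - step), x[name] * (1 + step)]
        else:
            candidates = [v for v in categorical[name] if v != x[name]]
        # Largest absolute output change while only this feature varies.
        delta = max((abs(predict({**x, name: v}) - base) for v in candidates), default=0.0)
        results.append((name, delta / base * 100.0))
    return sorted(results, key=lambda item: item[1], reverse=True)


def toy_predict(x):
    """Toy linear model, not the ANFIS model."""
    return 1000.0 + 2.0 * x["area"] + 100.0 * x["road_type"]


x0 = {"area": 100.0, "road_type": 2}
ranking = oat_sensitivity(toy_predict, x0, continuous={"area"},
                          categorical={"road_type": [1, 2, 3]})
print(ranking[0][0])  # road_type
```

Each feature costs one model call per tested value, so the whole analysis stays cheap enough to run on every request.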

Why OAT in practice?
  • Clear to end users: “only this factor changed, and the output shifted by X”.
  • Fast and applicable without retraining.
  • Suitable as a local (instance-based) explanation per project.
Limitations
  • May miss feature interactions because only one factor changes at a time.
  • Depends on the perturbation size (e.g., ±10%).

These limitations are well-known in sensitivity analysis, yet the method remains highly practical for local explainability in a web system.

5) Detailed Explanation of ExplanationService (Rule/Template-based NLG)

NLG

This service is the language layer over numbers: it converts computed results into a professional explanation. Scientifically, this is Data-to-Text / Natural Language Generation (NLG). Your approach is rule/template-based: it uses fixed templates plus rule-based selection of reasons from top influencers.

5.1) Inputs consumed by the service
  • Gauge: classification + A index (cost/m²).
  • Sensitivity: top_influencers + debug_payload (direction, delta, ...).
  • inputsAssoc: current input values to describe the current state.
5.2) Reason selection rules
- min_influence_pct: minimum impact percentage for a reason to be accepted
- max_reasons: upper limit on the number of reasons
- iterate over top_influencers:
    if pct >= minPct -> generate a reason (AR + EN)
- if there are no strong reasons -> fallback sentence
        

The service also determines direction (increase/decrease/neutral) using delta = cost_after_change - base_cost, then uses that to craft an engineering sentence explaining whether the current value is cheaper or costlier than alternatives within the tested range.
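The selection and direction rules can be sketched with fixed templates. This is an illustration only: the threshold defaults, the template wording, and the field names pct and cost_after_change are assumptions, and only English output is shown.

```python
def build_reasons(top_influencers, base_cost, min_influence_pct=5.0, max_reasons=3):
    """Turn top influencers into short sentences using fixed templates."""
    templates = {
        "increase": "Changing {label} increased the cost (impact ≈ {pct:.2f}%).",
        "decrease": "Changing {label} decreased the cost (impact ≈ {pct:.2f}%).",
        "neutral": "Changing {label} left the cost unchanged.",
    }
    reasons = []
    for item in top_influencers:
        if len(reasons) >= max_reasons:
            break
        if item["pct"] < min_influence_pct:
            continue  # too weak to be reported as a reason
        # Direction from delta = cost_after_change - base_cost.
        delta = item["cost_after_change"] - base_cost
        direction = "increase" if delta > 0 else "decrease" if delta < 0 else "neutral"
        reasons.append(templates[direction].format(label=item["label"], pct=item["pct"]))
    # Fallback sentence when no influencer passes the threshold.
    return reasons or ["No single factor had a strong influence on the estimate."]


influencers = [
    {"label": "Road Type", "pct": 12.96, "cost_after_change": 610_000_000},
    {"label": "Area", "pct": 6.48, "cost_after_change": 575_000_000},
]
for sentence in build_reasons(influencers, 540_000_000):
    print(sentence)
```

Every sentence is traceable to one influencer record, which is the auditability property that makes templates attractive here.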

5.3) Why template-based NLG is a good choice
  • Consistent formal tone (good for academic reporting).
  • Auditable: each sentence maps to specific data (debug payload).
  • Maintainable: changing templates/rules is easier than training an NLG model.
  • Supports bilingual output easily (AR/EN).
NLG references
  • NLG definition (data-to-text generation). IBM (NLG)
  • Template-based NLG paper. IAENG PDF
  • Scientific discussion: template-based NLG is not necessarily inferior. van Deemter et al. PDF

Implementation & System Conversion (MATLAB → Web Chapter)

Research Chapter
A) System Architecture
  • UI layer: input/results/reports pages + Plotly charts.
  • Service layer: HtsService, AnfisService, CostGaugeService, SensitivityService, ExplanationService.
  • Data layer: Projects/Calculations + HeatDay (HTS) + configs for PS settings.
  • Report layer: PDF generation (summary + reasons + charts).
B) Data Flow Diagram
User Inputs
   ↓
(1) HTS  ← HeatDay(year → days_above_40)
   ↓
Build ordered13 (V1..V13)
   ↓
(2) ANFIS predictCost → TotalCost
   ↓
(3) Gauge A = TotalCost/Area → label/range
   ↓
(4) Sensitivity OAT → impact% + top influencers
   ↓
(5) Explanation NLG → summary + reasons
   ↓
UI Charts + PDF Report
          
C) Training Metrics (in MATLAB)

Here you document the MATLAB training: dataset, sample count, train/validation split, membership function choices, rule count, and training error (MSE/RMSE).

Note: ANFIS training follows the hybrid learning described in Jang 1993.

D) Testing Metrics (MATLAB vs PHP)
  • Same 13 inputs fed to MATLAB and PHP.
  • Compare final cost after inverse log.
  • Compute absolute/relative error: |PHP − MATLAB| and %Error.
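The two error measures can be computed as sketched below, taking the MATLAB value as the reference. The numbers are illustrative only and are not measured results.

```python
def compare_costs(matlab_cost, php_cost):
    """Return (absolute_error, percent_error) with MATLAB as the reference."""
    abs_err = abs(php_cost - matlab_cost)
    pct_err = abs_err / abs(matlab_cost) * 100.0
    return abs_err, pct_err


# Illustrative values, not outputs of the real system.
abs_err, pct_err = compare_costs(540_000_000.0, 540_000_540.0)
print(abs_err, pct_err)
```

Reporting both measures is useful because an absolute error of a few hundred IQD is negligible for a large project but could matter for a very small one.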
E) HTS Validation

HTS validation can be documented by showing: (1) a maintenance date example, (2) computed startYear per code, (3) included years, (4) the sum of days_above_40, with a screenshot from HeatDay table or query output.

F) Console Outputs Screenshots

In the report, we will include screenshots of CLI commands used to test services (e.g., anfis:test-gauge) to show stable results for the same inputs.

G) Cost Gauge Logic

Explain that classification is based on A (cost/m²), not only TotalCost. Include configured ranges and justify them based on study context.

H) MATLAB vs PHP Step-by-step Comparison
  • Log: same equation log10(x+1).
  • PS_in: same gain/xoffset/ymin/yrange.
  • fis: same .fis file (Sugeno rules).
  • PS_out: same reverse normalization.
  • Inverse log: (10^outLog) - 1.
General references for “MATLAB → Web” concepts

These references are not your specific conversion, but official examples of MATLAB web app hosting/concepts that can support the “conversion chapter” context. MathWorks Web App Server

Example Walkthrough (One End-to-End Example)

Example

This single example shows the full pipeline: HTS → ANFIS → Gauge → Sensitivity → Explanation. Numbers are used to explain the logic, while the exact ANFIS value depends on your PS settings and FIS file.

0) Example Inputs

Assume a road-maintenance project with a specific maintenance date and a reasonable area, using a “medium” pavement age code. The goal is to show how values flow through the system.

Maintenance Date: 2024-06-15
Pavement Age Code: 2 (Medium)
Maintenance Area (m²): 12,000
Gov/Road (label): Karbala, Main Road (example)
1) HTS Calculation (Numeric Example)
HTS

Since pavement age code = 2, the window = 12 years. Maintenance year = 2024.

yearsSpan = 12
maintenanceYear = 2024
startYear = 2024 - 12 = 2012

HTS = sum(days_above_40) from (2013 .. 2024)
    

Assume the total days above 40°C from 2013 to 2024 in HeatDay equals: 980 days.

HTS: 980

This value becomes one of the ANFIS inputs (within ordered13) based on the ordering.

2) ANFIS Cost (Illustrative Pipeline Example)
ANFIS

We feed 13 ordered inputs ordered13 (V1..V13). We show a simplified example for three values to illustrate transformations, then explain how the final cost is obtained.

Example (3 out of 13 inputs) to illustrate transforms
Assume three raw inputs (subset):
V3 (Area) = 12000
V7 (HTS)  = 980
V9 (Traffic) = 2   (categorical code as numeric)

Step A) log10(x+1):
log10(12000+1) ≈ 4.079
log10(980+1)   ≈ 2.992
log10(2+1)     ≈ 0.477

Step B) mapminmax apply (0..1):
xNorm = applyVector(xLog, PS_in)   // exact values depend on PS_in in your config

Step C) Sugeno eval:
yNorm = evaluateSugeno(xNorm)      // depends on FIS (.fis rules & parameters)

Step D) reverse output normalization:
outLog = reverseScalar(yNorm, PS_out)

Step E) inverse log:
cost = (10^outLog) - 1
      

Exact xNorm/yNorm/outLog depend on your PS_in/PS_out and FIS file, so this is a methodological illustration rather than your exact final value.

To complete the end-to-end example, assume ANFIS produced the following total cost: 540,000,000 IQD.

Total Cost (assumed for example): 540,000,000 IQD
3) Gauge: Compute A and classify (Numeric Example)
Gauge

We compute A = TotalCost / Area to compare projects fairly.

A = 540,000,000 / 12,000
  = 45,000  IQD/m²
    
A (IQD/m²): 45,000

Then the service matches A against config(cost_gauge.ranges) to determine the label (low/medium/high...) and range color.

4) Sensitivity: Impact % for 7 inputs (Simplified Example)
XAI

We show two micro-examples within the same pipeline: a continuous feature (±10%) and a categorical feature (test values). We take the maximum change and convert it to a percentage.

Example (Continuous): Area ±10%

BaseCost = 540,000,000. Test Area=10,800 and Area=13,200 and measure changes.

Base cost C0 = 540,000,000

Test low  (Area -10%): newCost = 575,000,000   → diff = 35,000,000
Test high (Area +10%): newCost = 510,000,000   → diff = 30,000,000

Δ = max(diff) = 35,000,000
Impact% = (Δ / C0) * 100
        = (35,000,000 / 540,000,000) * 100
        ≈ 6.48%
        
Impact%: 6.48%
Example (Categorical): Road Type

Assume current RoadType = 2. Test values [1,3] and pick the maximum difference.

Base cost C0 = 540,000,000
Current RoadType = 2

Test RoadType = 1 → newCost = 520,000,000 → diff = 20,000,000
Test RoadType = 3 → newCost = 610,000,000 → diff = 70,000,000

Δ = 70,000,000
Impact% = (70,000,000 / 540,000,000) * 100
        ≈ 12.96%
        
Impact%: 12.96%

These values are illustrative. In the real system, newCost is obtained by calling predictCost after modifying the feature.

5) Explanation: How sentences are generated (One text example)
NLG

Assume the Gauge classified the case as “Medium” and the top two influencers are RoadType (≈12.96%) and Area (≈6.48%). The following shows how the system generates a Summary + Reasons using templates and direction logic.

Summary (EN)

The estimated cost falls in the (Medium) level. The A index is ≈ 45,000 IQD/m², with a total estimated cost of ≈ 540,000,000 IQD. Overall, this classification aligns with how road characteristics, loading, and environment influence intervention depth.

Reasons (short example)
• (Road Type) — approx. impact 12.96%.
  Changing this factor increased the cost compared to the current state.
  (Δ ≈ 70,000,000 IQD).

• (Area) — approx. impact 6.48%.
  Changing this factor affected the cost within the tested ±10% range.
  (Δ ≈ 35,000,000 IQD).
      

Note: Sentences are built from (1) feature labels from config, (2) impact% from Sensitivity, (3) direction from debug_payload, then a fixed template.