Report 9: RE-Protocol-Agent: Project Planning & Architecture

1. Project Overview
2. Research Question
3. Background
4. Tool Architecture
5. Planned Workflow
6. Static Analysis Plan
7. Protocol Analysis Plan
8. Dynamic Analysis Plan
9. Ethical Boundaries
10. Expected Challenges
11. Lab Setup
12. Initial Hypothesis

1. Project Overview {#overview}

My final project is RE-Protocol-Agent, an AI-assisted reverse engineering and protocol analysis tool for Android mobile applications. The research target for this project is the Mercedes-Benz / Mercedes me mobile application, specifically the parts related to connected vehicle behavior.

The broader domain is mobile application reverse engineering and protocol analysis. Connected vehicle apps are a good target for this kind of study because they combine user accounts, authentication, mobile app logic, backend APIs, vehicle identity, VIN handling, region-specific behavior, and vehicle-linking workflows. These systems are interesting from a reverse engineering perspective because the mobile app often gives clues about how the larger system is structured.

The goal of this project is not to bypass the app, attack a backend, brute force credentials, or exploit any service. The goal is to build a local academic analysis framework that can inspect user-provided APK artifacts, process local traffic captures, correlate evidence, and produce a clean technical report.

The project was also motivated by my own experience with regional VIN restrictions in the app. I tried to connect my personal vehicle through the German version of the app, but the vehicle was linked to the Japanese market. In the Japanese version, linking the vehicle required support from an authorized dealer in Japan because of country-specific laws and policies. This made me want to understand how the system works technically, using only local APK artifacts, traffic captures, and evidence collected in a controlled academic environment.

RE-Protocol-Agent CLI main menu

2. Research Question {#research-question}

The main research question is:

Can an AI-assisted local agent automate the reverse engineering workflow for a connected vehicle Android app and produce a grounded report about static mobile behavior and protocol-level observations?

More specifically, the areas of investigation are:

  VIN and vehicle terms        →  where they appear in the app
  Authentication flows         →  strings, endpoints, OAuth patterns
  Garage / ownership logic     →  registration and linking behavior
  Region and market logic      →  locale, country, feature flags
  Backend endpoint candidates  →  recoverable API paths and hosts
  Evidence quality             →  confirmed vs inferred vs unknown

A successful result means the tool can take an APK and optional traffic captures, run the analysis pipeline, and generate a Markdown report with clear evidence. Success does not require finding a vulnerability; it requires producing a reliable and honest analysis.
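One way to make the confirmed / inferred / unknown distinction concrete is a small data model. The sketch below is an assumption, not the tool's actual schema: the `EvidenceLevel` enum, the `Finding` fields, and the rule that a host counts as confirmed only when it appears both in static artifacts and in captured traffic are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class EvidenceLevel(Enum):
    """Hypothetical labels for the report's three evidence categories."""
    CONFIRMED = "confirmed"  # corroborated by more than one independent source
    INFERRED = "inferred"    # supported by a single source only
    UNKNOWN = "unknown"      # no local evidence either way


@dataclass(frozen=True)
class Finding:
    subject: str              # e.g. a backend host name
    level: EvidenceLevel
    sources: tuple[str, ...]  # artifact files the label is based on


def classify_host(host: str, static_hosts: set[str], observed_hosts: set[str]) -> Finding:
    """Assumed rule: static reference + captured traffic = confirmed; one source = inferred."""
    sources = []
    if host in static_hosts:
        sources.append("endpoints.json")
    if host in observed_hosts:
        sources.append("protocol_summary.json")
    if len(sources) == 2:
        level = EvidenceLevel.CONFIRMED
    elif sources:
        level = EvidenceLevel.INFERRED
    else:
        level = EvidenceLevel.UNKNOWN
    return Finding(host, level, tuple(sources))
```

For example, a placeholder host such as `api.example.com` that appears only in decompiled code would be labeled `INFERRED` until a capture also shows it.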

3. Background {#background}

Android applications are distributed as APK files or app bundles. An APK can contain compiled DEX bytecode, resources, native libraries, metadata, and an AndroidManifest.xml file. The manifest is especially important because it defines the package name, permissions, activities, services, receivers, providers, exported components, deep links, and other app-level configuration.

Key Android reversing concepts for this project:

APK structure              →  DEX, resources, native libs, manifest
Package identity           →  package name, signing, version
Android permissions        →  declared capabilities
Activities and components  →  launchable entry points
Exported components        →  accessible from outside the app
Intent filters             →  deep links, URI schemes
String extraction          →  visible constants in code and resources
Endpoint extraction        →  hardcoded URLs, domains, API paths
Static vs dynamic          →  file analysis vs running the app
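Because an APK is a ZIP archive, the structural pieces listed above can be inventoried with Python's standard `zipfile` module before any decompilation. The category names and path rules in this sketch are illustrative assumptions. Note that `AndroidManifest.xml` inside an APK is stored as binary XML, so reading its contents still needs a tool such as aapt or apktool.

```python
import zipfile
from collections import Counter
from typing import IO, Union


def categorize_entry(name: str) -> str:
    """Map an APK entry path to a coarse category (rules are illustrative)."""
    if name == "AndroidManifest.xml":
        return "manifest"      # binary XML; decode with aapt or apktool
    if name.endswith(".dex"):
        return "dex"           # classes.dex, classes2.dex, ...
    if name.startswith("lib/") and name.endswith(".so"):
        return "native_lib"    # lib/<abi>/libname.so
    if name.startswith("res/") or name == "resources.arsc":
        return "resources"
    if name.startswith("META-INF/"):
        return "meta_inf"      # includes v1 (JAR) signature files if present
    if name.startswith("assets/"):
        return "assets"
    return "other"


def inventory_apk(apk: Union[str, IO[bytes]]) -> Counter:
    """Count APK entries per category; zipfile accepts a path or a binary file object."""
    with zipfile.ZipFile(apk) as archive:
        return Counter(categorize_entry(name) for name in archive.namelist())
```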

The protocol analysis side focuses on passive traffic artifacts: HAR files, PCAP files, or mitmproxy JSON exports. These can reveal hosts, paths, methods, status codes, DNS queries, TLS SNI values, IP addresses, ports, and request/response structure. However, the tool does not decrypt TLS, bypass certificate pinning, or interfere with authentication.

4. Tool Architecture {#architecture}

RE-Protocol-Agent is designed as a modular pipeline. A central Controller coordinates each module and writes its outputs into a case folder. The AI layer runs only after the deterministic analysis and does not invent findings; it only summarizes locally generated artifacts.

Controller
  │
  ├── Intake Module          →  metadata.json
  ├── Decompiler Module      →  extracted source files
  ├── Manifest Analyzer      →  manifest_summary.json
  ├── Code Search            →  static_findings.json
  ├── String Extractor       →  strings_interesting.json
  ├── Endpoint Extractor     →  endpoints.json
  ├── Protocol Parsers       →  protocol_summary.json / pcap_summary.json
  ├── Correlator             →  correlated_findings.json
  ├── AI Reasoner            →  ai_summary.md
  └── Reporter               →  final_report.md
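A minimal sketch of the Controller pattern in the diagram, assuming each module is a plain function that receives a shared context dictionary and returns JSON-serializable data. The `Controller` class, its `register` method, and the module signature are hypothetical; Markdown outputs such as `ai_summary.md` would need a separate writer.

```python
import json
from pathlib import Path
from typing import Callable

# Hypothetical module signature: shared context in, JSON-serializable dict out.
Module = Callable[[dict], dict]


class Controller:
    """Run registered modules in order and write each result into the case folder."""

    def __init__(self, case_dir: Path) -> None:
        self.case_dir = case_dir
        self.steps: list[tuple[str, Module]] = []

    def register(self, output_name: str, module: Module) -> None:
        self.steps.append((output_name, module))

    def run(self, context: dict) -> list[Path]:
        self.case_dir.mkdir(parents=True, exist_ok=True)
        written = []
        for output_name, module in self.steps:
            result = module(context)
            # Later modules (e.g. a correlator) can read earlier results from the context.
            context[output_name] = result
            path = self.case_dir / output_name
            path.write_text(json.dumps(result, indent=2), encoding="utf-8")
            written.append(path)
        return written
```

Keeping each stage's output on disk, rather than only in memory, is what makes a case folder reproducible and lets the AI step read the same files a human reviewer would.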

The key design principle is evidence first, AI second. The deterministic modules generate structured artifacts. The AI reasoner reads those artifacts and produces a summary. If evidence is missing, it reports that it is missing.
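The "evidence first, AI second" rule can be enforced in how the AI input is assembled: only local artifacts are included, and missing ones are stated explicitly so the model has nothing to fill in. In this sketch, the selection of artifact files and the instruction wording are assumptions.

```python
import json
from pathlib import Path

# Artifact names from the case-folder layout; which ones the AI reads is an assumption.
EVIDENCE_FILES = [
    "manifest_summary.json",
    "static_findings.json",
    "endpoints.json",
    "protocol_summary.json",
    "correlated_findings.json",
]


def build_evidence_context(case_dir: Path) -> str:
    """Concatenate available artifacts and mark absent ones as MISSING."""
    sections = []
    for name in EVIDENCE_FILES:
        path = case_dir / name
        if path.is_file():
            data = json.loads(path.read_text(encoding="utf-8"))
            sections.append(f"## {name}\n{json.dumps(data, indent=2)}")
        else:
            sections.append(f"## {name}\nMISSING: no evidence available from this stage.")
    instructions = (
        "Summarize only the evidence below. "
        "If a section is marked MISSING, say so instead of guessing."
    )
    return instructions + "\n\n" + "\n\n".join(sections)
```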

5. Planned Workflow {#workflow}

The workflow begins with APK intake and ends with a report:

APK Artifact
  → Hash and Metadata
  → Manifest Analysis
  → Decompilation / Existing Extracted Folder
  → Code and Resource Search
  → String Extraction
  → Endpoint Extraction
  → Optional Protocol Capture Parsing
  → Correlation
  → AI Summary
  → Markdown Report
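The first step, Hash and Metadata, could look like the sketch below. Recording a SHA-256 digest lets a reader verify that a report refers to the exact APK that was analyzed. The field names written to `metadata.json` are assumptions.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large APKs do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_metadata(apk: Path, case_dir: Path) -> dict:
    """Record basic intake facts in metadata.json (field names are hypothetical)."""
    metadata = {
        "file_name": apk.name,
        "size_bytes": apk.stat().st_size,
        "sha256": sha256_file(apk),
        "intake_utc": datetime.now(timezone.utc).isoformat(),
    }
    case_dir.mkdir(parents=True, exist_ok=True)
    (case_dir / "metadata.json").write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return metadata
```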

The tool creates a structured case directory so the analysis can be reproduced:

outputs/<case_name>/
├── metadata.json
├── manifest_summary.json
├── static_findings.json
├── strings_interesting.txt
├── strings_interesting.json
├── endpoints.json
├── protocol_summary.json
├── pcap_summary.json
├── correlated_findings.json
├── ai_summary.md
├── dynamic_summary.md
└── final_report.md

Output folder showing generated analysis artifacts

6. Static Analysis Plan {#static-plan}

The static analysis stage searches the APK and decompiled files for evidence related to the research target. The keyword categories are designed around connected vehicle behavior:

Category               Keywords
──────────────────────────────────────────────────────────
Vehicle identity       VIN, vehicle, garage, registration,
                       pairing, activation, ownership
Authentication         OAuth, token, login, auth, bearer,
                       credential, session
Region logic           region, country, market, locale,
                       timezone, territory
Backend / API          endpoint, API, backend, host,
                       telematics, feature flag
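A keyword search over these categories can be sketched with compiled regular expressions, one per category. The keyword lists are copied from the table; the file suffixes and the output fields are assumptions. Plain case-insensitive substring matching is deliberately broad, so the hits are candidates for manual triage rather than findings.

```python
import re
from pathlib import Path

# Keyword categories copied from the table above; matching is case-insensitive.
KEYWORDS = {
    "vehicle_identity": ["VIN", "vehicle", "garage", "registration",
                         "pairing", "activation", "ownership"],
    "authentication": ["OAuth", "token", "login", "auth", "bearer",
                       "credential", "session"],
    "region_logic": ["region", "country", "market", "locale",
                     "timezone", "territory"],
    "backend_api": ["endpoint", "API", "backend", "host",
                    "telematics", "feature flag"],
}

# Substring matching is noisy (for example, "vin" also matches "driving").
PATTERNS = {
    category: re.compile("|".join(re.escape(w) for w in words), re.IGNORECASE)
    for category, words in KEYWORDS.items()
}


def search_text(text: str, source: str) -> list[dict]:
    """Return at most one hit per category per line."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for category, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                hits.append({"file": source, "line": line_no,
                             "category": category, "keyword": match.group(0)})
    return hits


def search_tree(root: Path, suffixes=(".java", ".kt", ".xml", ".smali")) -> list[dict]:
    """Walk a decompiled output folder (e.g. from jadx or apktool) and search text files."""
    hits = []
    for path in root.rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            hits.extend(search_text(text, str(path)))
    return hits
```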

Decompiled App Files
  ├── Keyword Search
  │     ├── Vehicle and VIN findings
  │     ├── Authentication findings
  │     └── Region and market findings
  ├── String Extraction       →  interesting strings
  └── Endpoint Extraction     →  endpoint candidates
        ↓
  Static Findings Report
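Endpoint extraction can start from two regular expressions: one for absolute URLs and one for relative API-looking paths. Both patterns in this sketch are heuristics chosen for illustration (the `/api/` and `/vN/` prefixes are assumptions about what API paths look like) and will miss endpoints that are assembled at runtime.

```python
import re
from urllib.parse import urlsplit

# Absolute http(s)/ws(s) URLs, stopping at quotes, whitespace, and common delimiters.
URL_RE = re.compile(r"\b(?:https?|wss?)://[^\s\"'<>(),;]+")
# Relative paths starting with /api/ or a version segment such as /v1/.
PATH_RE = re.compile(r"(?<![\w/])/(?:api|v\d+)/[A-Za-z0-9_\-/{}.]+")


def extract_endpoints(text: str) -> dict:
    urls = sorted(set(URL_RE.findall(text)))
    hosts = sorted({h for h in (urlsplit(u).hostname for u in urls) if h})
    # The lookbehind skips paths that are already part of an absolute URL.
    paths = sorted(set(PATH_RE.findall(text)))
    return {"urls": urls, "hosts": hosts, "paths": paths}
```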

The tool redacts obvious sensitive values such as bearer tokens, API keys, JWT-looking values, cookies, and long random secrets, because the goal is analysis and reporting, not credential collection.
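The exact redaction rules are not specified here, so the following is a hypothetical regex-based sketch covering the value types listed above. Rules like these will miss some secrets and over-match some harmless strings; they reduce exposure rather than guarantee it. The long-string rule requires both letters and digits so that long constant names such as `VEHICLE_REGISTRATION_STATUS_PENDING` survive for analysis.

```python
import re

# Heuristic rules applied in order; each pattern is an assumption, not a standard.
REDACTION_RULES = [
    # "Bearer <token>": keep the scheme word, drop the token
    (re.compile(r"(?i)\b(bearer\s+)[A-Za-z0-9\-._~+/]+=*"), r"\1[REDACTED]"),
    # JWT-looking values: header and payload segments start with "eyJ"
    (re.compile(r"\beyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"), "[REDACTED_JWT]"),
    # key/value secrets such as api_key=..., "client_secret": "..."
    (re.compile(r"(?i)\b(api[_-]?key|client[_-]?secret)([\"']?\s*[:=]\s*[\"']?)[^\s\"',;&]+"),
     r"\1\2[REDACTED]"),
    # Cookie / Set-Cookie header lines
    (re.compile(r"(?im)^((?:set-)?cookie:[ \t]*).*$"), r"\1[REDACTED]"),
    # Long random-looking strings: 32+ chars containing both letters and digits
    (re.compile(r"\b(?=[A-Za-z0-9_-]*\d)(?=[A-Za-z0-9_-]*[A-Za-z])[A-Za-z0-9_-]{32,}"),
     "[REDACTED_LONG]"),
]


def redact(text: str) -> str:
    for pattern, replacement in REDACTION_RULES:
        text = pattern.sub(replacement, text)
    return text
```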

7. Protocol Analysis Plan {#protocol-plan}

The protocol side is intentionally passive. The tool parses files the analyst provides or files generated during an authorized local capture. It does not attack services or generate unauthorized requests.

Local Traffic Artifact
  │
  ├── HAR Parser      →  HTTP methods, hosts, paths, status codes
  ├── PCAP Parser     →  DNS, TLS SNI, IPs, ports
  └── mitmproxy JSON  →  request and response metadata
        ↓
  Protocol Summary
        ↓
  Static + Protocol Correlation
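HAR files are JSON documents with a `log.entries` array in which each entry holds a `request` and a `response` object. A minimal HAR parser sketch, assuming only the fields named in the diagram (method, host, path, status) are needed; dropping the query string is a design assumption meant to avoid copying tokens that sometimes appear in URLs.

```python
import json
from collections import Counter
from urllib.parse import urlsplit


def summarize_har(har_text: str) -> dict:
    """Extract method, host, path, and status from HAR entries; bodies are ignored."""
    entries = json.loads(har_text).get("log", {}).get("entries", [])
    requests = []
    for entry in entries:
        request = entry.get("request", {})
        response = entry.get("response", {})
        parts = urlsplit(request.get("url", ""))
        requests.append({
            "method": request.get("method"),
            "host": parts.hostname,
            "path": parts.path,  # query string dropped on purpose
            "status": response.get("status"),
        })
    hosts = Counter(r["host"] for r in requests if r["host"])
    return {"request_count": len(requests), "hosts": dict(hosts.most_common()),
            "requests": requests}
```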

For PCAP parsing, the tool uses tshark if installed. It extracts visible network metadata but does not decrypt traffic.
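One way to implement this step is to ask tshark for specific fields and aggregate them in Python. In the sketch below, the display filter and field list are assumptions chosen to match the metadata named above (DNS, TLS SNI, IPs, ports); the `tls.` field prefix applies to Wireshark 3.0 and later, while older releases used `ssl.`. Parsing is kept in a separate function so it can be checked without a capture file.

```python
import shutil
import subprocess
from collections import Counter
from typing import Optional

FIELDS = ["ip.dst", "tcp.dstport", "udp.dstport",
          "dns.qry.name", "tls.handshake.extensions_server_name"]
# DNS queries and TLS ClientHello packets only; no payload decryption is attempted.
DISPLAY_FILTER = "dns.flags.response == 0 or tls.handshake.type == 1"


def tshark_command(pcap_path: str) -> list[str]:
    cmd = ["tshark", "-r", pcap_path, "-Y", DISPLAY_FILTER, "-T", "fields"]
    for field in FIELDS:
        cmd += ["-e", field]
    return cmd


def parse_fields(output: str) -> dict:
    """Aggregate tab-separated tshark field output (tab is tshark's default separator)."""
    dns, sni, endpoints = Counter(), Counter(), Counter()
    for line in output.splitlines():
        ip, tcp_port, udp_port, qname, server_name = (line.split("\t") + [""] * 5)[:5]
        if qname:
            dns[qname] += 1
        if server_name:
            sni[server_name] += 1
        port = tcp_port or udp_port
        if ip and port:
            endpoints[f"{ip}:{port}"] += 1
    return {"dns_queries": dict(dns), "tls_sni": dict(sni), "ip_ports": dict(endpoints)}


def summarize_pcap(pcap_path: str) -> Optional[dict]:
    """Return None when tshark is not installed."""
    if shutil.which("tshark") is None:
        return None
    result = subprocess.run(tshark_command(pcap_path), capture_output=True, text=True, check=True)
    return parse_fields(result.stdout)
```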

8. Dynamic Analysis Plan {#dynamic-plan}

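The dynamic workflow makes protocol analysis more realistic by capturing traffic that the app actually generates at runtime. The agent handles the environment (starting the emulator, launching the app, starting the capture, and collecting logcat output), while the user performs sensitive in-app actions, such as login and vehicle linking, manually.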

Sequence:
  User       →  Start guided capture
  CLI        →  Begin dynamic workflow
  Agent      →  Start or connect to Android emulator
  Agent      →  Install or launch target app
  Agent      →  Start local PCAP and logcat capture
  User       →  Manual login in app
  User       →  Manual VIN or vehicle-flow check
  Capture    →  Save runtime artifacts
  Agent      →  Parse, correlate, and summarize
  Reporter   →  Updated final_report.md
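The agent-side steps of this sequence can be sketched as a few adb and emulator commands. This is an illustration under stated assumptions, not the project's implementation: the AVD name and package name are placeholders (`com.example.target` is not the real Mercedes me package), the emulator's `-tcpdump` option is one possible PCAP source, and launching through `monkey` with a single LAUNCHER event is a common idiom. The login and VIN steps are intentionally absent because the user performs them by hand.

```python
import subprocess
import time


def install_command(apk_paths: list[str]) -> list[str]:
    # Split APKs from an app bundle must be installed together in one session.
    if len(apk_paths) == 1:
        return ["adb", "install", "-r", apk_paths[0]]
    return ["adb", "install-multiple", "-r", *apk_paths]


def launch_command(package: str) -> list[str]:
    # Sends one LAUNCHER event via monkey, which opens the app's main activity.
    return ["adb", "shell", "monkey", "-p", package,
            "-c", "android.intent.category.LAUNCHER", "1"]


def emulator_command(avd: str, pcap_path: str) -> list[str]:
    # -tcpdump asks the emulator to write its network traffic to a PCAP file.
    return ["emulator", "-avd", avd, "-tcpdump", pcap_path]


def wait_for_boot(timeout_s: float = 300.0) -> None:
    """Block until Android reports sys.boot_completed=1 or the timeout expires."""
    subprocess.run(["adb", "wait-for-device"], check=True)
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        result = subprocess.run(["adb", "shell", "getprop", "sys.boot_completed"],
                                capture_output=True, text=True)
        if result.stdout.strip() == "1":
            return
        time.sleep(2)
    raise TimeoutError("emulator did not finish booting")


def start_guided_capture(avd: str, package: str, pcap_path: str, log_path: str) -> dict:
    """Start emulator, PCAP, and logcat, then launch the app; login and VIN steps stay manual."""
    emulator = subprocess.Popen(emulator_command(avd, pcap_path))
    wait_for_boot()
    subprocess.run(["adb", "logcat", "-c"], check=True)  # clear older log lines
    log_file = open(log_path, "w", encoding="utf-8")
    logcat = subprocess.Popen(["adb", "logcat", "-v", "time"], stdout=log_file)
    subprocess.run(launch_command(package), check=True)
    # The caller terminates both processes and closes log_file after the manual steps.
    return {"emulator": emulator, "logcat": logcat, "log_file": log_file}
```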

Emulator open with target app ready for guided testing

The dynamic stage is one of the hardest parts of the project because Android apps often depend on emulator compatibility, Google Play Services, package identity, app bundles, region settings, and device environment details.

9. Ethical Boundaries {#ethics}

The project has strict boundaries. It is for academic, legal, local analysis only.

Allowed                          Not Allowed
──────────────────────────────   ─────────────────────────────
Analyze user-provided APKs       Credential attacks
Parse local HAR/PCAP/mitmproxy   Brute forcing
Collect local emulator logs      Authentication bypass
Extract strings and endpoints    Certificate pinning bypass
Produce grounded reports         App patching or repacking
                                 Production fuzzing
                                 Unauthorized API interaction
                                 Automated VIN submission
                                 Automated login
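In an agentic tool, these boundaries can also be enforced in code, so a request for an out-of-scope action fails loudly instead of depending on prompt wording. The sketch below is hypothetical: the `Action` names map to a subset of the table rows, and the design is default deny, meaning any new action is refused until it is explicitly added to the allowlist.

```python
from enum import Enum


class Action(Enum):
    # Allowed column of the table
    ANALYZE_APK = "analyze_apk"
    PARSE_CAPTURE = "parse_capture"
    COLLECT_EMULATOR_LOGS = "collect_emulator_logs"
    EXTRACT_STRINGS = "extract_strings"
    WRITE_REPORT = "write_report"
    # Subset of the Not Allowed column, named so requests can be refused clearly
    AUTOMATED_LOGIN = "automated_login"
    AUTOMATED_VIN_SUBMISSION = "automated_vin_submission"
    UNAUTHORIZED_API_REQUEST = "unauthorized_api_request"
    APP_PATCHING = "app_patching"


# Default deny: only explicitly listed actions may run.
ALLOWED_ACTIONS = frozenset({
    Action.ANALYZE_APK,
    Action.PARSE_CAPTURE,
    Action.COLLECT_EMULATOR_LOGS,
    Action.EXTRACT_STRINGS,
    Action.WRITE_REPORT,
})


class PolicyViolation(Exception):
    """Raised when the agent is asked to perform an out-of-scope action."""


def require_allowed(action: Action) -> None:
    if action not in ALLOWED_ACTIONS:
        raise PolicyViolation(f"'{action.value}' is outside the project's ethical boundaries")
```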

10. Expected Challenges {#challenges}

Challenge                           Why it matters
──────────────────────────────────────────────────────────────────
Emulator launch reliability         App may need Play Services or a
                                    specific device profile
APK vs app bundle support           Split APKs require a different
                                    install approach
Separating confirmed vs inferred    Evidence quality directly
evidence                            affects report credibility
Encrypted traffic                   TLS hides HTTP content from
                                    passive PCAP parsing
Accidental sensitive value          Redaction must work correctly
exposure
Grounded AI summaries               AI must not speculate beyond
                                    available evidence

The biggest expected difficulty is dynamic protocol analysis. Static analysis can produce useful evidence, but protocol reversing requires the app to run correctly and generate authorized traffic that can be captured.

11. Lab Setup {#lab-setup}

Component              Tool / Version
──────────────────────────────────────
Language               Python 3.11
RE Agent               RE-Protocol-Agent CLI
Decompiler             jadx
Packaging tool         apktool
APK inspection         aapt
Android tooling        adb
Android emulator       Android Emulator (AVD)
Traffic capture        tshark / Wireshark
Case storage           Local output folders
AI key                 Environment variable
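Before a run, the CLI could check that the tools in this table are reachable. In this sketch, the executable names follow the table, while the environment variable name `RE_AGENT_AI_KEY` is hypothetical, since the report only states that the AI key comes from an environment variable. Only the key's presence is checked; its value is never printed.

```python
import os
import shutil
import sys

# Executables from the lab setup table. The emulator binary usually lives in
# $ANDROID_HOME/emulator, which may need to be added to PATH.
REQUIRED_TOOLS = ["jadx", "apktool", "aapt", "adb", "emulator", "tshark"]
# Hypothetical name for the AI key variable.
AI_KEY_VAR = "RE_AGENT_AI_KEY"


def check_lab() -> dict[str, bool]:
    status = {tool: shutil.which(tool) is not None for tool in REQUIRED_TOOLS}
    status["python>=3.11"] = sys.version_info >= (3, 11)
    # Presence only; the key value itself is never printed or stored.
    status[AI_KEY_VAR] = bool(os.environ.get(AI_KEY_VAR))
    return status


if __name__ == "__main__":
    for name, ok in check_lab().items():
        print(f"{'ok     ' if ok else 'MISSING'}  {name}")
```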

The first version prioritizes a reliable CLI and Markdown report generation over a graphical interface. The goal is to make the tool usable by an average technical user while keeping the analysis reproducible.

12. Initial Hypothesis {#hypothesis}

My hypothesis is that static analysis will reveal useful references to core Mercedes me app concepts (vehicle, VIN, authentication, region, and backend endpoints), because these concepts are central to connected vehicle apps and are likely to be visible at the code and resource level.

I also expect that dynamic protocol analysis will require more environment work because the app needs to run correctly in an emulator before traffic can be captured and correlated.

Part I therefore frames the project as both a reverse engineering study and a tool-building project. The main objective is to create a safe agentic workflow that can collect evidence, organize it, and explain it clearly.