Reconstructing historical trading PnL: a data pipeline approach

Executive summary

Recovering historical trading profit and loss (PnL) data is a critical challenge for finance and cryptocurrency platforms. When historical records are unavailable, users cannot validate past trading strategies, assess long-term performance, or reconcile discrepancies. This post details how I tackled the problem by turning a technically daunting challenge into a robust, maintainable data pipeline.


Background and context

What is trading PnL?

In trading, profit and loss (PnL) represents the financial outcome of a position, and it comes in two forms:

  • Realized PnL: the locked-in gain or loss from positions you have closed, net of fees.
  • Unrealized PnL: the paper gain or loss on positions that are still open, measured against the current market price.

For instance, when you close a Bitcoin position at a higher price than you entered, your realized PnL reflects the profit after fees. If the position is still open, unrealized PnL tracks potential outcomes as prices fluctuate.

Why does historical PnL matter?

Historical PnL data provides traders with:

  1. Performance insights: Understanding which strategies worked and which didn’t.
  2. Compliance and reporting: Regulatory or internal needs often require accurate historical data.
  3. Strategy validation: Testing new algorithms against past market conditions relies on accurate PnL records.

The problem at hand

While I was designing a trading PnL chart for the platform, a significant gap emerged: historical PnL data for certain periods was missing. The existing system calculated PnL in real time but didn't store the intermediate values, making reconstruction impossible without extensive changes to the codebase.


The challenge

“How can we reconstruct historical trading PnL data efficiently when the original records no longer exist?”

This question encapsulated two core issues:

  1. Data loss: Real-time calculations discarded intermediate values, leaving gaps in the historical records.
  2. System complexity: The platform’s codebase, written in Elixir, was intricate, tightly coupled, and unfamiliar to me.

Moreover, the system’s reliance on multiple data sources (trades, market prices, fees) and the sheer volume of transactions compounded the problem.


Technical requirements

To reconstruct PnL, the following were essential:

  • Historical trade, fee, and commission data, which is still stored in the PostgreSQL database.
  • Historical marking prices for each trading pair, retrievable as Kline data from the Binance API.
  • The formulas the system uses to derive realized and unrealized PnL from those inputs.
  • A way to process the resulting data volume without invasive changes to the existing Elixir codebase.


System analysis

From complex code to data flows

Instead of delving into intricate application logic, I reimagined the system as a series of data flows, where data is ingested, transformed, and stored across multiple layers. Below is the existing flow:

flowchart LR
    subgraph Sources
        B[Binance]
        R[REST API]
    end

    subgraph DataLake
        ETS[Elixir ETS]
    end

    subgraph Processing
        MV[Materialized Views]
        DB[(PostgreSQL)]
    end

    subgraph Output
        A[Analytics]
        Rep[Reports]
    end

    B --> ETS
    R --> ETS
    ETS --> DB
    DB --> MV
    MV --> A
    MV --> Rep

Reconstructing the flow

From this flow of data, we can determine which parts need to be reproduced to recover the old PnL values.

One more important piece is the formula used to calculate PnL when data is transformed in the third step.
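
The exact formulas from the codebase aren't reproduced in this memo, but for a position they typically take the standard form below (shown for a long position of size q; the signs flip for a short):

$$\text{Realized PnL} = q \cdot (p_{\text{exit}} - p_{\text{entry}}) - \text{fees}$$

$$\text{Unrealized PnL} = q \cdot (p_{\text{mark}} - p_{\text{entry}})$$

For example, closing 0.5 BTC entered at 30,000 USD and exited at 31,000 USD with 10 USD in fees yields 0.5 × 1,000 − 10 = 490 USD of realized PnL. The observation that matters for reconstruction: the marking price is the one input the database does not retain over time.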

Not everything has to be rebuilt from scratch, though: some useful data already stored in the PostgreSQL database can be reused. Let's check what is available.

Comparing the available data against the formulas above, the database already holds almost every input needed to calculate unrealized PnL; the one missing piece is the old marking prices, which must be retrieved from Binance to compute unrealized PnL historically. The new flow can therefore be illustrated as follows.

flowchart LR
    subgraph Input
        B[Binance API]
        DB[(Trading Info DB)]
    end

    subgraph Processing
        P[Data Mapping]
        C[Parallel Processing]
    end

    subgraph Storage
        PNL[(PnL Storage)]
    end

    B -->|Price Data| P
    DB -->|Trading History| P
    P --> C
    C -->|Results| PNL

Implementation

The reconstruction process involves five major steps:

  1. Data collection
    Fetch necessary data from two sources (a fetch sketch follows this list):

    • Database: Historical trading data, fees, commissions.
    • Binance API: Historical Kline data for price points.
  2. Mapping data
    Group the data by trading pairs (tokens) for efficient processing.

  3. Token-level calculation
    For each token (see the processing sketch after the diagram below):

    • Use minute-level Kline data to calculate fees, realized PnL, unrealized PnL, and entry prices.
    • Apply cumulative calculations to ensure accuracy.
  4. Aggregate results
    Sum PnL across all positions for the user’s account.

  5. Storage and visualization
    Save results back into the database and visualize them in the PnL chart.
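
As a concrete illustration of step 1, here is a minimal sketch of paging minute-level Klines out of Binance's public REST endpoint. It assumes the Req HTTP client and the spot /api/v3/klines endpoint; the module and function names are hypothetical, and the real pipeline may use a different client or the futures endpoint.

defmodule PnlBackfill.Klines do
  # Hypothetical sketch: pages historical 1-minute Klines for one symbol.
  @base_url "https://api.binance.com/api/v3/klines"
  # Binance caps a single Kline request at 1,000 candles.
  @page_limit 1000

  def fetch_range(symbol, start_ms, end_ms) do
    Stream.unfold(start_ms, fn cursor ->
      with true <- cursor < end_ms,
           [_ | _] = rows <- fetch_page(symbol, cursor, end_ms) do
        # Each row starts with its open time; resume one minute after the last one.
        [last_open_time | _] = List.last(rows)
        {rows, last_open_time + 60_000}
      else
        _ -> nil
      end
    end)
    |> Stream.flat_map(& &1)
  end

  defp fetch_page(symbol, start_ms, end_ms) do
    Req.get!(@base_url,
      params: [
        symbol: symbol,
        interval: "1m",
        startTime: start_ms,
        endTime: end_ms,
        limit: @page_limit
      ]
    ).body
  end
end

Because Binance also rate-limits requests by weight, a production backfill would throttle between pages rather than hammering the endpoint in a tight loop.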

flowchart TD
    subgraph DataCollection
        F[Fees Data]
        T[Trade Data]
        K[Kline Data]
    end

    subgraph Processing
        M[Map by Token]
        C[Calculate per Token]
        A[Aggregate Results]
    end

    F --> M
    T --> M
    K --> M
    M --> C
    C --> A
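
To make steps 2–4 concrete, here is a minimal sketch of the map, calculate, and aggregate stages. The trade structure and field names are illustrative assumptions, not the platform's actual schema; the per-token calculation itself is covered in the next section.

defmodule PnlBackfill.Pipeline do
  # Step 2: group trades by trading pair (token) for per-token processing.
  def map_by_token(trades), do: Enum.group_by(trades, & &1.symbol)

  # Steps 3 and 4: calculate every token in parallel, then aggregate the results.
  def run(trades, klines_by_symbol) do
    trades
    |> map_by_token()
    |> Task.async_stream(
      fn {symbol, token_trades} ->
        calculate_token(symbol, token_trades, Map.fetch!(klines_by_symbol, symbol))
      end,
      max_concurrency: System.schedulers_online(),
      timeout: :timer.minutes(5)
    )
    |> Enum.reduce(%{realized: 0.0, unrealized: 0.0}, fn {:ok, pnl}, acc ->
      %{realized: acc.realized + pnl.realized, unrealized: acc.unrealized + pnl.unrealized}
    end)
  end

  # Placeholder: the ordered, cumulative per-token calculation is sketched
  # under "Optimization strategies" below.
  defp calculate_token(_symbol, _trades, _klines), do: %{realized: 0.0, unrealized: 0.0}
end

Task.async_stream keeps the per-token work parallel while bounding concurrency, matching the parallel processing step in the new flow.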

Outstanding challenges

Volume of data
Minute-level Kline data is essential for accuracy, but retrieving and processing it is resource-intensive:

  • Binance caps each Kline request at 1,000 candles, so backfilling months of minute-level history means thousands of paginated requests per trading pair.
  • The API is rate-limited by request weight, so those requests have to be throttled and their responses processed incrementally.

PnL accuracy
Realized PnL is tied to a trade set: the total PnL of the set is accumulated from the PnL and fees of each closed trade over time. If we retrieve a user's trades in arbitrary order, the running totals come out wrong and the resulting report is nonsense.


Optimization strategies

flowchart TD
    TS[Find Trade Sets]
    P[Process Each Set]
    C[Calculate PnL]
    
    TS --> P
    P --> C
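
The flow above addresses the accuracy problem from the previous section: trades are first grouped into their trade sets, and each set is then processed in chronological order so the cumulative totals stay correct. A minimal sketch of that per-set calculation, assuming each trade is a map with hypothetical :timestamp, :realized_pnl, and :fee fields:

defmodule PnlBackfill.TradeSet do
  # Walk one trade set in chronological order, carrying the running
  # realized PnL and fee totals forward through every closed trade.
  def cumulative_pnl(trades) do
    trades
    |> Enum.sort_by(& &1.timestamp)
    |> Enum.scan(%{timestamp: nil, realized: 0.0, fees: 0.0}, fn trade, acc ->
      %{
        timestamp: trade.timestamp,
        realized: acc.realized + trade.realized_pnl,
        fees: acc.fees + trade.fee
      }
    end)
  end
end

Sorting before scanning is the whole point: fed trades in arbitrary order, the running totals, and therefore the chart built from them, would be meaningless.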

Quality assurance

To validate the reconstruction process, the rebuilt values need to be cross-checked against known-good PnL from periods where records still exist.


Conclusion

This case study highlights the power of a data-centric approach in solving financial system problems. By treating the challenge as a structured data pipeline problem, we avoided risky codebase modifications and developed a robust, scalable solution. Techniques such as parallel processing, time-series reconstruction, and efficient data retrieval were key to solving the problem within system constraints.

This approach demonstrates that data-driven solutions can effectively address complex challenges while maintaining flexibility and performance for future needs.