Signal Sweep
Technical Signal Discovery via Peer Group Consensus
A four-phase pipeline that scores 61 sweepable technical indicators against a curated peer group of tickers, filters by cross-ticker consistency and expected value, and stress-tests the survivors against multiple-testing correction, a random-date null, and walk-forward out-of-sample validation.
Overview
The standard way a retail trader tests a technical indicator is to point a backtester at a single ticker and tune the parameters until the equity curve looks attractive. That procedure reliably generates a number that did not exist before the procedure ran. With 61 sweepable indicators, four horizons, and two directions, a single ticker offers 488 candidate fits (61 × 4 × 2), so the best of them will almost certainly show a positive Sharpe ratio even if no indicator has any edge. The Signal Sweep is built to refuse that question and ask a different one: whether an indicator-direction-horizon combination produces consistent, positive, beta-adjusted expected value across a curated peer group of related companies without any per-ticker optimization. A signal that survives that filter is not fitted to one company's idiosyncrasies. It reflects something about how the sector as a whole responds to price and volume extremes.
The pipeline lives in `indicator_sweep.py` at the repository root and runs in four phases: per-ticker sweep, event storage with enrichment, cross-ticker aggregation with expected value, and Deflated Sharpe Ratio multiple-testing correction. Walk-forward out-of-sample validation and stratified regime analysis run as separate entry points against the stored run. Each phase is documented below with its specific choices and the reasoning behind them.
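The selection effect described above can be illustrated with a small simulation. This is a sketch under stated assumptions, not part of `indicator_sweep.py`: the function name `best_of_n_sharpe`, the 250-observation sample, the 1% return volatility, and the seed are all hypothetical choices made for illustration.

```python
import random
from statistics import mean, stdev


def best_of_n_sharpe(n_strategies: int = 488, n_obs: int = 250, seed: int = 7) -> float:
    """Best per-observation Sharpe ratio among strategies that are pure noise."""
    rng = random.Random(seed)
    best = float("-inf")
    for _ in range(n_strategies):
        # Each "strategy" is a series of zero-mean returns: no true edge at all.
        returns = [rng.gauss(0.0, 0.01) for _ in range(n_obs)]
        best = max(best, mean(returns) / stdev(returns))
    return best


n_fits = 61 * 4 * 2  # indicators x horizons x directions
best = best_of_n_sharpe(n_fits)
print(n_fits, round(best, 3))
```

Every simulated strategy has zero expected return, yet the best of the 488 reports a positive Sharpe ratio. That gap between the winner and the truth is what the later phases are designed to price.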
Phase 1: Per-Ticker Sweep
Phase 1 runs every sweepable indicator against every ticker in the peer group independently. The 80-indicator library spans six categories: 27 trend indicators (SMA, EMA, DEMA, TEMA, WMA, HMA, KAMA, VWMA, Linear Regression, VWAP, MACD, ADX, Aroon, Ichimoku, Parabolic SAR, Supertrend, and several others), 18 momentum indicators (RSI, Stochastic, Stochastic RSI, CCI, Williams %R, ROC, TSI, Ultimate Oscillator, Awesome Oscillator, Coppock Curve, KST, PPO, and others), 13 volatility indicators (ATR, Bollinger Bands, BB Width, BB %B, Keltner Channels, Donchian Channels, Historical Volatility, Ulcer Index, NATR), 14 volume indicators (OBV, VWAP, A/D Line, CMF, Chaikin Oscillator, MFI, Volume Ratio, VPT, Ease of Movement, Force Index, Klinger Oscillator, NVI, PVI), 5 custom ThinkOrSwim-derived indicators (TTM Squeeze, TTM Trend, Pivot Points, VWAP Bands, Market Forecast), and 4 composite signals (Mean Reversion Score, Trend Strength Score, Regime Filter, Volume Breakout). Of the 80, 61 are sweepable. The 19 excluded indicators are either magnitude-only outputs that carry no directional information on their own (ATR, BB Width, ADX) or component lines that only produce a signal when paired (`plus_di` and `minus_di`, `aroon_up` and `aroon_down`). Including them would inflate the indicator count without adding independent information and would distort the trial count that the Deflated Sharpe Ratio correction in Phase 4 depends on.
For each indicator, the pipeline extracts signal transitions: specifically, the bar on which the indicator changes state from neutral to buy or from neutral to sell. This distinction is load-bearing. An RSI reading of 28 on Monday, 27 on Tuesday, and 31 on Wednesday contains one buy transition, on Monday, when RSI enters the oversold zone. Tuesday's reading continues the same episode, and Wednesday's move back above 30 returns the state to neutral rather than generating a sell.
If the pipeline counted every bar below 30 as a separate buy, consecutive oversold readings would generate hundreds of overlapping trades from a single episode, and the resulting hit-rate statistics would be dominated by the autocorrelation of the underlying RSI series rather than by the predictive content of the signal. Transition-only accounting produces one event per episode and forward returns that are reasonably independent of one another. Every indicator's signal-class mapping (oscillator, crossover, zero_cross, trend_ma, band, categorical, cumulative, ichimoku, squeeze) is documented in `signal_classifier.py`.
Forward returns are computed at four horizons (5, 10, 20, and 60 trading days) as Close[t+H] / Close[t] − 1. The 60-day horizon is the primary operating horizon for value-investing position timing: it corresponds to roughly a calendar quarter, a natural accumulation window ahead of an expected earnings print or regulatory event. Shorter horizons are reported for reference and sometimes surface faster-acting signals, but the DSR correction treats each horizon as a separate tested strategy, and an investor holding a 12-to-18-month thesis position is not rebalancing on a 5-day basis.
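Transition-only accounting and the forward-return formula can be sketched as follows. This is a minimal illustration, not the code in `indicator_sweep.py` or `signal_classifier.py`: the function names, the 30/70 RSI thresholds, and the sample series are assumptions.

```python
from typing import List, Optional, Tuple


def rsi_state(value: float, oversold: float = 30.0, overbought: float = 70.0) -> str:
    """Map an oscillator reading to a state (thresholds are illustrative)."""
    if value < oversold:
        return "buy"
    if value > overbought:
        return "sell"
    return "neutral"


def extract_transitions(states: List[str]) -> List[Tuple[int, str]]:
    """Return (bar_index, direction) only where the state leaves neutral."""
    events = []
    for i in range(1, len(states)):
        if states[i - 1] == "neutral" and states[i] in ("buy", "sell"):
            events.append((i, states[i]))
    return events


def forward_return(closes: List[float], t: int, horizon: int) -> Optional[float]:
    """Close[t+H] / Close[t] - 1, or None when the horizon runs past the data."""
    if t + horizon >= len(closes):
        return None
    return closes[t + horizon] / closes[t] - 1.0


# Hypothetical RSI readings: two oversold episodes, the first lasting two bars.
rsi = [45.0, 28.0, 27.0, 31.0, 29.0, 50.0]
closes = [100.0, 98.0, 97.0, 99.0, 98.5, 102.0]
states = [rsi_state(v) for v in rsi]
events = extract_transitions(states)
print(events)
```

The two-bar oversold episode produces a single event at its first bar, and the second episode produces its own event. Counting every bar below 30 instead would have produced three buys from the same data.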
Phase 2: Event Storage, Schwab Fundamentals, Beta-Adjusted Returns
Phase 2 stores the per-ticker hit rates and every individual signal event in two SQLite tables: `sweep_results` (one row per ticker, indicator, direction, horizon) and `sweep_events` (one row per signal event with its raw and beta-adjusted forward returns). The two tables are keyed by a timestamp-based `run_id` that preserves every historical sweep run and lets downstream analysis (stratified EV, walk-forward OOS, Alphalens, heatmaps) run against a specific frozen snapshot.
Between the per-ticker sweep and storage, the pipeline fetches twenty fundamental fields per ticker from the Schwab Market Data API: price-to-earnings ratio, PEG ratio, price-to-book ratio, price-to-cash-flow ratio, trailing twelve-month EPS change, TTM revenue change, gross margin, operating margin, net margin, return on equity, current ratio, interest coverage, debt-to-capital, beta, market cap (full and float), shares outstanding, short interest as a percent of float, short days to cover, and dividend yield. These values are stored in `sweep_fundamentals` alongside the events and serve two purposes: beta is required for the market-model adjustment described below, and the remaining fields act as conditioning variables for the stratified regime analysis in a later stage. Schwab fundamentals are a soft dependency: if the API is unavailable or a token has expired, the sweep continues without enrichment and beta-adjusted returns are recorded as null rather than estimated from a substitute.
Beta adjustment uses the market model: AR = R_stock − beta × R_SPY, where R_SPY is SPY's total return over the same forward horizon starting on the same signal date. The market model is the standard event-study adjustment (Brown and Warner, 1985; Kothari and Warner, 2007). The reason it matters for a position-timing use case is straightforward: a buy signal that fires the day before a broad-market rally will look successful on raw returns even if the stock underperformed SPY.
For an investor who already owns the market through an index position and is using the sweep to time an active entry in a specific ticker, what matters is whether the signal predicts alpha (return in excess of what market exposure explains), not total return. Beta-adjusted EV is the primary reporting metric in Phase 3; raw EV is reported alongside for context, not as the headline.
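The market-model adjustment reduces to one line of arithmetic. The sketch below assumes a hypothetical helper name, `beta_adjusted_return`, and mirrors the soft-dependency behavior described above by returning `None` when beta is missing; the numbers in the example are invented.

```python
from typing import Optional


def beta_adjusted_return(r_stock: float, r_spy: float, beta: Optional[float]) -> Optional[float]:
    """Market-model abnormal return: AR = R_stock - beta * R_SPY.

    Returns None when beta is unavailable, rather than substituting an estimate.
    """
    if beta is None:
        return None
    return r_stock - beta * r_spy


# A buy signal followed by a 6% gain while SPY gained 8%, with beta 1.1:
ar = beta_adjusted_return(0.06, 0.08, 1.1)
print(round(ar, 4))  # 0.06 - 1.1 * 0.08 = -0.028

# Missing fundamentals leave the adjusted return null.
missing = beta_adjusted_return(0.06, 0.08, None)
```

The raw return is a win, but the abnormal return is negative: the stock rose less than its market exposure alone would predict.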
Phase 3: Cross-Ticker Aggregation and Expected Value
After Phase 1 completes for every ticker and Phase 2 has stored and enriched the events, Phase 3 pivots the data across tickers. For each (indicator, direction, horizon) combination, the pipeline counts how many tickers exceeded the hit-rate threshold of 55% and applies a consistency filter: a combination becomes a survivor only if at least three tickers cleared the 55% threshold for that same combination. Both thresholds are defaults and can be adjusted via CLI flags, but the defaults reflect a specific argument. A 55% hit rate is not a high bar in absolute terms (coin flips produce 50%), but it is high enough that a genuinely random process will not produce it consistently across several independent time series. Three-ticker agreement means the signal is not a property of the single outlier in the group that happened to trend cleanly through the window.
Expected value per trade is computed pooled across all signal-generating tickers for the combination: EV = win_rate × avg_win − loss_rate × avg_loss, where a win is defined by the signal direction (a positive forward return for a buy, a negative one for a sell). The pipeline also reports the payoff ratio (average win divided by average loss) and the Kelly fraction, the theoretically optimal capital share if the historical win rate and payoff ratio were stationary. The Kelly number is reported for reference only; nothing about a historical peer-group sweep justifies using it for live position sizing, and the reader should treat it as a descriptive statistic of the return distribution rather than a sizing prescription. Within each (direction, horizon) group, survivors are ranked by EV per trade and assigned an `ev_rank` that the reporting layers use for ordering.
The EV is computed twice, once on raw forward returns and once on beta-adjusted forward returns, and both columns are stored in `sweep_summary`. Where the two differ materially, the beta-adjusted figure is the economically meaningful one.
A signal whose raw EV is large but whose beta-adjusted EV collapses toward zero is not producing alpha; it is producing beta exposure that the investor could have obtained more cheaply by buying SPY directly.
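The consistency filter and the pooled statistics can be sketched in a few lines. The function names, the strict `>` comparison against the threshold, the treatment of zero returns as losses, and the sample data are assumptions made for illustration; the Kelly fraction uses the textbook form f = p − q / b, with b the payoff ratio.

```python
from typing import Dict, List, Optional


def is_survivor(hit_rates: Dict[str, float], min_hit_rate: float = 0.55,
                min_tickers: int = 3) -> bool:
    """True when enough tickers exceed the hit-rate threshold for one combination."""
    return sum(1 for h in hit_rates.values() if h > min_hit_rate) >= min_tickers


def ev_stats(returns: List[float], direction: str = "buy") -> Dict[str, Optional[float]]:
    """Pooled EV, payoff ratio and Kelly fraction for one combination."""
    if not returns:
        raise ValueError("no signal events to pool")
    # Flip the sign for sells so that a win is always a positive number.
    signed = [r if direction == "buy" else -r for r in returns]
    wins = [r for r in signed if r > 0]
    losses = [-r for r in signed if r <= 0]  # zero returns count as losses
    win_rate = len(wins) / len(signed)
    loss_rate = 1.0 - win_rate
    avg_win = sum(wins) / len(wins) if wins else 0.0
    avg_loss = sum(losses) / len(losses) if losses else 0.0
    ev = win_rate * avg_win - loss_rate * avg_loss
    payoff = kelly = None
    if avg_win > 0 and avg_loss > 0:
        payoff = avg_win / avg_loss
        kelly = win_rate - loss_rate / payoff
    return {"win_rate": win_rate, "ev": ev, "payoff": payoff, "kelly": kelly}


# Hypothetical peer group: three of four tickers clear 55%.
hit_rates = {"AAA": 0.58, "BBB": 0.61, "CCC": 0.56, "DDD": 0.49}
pooled = [0.10, -0.05, 0.08, -0.04, 0.06]
survivor = is_survivor(hit_rates)
stats = ev_stats(pooled)
print(survivor, stats)
```

With these numbers the pooled EV is 0.6 × 0.08 − 0.4 × 0.045 = 0.03 per trade, which also equals the plain mean of the five returns. Running the same function on beta-adjusted returns gives the second EV column.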
Phase 3b: Random-Date Baseline
Any claim that an indicator has edge is implicitly a claim against a null. The wrong null is a 50% hit rate: equity markets drift upward over long horizons, and the unconditional probability of a positive 60-day return on most US equities since 1990 exceeds 50%. Buying on a random Tuesday and checking the price sixty trading days later will win more often than not simply because the market went up in the intervening quarter. The correct null is the return an investor would have obtained by trading on dates with no signal content at all.
Phase 3b constructs that null empirically. For each ticker in the peer group, the pipeline draws 200 dates uniformly at random from the full available price history (not from signal events, not from any filtered subset) and computes forward returns at all four horizons. The pooled random sample produces a hit rate, an expected value per trade, a payoff ratio, and a Kelly fraction that the sweep report lists alongside the survivor indicators under the label `_random_baseline`. The random number generator is seeded so the baseline is reproducible across reruns of the same peer group. A surviving signal must beat the random baseline on beta-adjusted EV, not merely on hit rate, and the report makes the comparison explicit. A signal that clears 55% but whose EV sits on top of the random baseline is not a finding; it is a measurement of the market's drift.
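A seeded random-date baseline might look like the sketch below. The function name, the seed value, sampling with replacement, and restricting draws to dates that still have a full forward window are all assumptions; the synthetic price series exists only to show the effect of drift.

```python
import random
from typing import List


def random_baseline_returns(closes: List[float], horizon: int,
                            n_draws: int = 200, seed: int = 42) -> List[float]:
    """Forward returns from dates drawn uniformly at random (with replacement)."""
    rng = random.Random(seed)  # seeded so reruns reproduce the same baseline
    last_valid = len(closes) - horizon - 1  # last date with a full forward window
    if last_valid < 0:
        return []
    picks = [rng.randint(0, last_valid) for _ in range(n_draws)]
    return [closes[t + horizon] / closes[t] - 1.0 for t in picks]


# Synthetic series with a steady upward drift and no signal content whatsoever.
closes = [100.0 * 1.0005 ** i for i in range(1000)]
baseline = random_baseline_returns(closes, horizon=60)
hit_rate = sum(1 for r in baseline if r > 0) / len(baseline)
print(len(baseline), hit_rate)
```

On a series that only drifts upward, random dates win every time. A 55% hit rate measured against such a series would say nothing about the indicator, which is why the comparison has to be made on EV against this baseline rather than against 50%.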
Phase 4: Deflated Sharpe Ratio
Running 61 indicators across four horizons and two directions produces up to 488 tested strategies per peer group. With that many tests, some will show positive EV and above-threshold hit rates purely by chance. The hit-rate consistency filter and the random-date baseline reduce the false-positive rate but do not eliminate it, because both filters operate on unconditional statistics rather than on the distribution of the best-of-N Sharpe ratio under the null hypothesis of no edge.
Phase 4 applies the Deflated Sharpe Ratio (Bailey and Lopez de Prado, 2014). The DSR answers a specific question: given that the analyst tested N strategies and reports the one with the best Sharpe ratio, what is the probability that the observed Sharpe reflects genuine edge rather than the best draw from N independent coin flips? The key analytic insight in the original paper is that the expected maximum Sharpe ratio across N independent tests scales predictably with N, as a function of the Euler-Mascheroni constant and the cross-sectional standard deviation of Sharpe ratios across the tested set.
The implementation in `trading_lab/position_math.py` applies the Bailey and Lopez de Prado correction to the observed signal-level Sharpe ratio, adjusts the variance for sample skewness and excess kurtosis (signal-event return distributions are typically right-skewed and fat-tailed, and the Gaussian Sharpe-variance formula accounts for neither), and reports a DSR p-value. Because the DSR is the estimated probability that the true Sharpe ratio is positive, higher is better: a DSR p-value above 0.95 passes the default significance threshold, and those combinations are flagged as `dsr_pass` in `sweep_summary`. The Harvey-Liu-Zhu t-statistic is reported alongside as a secondary sanity check. Both columns are null when scipy is not available (DSR is a soft dependency that requires the `scipy.stats` module).
The reason a Phase 4 filter matters is that Phase 3 is, by construction, a selection procedure. The consistency filter picks survivors out of a competitive set.
Without a formal multiple-testing correction, the surviving EV statistics are conditional on having been selected, and that conditioning inflates their apparent significance. DSR undoes the inflation by pricing the selection explicitly.
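The published DSR formulas can be sketched as follows. This is not the code in `trading_lab/position_math.py`: the function names and example inputs are assumptions, and the sketch swaps in the standard library's `statistics.NormalDist` for `scipy.stats` so that it runs without extra dependencies. The Sharpe ratio here is per observation (not annualized), and `n_trials` must be at least 2.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant
_NORMAL = NormalDist()


def expected_max_sharpe(n_trials: int, sr_std: float) -> float:
    """Expected best Sharpe among n_trials zero-edge strategies (n_trials >= 2)."""
    return sr_std * (
        (1.0 - EULER_GAMMA) * _NORMAL.inv_cdf(1.0 - 1.0 / n_trials)
        + EULER_GAMMA * _NORMAL.inv_cdf(1.0 - 1.0 / (n_trials * math.e))
    )


def deflated_sharpe(sr: float, n_obs: int, skew: float, excess_kurt: float,
                    n_trials: int, sr_std: float) -> float:
    """Probability that the true Sharpe exceeds the expected best-of-N null Sharpe."""
    sr0 = expected_max_sharpe(n_trials, sr_std)
    # Sharpe variance adjusted for skewness and excess kurtosis;
    # reduces to 1 + sr^2 / 2 for Gaussian returns.
    radicand = 1.0 - skew * sr + (excess_kurt + 2.0) / 4.0 * sr ** 2
    z = (sr - sr0) * math.sqrt(n_obs - 1) / math.sqrt(radicand)
    return _NORMAL.cdf(z)


# Hypothetical inputs: 488 trials whose Sharpe ratios have a cross-sectional std of 0.1.
hurdle = expected_max_sharpe(488, 0.1)
strong = deflated_sharpe(0.5, n_obs=200, skew=0.0, excess_kurt=0.0, n_trials=488, sr_std=0.1)
weak = deflated_sharpe(0.3, n_obs=200, skew=0.0, excess_kurt=0.0, n_trials=488, sr_std=0.1)
print(round(hurdle, 3), strong > 0.95, weak > 0.95)
```

The hurdle rises with the number of trials, so the same observed Sharpe ratio earns a lower DSR the more strategies were tried. That dependence is why padding the indicator count with non-independent variants changes the outcome of the correction.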
Walk-Forward Out-of-Sample Validation
Hit rate, EV, and DSR are all computed against the full historical window. Even after DSR correction, a signal that passes on the full history carries look-ahead risk: the pattern that makes RSI mean reversion work in managed-care names might be specific to the regulatory environment of a particular window and would have failed in the preceding decade. Walk-forward validation tests this directly by splitting the history into expanding in-sample and out-of-sample folds and measuring whether in-sample survivors continue to show positive EV out of sample. The validation harness runs a configurable number of folds, with five as the default. In each fold, the in-sample window runs from the start of available history through the fold's cutoff date, the out-of-sample window runs from that cutoff through the start of the next fold, and the pipeline records which indicators were survivors in-sample and what their EV was out-of-sample. A signal "survives" the walk-forward test only if it appears as an in-sample survivor in at least four of the five folds and maintains positive out-of-sample EV in at least three of those folds. This is a deliberately demanding standard. It excludes indicators whose edge is concentrated in one regime and rewards indicators whose edge is detectable across multiple train-test splits. A signal that passes in-sample in every fold but reliably fails out-of-sample is the textbook definition of a curve fit. The results are stored in `sweep_oos_results`.
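The fold layout and the pass rule can be sketched as below. The function names, the choice to reserve the first half of the history as the initial in-sample window (`min_train_frac`), and the equal-width out-of-sample slices are assumptions; only the expanding-window shape, the five-fold default, and the four-of-five and three-fold thresholds come from the description above.

```python
from typing import List, Optional, Tuple


def expanding_folds(n_bars: int, n_folds: int = 5,
                    min_train_frac: float = 0.5) -> List[Tuple[int, int, int]]:
    """Return (is_start, cutoff, oos_end): in-sample [0, cutoff), OOS [cutoff, oos_end)."""
    first_cutoff = int(n_bars * min_train_frac)
    step = (n_bars - first_cutoff) // n_folds
    folds = []
    for k in range(n_folds):
        cutoff = first_cutoff + k * step
        # The last fold absorbs any remainder so the OOS windows cover all the data.
        oos_end = n_bars if k == n_folds - 1 else cutoff + step
        folds.append((0, cutoff, oos_end))
    return folds


def passes_walk_forward(fold_results: List[Tuple[bool, Optional[float]]],
                        min_is_folds: int = 4, min_positive_oos: int = 3) -> bool:
    """fold_results holds (in_sample_survivor, oos_ev) for each fold."""
    oos_evs = [ev for survived, ev in fold_results if survived]
    positive = sum(1 for ev in oos_evs if ev is not None and ev > 0)
    return len(oos_evs) >= min_is_folds and positive >= min_positive_oos


folds = expanding_folds(1000)
robust = passes_walk_forward(
    [(True, 0.01), (True, 0.02), (True, -0.01), (True, 0.03), (False, None)]
)
curve_fit = passes_walk_forward(
    [(True, -0.01), (True, -0.02), (True, -0.01), (True, 0.01), (True, -0.03)]
)
print(folds, robust, curve_fit)
```

The second example is an in-sample survivor in every fold but has positive out-of-sample EV in only one, so it fails: the curve-fit case described above.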
Stratified Regime Analysis
A signal that produces positive EV on average may not work uniformly across market conditions. Stratified EV analysis splits the signal event database by one of the stored Schwab fundamental fields (typically PE ratio, beta, short interest as a percent of float, or ROE) into buckets defined by percentile, fixed threshold, or median split, and computes EV separately within each bucket. The question it answers is not whether a signal works but whether it works better under specific fundamental regimes. RSI mean reversion may produce a modest unconditional EV on managed-care stocks and a substantially larger EV when the stock is trading at a PE below the peer-group median. That conditional finding is more actionable than the headline average because it combines a timing signal with a valuation filter in one observation.
Stratification runs as post-hoc analysis against a stored `run_id`. It does not modify the main survivor classifications in `sweep_summary`. Its output is a separate report that can be generated for any indicator, direction, and horizon. The practical implication for position sizing is that a signal whose conditional EV is concentrated in a single fundamental bucket should only be acted on when the current fundamental state places the stock in that bucket.
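A median split is the simplest of the three bucketing schemes and can be sketched as follows. The function name, the tuple layout, and the sample events are hypothetical. For buy signals the mean forward return equals win_rate × avg_win − loss_rate × avg_loss, so the sketch uses the mean as the bucket EV.

```python
from statistics import mean, median
from typing import Dict, List, Optional, Tuple


def stratified_ev(events: List[Tuple[float, float]]) -> Dict[str, Optional[float]]:
    """Median-split EV for buy events given (conditioning_value, forward_return) pairs."""
    cut = median(value for value, _ in events)
    low = [ret for value, ret in events if value < cut]
    high = [ret for value, ret in events if value >= cut]
    return {
        "cut": cut,
        "ev_all": mean(ret for _, ret in events),
        # An empty bucket (for example, all values equal) reports None.
        "ev_low": mean(low) if low else None,
        "ev_high": mean(high) if high else None,
    }


# Hypothetical buy events: (PE ratio on the signal date, beta-adjusted 60-day return).
events = [(8.0, 0.06), (9.0, 0.05), (10.0, 0.04), (15.0, 0.01), (18.0, -0.01), (20.0, 0.00)]
report = stratified_ev(events)
print(report)
```

In this invented sample the unconditional EV is 2.5% per trade, but all of it comes from the low-PE bucket (5%) while the high-PE bucket contributes nothing. Acting on the signal only when the stock sits in the low-PE bucket is the kind of conditional rule the analysis is meant to surface.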
Data Sources and Storage
Price history is fetched from the Schwab Market Data API (up to roughly twenty years of daily OHLCV, cached in the `prices` table of `market.db` after the first fetch), with a yfinance fallback for tickers Schwab cannot serve. SPY price history, which is required for the beta-adjusted return computation, is fetched through the same loader and cached identically. Schwab fundamentals are fetched once per sweep run at a rate of one ticker per half-second to respect the Schwab 120-call-per-minute limit, and stored in `sweep_fundamentals` with the run's timestamp.
Every sweep output (per-ticker results, individual events with both raw and beta-adjusted forward returns, cross-ticker summary, fundamentals snapshot, Alphalens IC and quantile metrics when run, and walk-forward OOS results) is stored in SQLite under a shared `run_id`. The `run_id` is generated from the sweep start timestamp and is stable across the full pipeline, which means any downstream analysis (the Google Sheets export, the heatmap renderer, the strategy converter, the Alphalens tear sheet, or a subsequent walk-forward rerun) can be pinned to a specific historical sweep without recomputing it.
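Pinning an analysis to one run amounts to filtering every query by `run_id`. The sketch below uses an in-memory database; the `run_id` string format, the column names (`ev_beta_adj` in particular), the helper names, and the inserted values are all assumptions for illustration, not the pipeline's actual schema.

```python
import sqlite3
from datetime import datetime, timezone
from typing import List, Optional, Tuple


def new_run_id(now: Optional[datetime] = None) -> str:
    """Timestamp-based identifier; the exact format here is an assumption."""
    now = now or datetime.now(timezone.utc)
    return now.strftime("%Y%m%dT%H%M%S")


def summary_for_run(conn: sqlite3.Connection, run_id: str) -> List[Tuple[str, str, int, float]]:
    """Read one frozen snapshot, ordered by beta-adjusted EV."""
    cur = conn.execute(
        "SELECT indicator, direction, horizon, ev_beta_adj "
        "FROM sweep_summary WHERE run_id = ? ORDER BY ev_beta_adj DESC",
        (run_id,),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.execute(
    "CREATE TABLE sweep_summary ("
    "run_id TEXT, indicator TEXT, direction TEXT, horizon INTEGER, ev_beta_adj REAL)"
)
old_run = new_run_id(datetime(2024, 1, 5, 9, 30, tzinfo=timezone.utc))
new_run = new_run_id(datetime(2024, 6, 3, 9, 30, tzinfo=timezone.utc))
conn.executemany(
    "INSERT INTO sweep_summary VALUES (?, ?, ?, ?, ?)",
    [  # made-up rows from two separate sweeps
        (old_run, "rsi", "buy", 60, 0.021),
        (old_run, "cci", "buy", 60, 0.034),
        (new_run, "rsi", "buy", 60, 0.018),
    ],
)
rows = summary_for_run(conn, old_run)
print(rows)
```

Because the later sweep is stored under a different `run_id`, rerunning the pipeline never alters what a report pinned to the earlier run returns.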
Limitations
Technical indicators are transformations of price and volume series. Every pattern the sweep detects is a pattern that held historically. Walk-forward validation, DSR correction, and the random-date baseline each reduce the probability that a reported survivor is a curve fit, but none of them eliminates the possibility that a pattern reflects a regime that will not recur. The survivors reported by the sweep are the set of combinations that cleared every published filter on the data available at the time of the run, not a set of signals with guaranteed forward performance.
Beta adjustment uses the Schwab-reported trailing beta at the time of the sweep run, not a period-specific beta estimated over the window that contains the signal event. For tickers whose market sensitivity shifted materially during the backtest window (managed-care names moved through at least two distinct beta regimes across the Affordable Care Act implementation period and its aftermath), the adjustment is imprecise for the affected sub-window. The distortion is likely small relative to the cross-sectional differences the sweep is designed to detect, but it is not explicitly modeled and the reader should treat beta-adjusted EV as a first-order correction rather than a complete factor model.
Peer-group size sets a ceiling on statistical confidence. The default consistency threshold of three tickers represents different fractions of different peer groups (three of nine in managed-care, three of five in athletic retail), and the confidence interval on the cross-ticker EV estimate tightens as the peer group grows. Peer selection is itself a methodological choice and the earlier feedback on this pipeline was explicit: peer composition is the single most consequential variable in the sweep, not the indicator library. The honest approach is to run several plausible peer groupings and look for signals that appear across configurations rather than in any one.
Horizons are tested independently against the same pool of signal events, which creates mechanical correlation between the four return columns. A buy signal on a given date generates four observations (5-day, 10-day, 20-day, and 60-day returns) that are not independent of one another. The DSR correction treats each horizon as a separate strategy, which is conservative with respect to false positives but does not exactly model the within-event correlation structure. In practice the 60-day horizon is the reporting primary and the shorter horizons are reported for diagnostic context.
Transaction costs are not included in the EV calculation. The pipeline is designed for position timing, meaning entering or accumulating a thesis position that the investor intends to hold for 12 to 18 months, not for a rotating portfolio of 20-to-60-day swing trades. At the target use case, commission friction affects a single entry leg and is small relative to the EV magnitudes reported by surviving signals. An investor who used the sweep output to actually rotate in and out of positions on the 20-day horizon would encounter round-trip friction that the reported EV does not model.
Sector specificity is the point of the pipeline rather than a flaw, but it does limit the transferability of the output. Signals that survive in managed-care may not survive in cable-broadband or athletic-retail, and survivors in one peer group should not be treated as validated in another without rerunning the pipeline against the second group. The peer group is the unit of analysis, and the sweep output carries no claim about universal technical-indicator behavior.