
Vendor-Recommended Doesn't Mean Safe to Apply: A WAF Security Story

by dsamist · May 16, 2026

This post is about a time I spent with an organization whose infrastructure was hosted on AWS. One day, our team got an email from the AWS Security Incident Response team, with the kind of subject line that makes you sit up. They had detected a distributed attack, lasting about 45 minutes, on one of our CloudFront distributions: the one serving a high-traffic content platform we operate. The attack had since subsided, but they had attached a rule.json containing a WAF rule they recommended we apply to block the traffic if it came back.

The rule was simple: block any request matching a specific JA4 fingerprint.

{
  "Name": "block-ja4-AMSSEC-XXXXX",
  "Statement": {
    "ByteMatchStatement": {
      "SearchString": "t13d1517h2_8daaf6152771_b6f405a00624",
      "FieldToMatch": { "JA4Fingerprint": { "FallbackBehavior": "NO_MATCH" } },
      "TextTransformations": [{ "Priority": 0, "Type": "NONE" }],
      "PositionalConstraint": "EXACTLY"
    }
  },
  "Action": { "Block": {} }
}

It came from AWS. Our site had been attacked. The obvious move was to apply it.

We didn't. And after a week of investigation, AWS Support eventually agreed we shouldn't.

This post is the story of what happened in between.


A quick primer on JA4 fingerprints

If you've never come across JA4 before: it's a fingerprint computed from the TLS Client Hello that a client sends when it connects to your server. Different TLS clients (Chrome on Windows, curl, Python's requests, a botnet's custom client) produce different Client Hello shapes: different cipher suites in different orders, different extensions, different ALPN values. JA4 turns those differences into a fingerprint string you can match in your WAF.

The format looks like this:

t13d1517h2_8daaf6152771_b6f405a00624
└────┬───┘ └─────┬────┘ └─────┬────┘
     │           │            │
     │           │            └── Hash of sorted extensions (minus SNI and ALPN) + signature algorithms
     │           └── Hash of sorted cipher suites
     └── Readable metadata: transport, TLS version, SNI, counts, ALPN
         t   = TCP (q would mean QUIC)
         13  = TLS 1.3
         d   = the client sent SNI with a domain name (i = no SNI)
         15  = 15 cipher suites
         17  = 17 extensions
         h2  = first ALPN value is HTTP/2
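To make the format concrete, here is a simplified Python sketch of how those three parts are assembled from an already-parsed Client Hello, written from my reading of the public JA4 spec. The function and parameter names are my own; it skips packet parsing, takes the TLS version as an input instead of reading the supported_versions extension, and assumes the ALPN value is plain alphanumeric text.

```python
import hashlib

SNI_EXT, ALPN_EXT = 0x0000, 0x0010


def is_grease(value: int) -> bool:
    """GREASE values (RFC 8701) are 0x0a0a, 0x1a1a, ..., 0xfafa."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def hash12(text: str) -> str:
    """JA4 keeps the first 12 hex characters of a SHA-256 digest."""
    return hashlib.sha256(text.encode()).hexdigest()[:12]


def ja4(ciphers, extensions, sig_algs, alpn, tls_version="13", has_sni=True, transport="t"):
    """Build a JA4-style fingerprint from Client Hello fields (values as ints)."""
    ciphers = [c for c in ciphers if not is_grease(c)]
    extensions = [e for e in extensions if not is_grease(e)]

    # Part a: readable prefix. Counts exclude GREASE but include SNI/ALPN, capped at 99.
    alpn_part = alpn[0] + alpn[-1] if alpn else "00"
    part_a = (
        f"{transport}{tls_version}{'d' if has_sni else 'i'}"
        f"{min(len(ciphers), 99):02d}{min(len(extensions), 99):02d}{alpn_part}"
    )

    # Part b: cipher suites sorted, written as 4-digit lowercase hex, then hashed.
    part_b = hash12(",".join(f"{c:04x}" for c in sorted(ciphers))) if ciphers else "0" * 12

    # Part c: sorted extensions minus SNI and ALPN, then signature algorithms
    # in their original order, joined with "_" before hashing.
    ext_str = ",".join(f"{e:04x}" for e in sorted(extensions) if e not in (SNI_EXT, ALPN_EXT))
    sig_str = ",".join(f"{s:04x}" for s in sig_algs)
    if not extensions:
        part_c = "0" * 12
    else:
        part_c = hash12(f"{ext_str}_{sig_str}" if sig_str else ext_str)

    return f"{part_a}_{part_b}_{part_c}"
```

Because both lists are sorted before hashing, a browser that randomizes its extension order (as Chrome does) still produces one stable fingerprint, and nothing in the string identifies a person or a tool, only the shape of its TLS stack.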

JA4 is genuinely useful for blocking custom-built attack clients, because attackers rarely take the time to make their tooling fingerprint-identical to a real browser. But it has a known weakness: MODERN BROWSERS ALL LOOK VERY SIMILAR AT THE TLS LAYER, and so do many popular HTTP libraries. A single JA4 fingerprint is shared by everyone running the same browser build, by browsers built on the same TLS stack (Chrome and Edge are both Chromium), and by any tool that reuses that stack. That's the trap we were about to walk into.


Step one: deploy in Count mode

One thing I have learnt: unless the issue is extreme, urgent, and already well understood, always deploy a new WAF rule in Count mode before flipping it to Block. Count mode tells WAF to record every request the rule would have blocked, without actually blocking anything. It's cheap, harmless, and tells you whether the rule does what you think it does.

We applied AWS's rule scoped to the host header of the affected site, with Action: Count:

- Action:
    Count: {}
  Name: ja4-count-AMSSEC-XXXXX
  Priority: 8
  Statement:
    AndStatement:
      Statements:
        - ByteMatchStatement:
            FieldToMatch:
              SingleHeader:
                Name: host
            PositionalConstraint: EXACTLY
            SearchString: app.example.com
            TextTransformations:
              - Type: NONE
                Priority: 0
        - ByteMatchStatement:
            SearchString: t13d1517h2_8daaf6152771_b6f405a00624
            FieldToMatch:
              JA4Fingerprint:
                FallbackBehavior: NO_MATCH
            TextTransformations:
              - Priority: 0
                Type: NONE
            PositionalConstraint: EXACTLY
  VisibilityConfig:
    SampledRequestsEnabled: true
    CloudWatchMetricsEnabled: true
    MetricName: count-ja4-AMSSEC-XXXXX
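Going from AWS's rule.json to a Count-mode, host-scoped rule like the one above is mechanical, so it's worth scripting if these recommendations arrive often. A minimal Python sketch, assuming rules are handled as plain dicts in the same JSON shape as rule.json; the helper name and the block-/count- renaming convention are mine.

```python
import copy


def to_scoped_count_rule(rule: dict, host: str, priority: int) -> dict:
    """Turn a vendor-supplied WAF rule into a Count-mode rule scoped to one Host header."""
    host_match = {
        "ByteMatchStatement": {
            "FieldToMatch": {"SingleHeader": {"Name": "host"}},
            "PositionalConstraint": "EXACTLY",
            "SearchString": host,
            "TextTransformations": [{"Priority": 0, "Type": "NONE"}],
        }
    }
    # Hypothetical naming convention: block-<x> becomes count-<x>.
    name = rule["Name"].replace("block-", "count-", 1)
    return {
        "Name": name,
        "Priority": priority,
        # Both conditions must match: the site's host AND the vendor's original statement.
        "Statement": {
            "AndStatement": {"Statements": [host_match, copy.deepcopy(rule["Statement"])]}
        },
        # Count records matches without blocking; the vendor's Block action is dropped.
        "Action": {"Count": {}},
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": name,
        },
    }
```

Promoting the rule later is then a one-line change to Action, which keeps the Count and Block versions otherwise identical.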

Within 24 hours, the CountedRequests metric on this rule had crossed 17 million.

That's when things started feeling off.


Three findings that killed the rule

We ran a series of Athena queries against the WAF logs to break down what was actually matching.

Finding 1: The fingerprint matches normal browser traffic

SELECT
  httprequest.clientip,
  httprequest.country,
  httprequest.uri,
  httprequest.httpmethod,
  count(*) AS request_count
FROM wafLogTable
WHERE year = 'xxxx'
  AND month = 'xx'
  AND day IN ('xx', 'xx')
  AND ja4fingerprint = 't13d1517h2_8daaf6152771_b6f405a00624'
GROUP BY 1, 2, 3, 4
ORDER BY request_count DESC
LIMIT 100;

The results showed traffic from many legitimate locations, hitting normal user-facing URLs, with a broad spread of source IPs, most of them sending only a handful of requests each. This is what legitimate traffic to a popular public-facing site looks like. Decoding the JA4 itself confirmed the suspicion: the t13d1517h2 prefix means TLS 1.3 with HTTP/2 offered via ALPN, which is what modern browsers such as Chrome and Edge negotiate by default.

In other words, the fingerprint matched a property of "Chrome on Windows," not a property of "the attacker."
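"A wide spread of source IPs, most sending a handful of requests" can be put into numbers once the query results are exported. A rough Python sketch, assuming rows of (clientip, request_count) from a version of the query above run without the LIMIT (the top 100 rows alone would skew the picture). The function name and the 10-request cutoff for "a handful" are assumptions, not anything WAF or Athena defines.

```python
from collections import Counter


def spread_summary(rows, handful=10):
    """Summarize how concentrated matching traffic is across source IPs.

    rows: (clientip, request_count) pairs from the Athena export.
    """
    per_ip = Counter()
    for ip, n in rows:
        per_ip[ip] += n  # an IP appears on several rows (one per URI/method/country)
    if not per_ip:
        return {"unique_ips": 0, "share_of_ips_with_few_requests": 0.0, "top_ip_share_of_requests": 0.0}
    total = sum(per_ip.values())
    light = sum(1 for n in per_ip.values() if n <= handful)
    _, top_count = per_ip.most_common(1)[0]
    return {
        "unique_ips": len(per_ip),
        # Fraction of IPs that sent `handful` requests or fewer.
        "share_of_ips_with_few_requests": light / len(per_ip),
        # Fraction of all requests coming from the single busiest IP.
        "top_ip_share_of_requests": top_count / total,
    }
```

Many unique IPs, a high share of light senders, and a small top-IP share look like an audience; a few IPs carrying most of the volume look like an attack.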

Finding 2: The fingerprint predates the attack

If a fingerprint is genuinely tied to an attack, you'd expect to see it appear or spike around the attack window. We checked the 5 days before the reported attack date:

SELECT day, count(*) AS request_count
FROM wafLogTable
WHERE year = 'xxxx'
  AND month = 'xx'
  AND day IN ('xx', 'xx', 'xx', 'xx', 'xx')
  AND ja4fingerprint = 't13d1517h2_8daaf6152771_b6f405a00624'
GROUP BY day
ORDER BY day ASC;

The daily counts in the quiet period before the attack were already in the millions. This fingerprint had been present in our baseline traffic for as long as we could see in the logs. It wasn't introduced by the attackers. They just happened to use a TLS stack that produced the same fingerprint as every Chrome user on the internet.
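The same check reduces to one number: the attack day's count divided by the average of the other days. A minimal sketch with a hypothetical helper name; what counts as a "large" lift is a judgment call, not a fixed threshold.

```python
from statistics import mean


def attack_lift(daily_counts: dict, attack_day: str) -> float:
    """Attack-day count divided by the mean of all other days.

    Near 1.0: the indicator is part of normal baseline traffic.
    Much larger (or infinite, when the baseline is zero): it arrived with the attack.
    """
    baseline = [n for day, n in daily_counts.items() if day != attack_day]
    if not baseline:
        raise ValueError("need at least one non-attack day for a baseline")
    base = mean(baseline)
    if base == 0:
        return float("inf")
    return daily_counts[attack_day] / base
```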

Finding 3: We were about to block our own monitoring

This was the part that made me actually stop and re-read the query. Among the IPs matching the fingerprint, three stood out because they were sending steady, low-volume, evenly-paced requests from a specific public range. A quick lookup against AWS's published IP ranges identified them as the public probes used by one of AWS's own synthetic monitoring services, which our team uses to detect uptime issues on the platform.

So if we'd flipped the rule to Block, the first thing we would have blocked was AWS's monitoring of our own site. Every health check would have started failing. Our pager would have lit up. And we would have spent the next hour debugging "why is the site down?" while the site was, in fact, completely fine.
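The IP lookup from Finding 3 is easy to script against AWS's published https://ip-ranges.amazonaws.com/ip-ranges.json, whose prefixes and ipv6_prefixes arrays carry a service label for each prefix. This sketch takes the already-parsed JSON as input so it runs offline; the function name is mine.

```python
import ipaddress


def aws_services_for(ip: str, ip_ranges: dict) -> list:
    """Return the service labels of every published AWS prefix that contains ip.

    ip_ranges is the parsed ip-ranges.json document.
    """
    addr = ipaddress.ip_address(ip)
    if addr.version == 4:
        entries, field = ip_ranges.get("prefixes", []), "ip_prefix"
    else:
        entries, field = ip_ranges.get("ipv6_prefixes", []), "ipv6_prefix"
    # Specific services are typically listed in addition to the catch-all "AMAZON" label,
    # so a match often returns more than one entry.
    return sorted({e["service"] for e in entries if addr in ipaddress.ip_network(e[field])})
```

Feeding in the few steady, low-volume IPs from the query is usually enough to spot your own tooling before a Block rule does it for you.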


Going back to AWS

At this point we had enough evidence to push back. I wrote up the findings and sent them back to the AWS Security engineer who had originally proposed the rule:

Thank you for the details provided regarding the DDoS event on our CloudFront distribution.

We have applied the recommended WAF rule in Count mode as advised, and have conducted a thorough investigation of the traffic matching the provided JA4 fingerprint. Unfortunately, we are unable to proceed with blocking based on this indicator alone. Here are our findings:

  1. The fingerprint is too broad. The JA4 corresponds to a standard TLS 1.3 / HTTP/2 configuration used by modern browsers (Chrome, Edge, Firefox) and common HTTP client libraries. Over a 24-hour period, we observed over 17 million requests matching this fingerprint, the vast majority of which are legitimate user traffic.

  2. The fingerprint predates the attack. Analysis of our WAF logs confirms that this fingerprint was already present in our normal baseline traffic from at least 5 days before the reported attack, coming from a wide range of IPs across multiple European countries.

  3. Our own AWS infrastructure matches this fingerprint. Our synthetic monitoring probes also match this fingerprint. Blocking on this rule would disrupt our own site availability monitoring.

Could you please provide a more specific indicator to isolate the attack traffic? For example: a combination of JA4 fingerprint + specific source IP ranges or ASNs, a more unique fingerprint, or URI patterns and request rate thresholds specific to the attack.

A few days later, an AWS WAF Support specialist took over the case. Her reply was honest: the original recommendation had come from the Security Incident Response team's own traffic analysis during the attack window, but they didn't have additional indicators to share. And since several days had passed, the attackers had likely rotated their technique anyway. A static fingerprint block would now be both too broad (catching legitimate traffic) and too narrow (an attacker today would look different).

What she recommended instead was much more useful:

  1. Tighten the existing managed rules. Specifically, set AWSManagedIPDDoSList inside the AWSManagedRulesAmazonIpReputationList group from Count to Block. This list is maintained by AWS threat intelligence and targets IPs actively participating in DDoS activity right now — it self-updates, unlike a static fingerprint.
  2. Lower the rate-based rule thresholds. Our existing per-IP limit was xxx,xxx requests per 5 minutes, which is far too permissive against a distributed attack. Real human users on this platform rarely exceed a few hundred requests in that window.
  3. Increase Anti-DDoS sensitivity from LOW to MEDIUM. We had been running it at the most permissive level.
  4. Enable AWS WAF Bot Control at Targeted inspection level. This adds active detection for coordinated automated traffic, which is most of what a distributed attack looks like.
  5. Set HostingProviderIPList (in the AWSManagedRulesAnonymousIpList group) to Challenge. Most legitimate users don't come from datacenter IP ranges. A Challenge action, which is a silent browser check rather than a CAPTCHA puzzle, filters out automated traffic from cloud and VPS providers without blocking the rare real user behind one.
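For reference, here is roughly what items 1, 2, 4 and 5 look like as rule definitions, sketched in Python as dicts in the wafv2 JSON shape. It assumes the managed groups are already attached to the web ACL (their other rules keep their default actions), and the helper names and priorities are placeholders. It shows the end state; during validation each override would start as Count. I've left out the Anti-DDoS sensitivity change rather than guess that group's configuration fields.

```python
def visibility(name):
    return {"SampledRequestsEnabled": True, "CloudWatchMetricsEnabled": True, "MetricName": name}


def managed_group_rule(group, priority, overrides=None, configs=None):
    """Reference an AWS managed rule group, optionally overriding rule actions inside it."""
    statement = {"VendorName": "AWS", "Name": group}
    if overrides:
        # Per-rule overrides inside the group, e.g. AWSManagedIPDDoSList: Count -> Block.
        statement["RuleActionOverrides"] = [
            {"Name": rule, "ActionToUse": {action: {}}} for rule, action in overrides.items()
        ]
    if configs:
        statement["ManagedRuleGroupConfigs"] = configs
    return {
        "Name": group,
        "Priority": priority,
        "Statement": {"ManagedRuleGroupStatement": statement},
        # "None" keeps the group's own actions, apart from the overrides above.
        "OverrideAction": {"None": {}},
        "VisibilityConfig": visibility(group),
    }


def rate_limit_rule(limit, priority, action="Count"):
    """Per-IP rate-based rule over a 5-minute (300 s) window."""
    return {
        "Name": "rate-limit-per-ip",
        "Priority": priority,
        "Statement": {
            "RateBasedStatement": {"Limit": limit, "AggregateKeyType": "IP", "EvaluationWindowSec": 300}
        },
        "Action": {action: {}},
        "VisibilityConfig": visibility("rate-limit-per-ip"),
    }


RULES = [
    managed_group_rule("AWSManagedRulesAmazonIpReputationList", 10,
                       overrides={"AWSManagedIPDDoSList": "Block"}),
    managed_group_rule("AWSManagedRulesAnonymousIpList", 20,
                       overrides={"HostingProviderIPList": "Challenge"}),
    managed_group_rule(
        "AWSManagedRulesBotControlRuleSet", 30,
        configs=[{"AWSManagedRulesBotControlRuleSet": {"InspectionLevel": "TARGETED"}}],
    ),
]
```

rate_limit_rule takes the limit as a parameter on purpose: the right value comes from your own per-IP distribution, not from a blog post.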

We deployed those changes over the following week, with each rule going through the same Count → validate → Block cycle. None of them would have broken our monitoring.
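The "validate" step in that cycle can be made explicit. A minimal sketch of a promotion gate, assuming the matched client IPs and request totals have been exported from the Count period: refuse to block if any matched IP falls in a known-good range (such as monitoring probes) or if the rule matches more than a small share of all traffic. The helper name and the 1% default are my own assumptions, not an AWS feature.

```python
import ipaddress


def safe_to_block(matched_ips, matched_requests, total_requests, known_good, max_share=0.01):
    """Gate for promoting a Count-mode rule to Block.

    matched_ips: client IPs the rule counted; known_good: CIDRs that must never be
    blocked (e.g. monitoring probes); max_share: largest acceptable fraction of all
    traffic the rule may match (an assumed default, tune for your site).
    Returns (ok, colliding_ips, match_share).
    """
    networks = [ipaddress.ip_network(cidr) for cidr in known_good]
    collisions = sorted(
        {ip for ip in matched_ips if any(ipaddress.ip_address(ip) in net for net in networks)}
    )
    share = matched_requests / total_requests if total_requests else 0.0
    return (not collisions and share <= max_share), collisions, share
```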


What I took away from this

A few things worth writing down so I remember them next time.

Count mode is non-negotiable. Every new WAF rule, every time, no exceptions, even when the rule came from AWS themselves. The cost of a day in Count mode is logging volume. The cost of a bad Block rule is your site going down for as long as it takes you to realize YOU did it, not the attackers.

A vendor recommendation is an input, not an instruction. AWS Security wasn't wrong to send us the rule: they were acting on their own traffic data and trying to help. But what they see during a 45-minute attack window is a narrow slice compared with what we see in our own logs across days and weeks. The party with more context has to do the validation work. That's us.

Static indicators decay fast. A JA4 fingerprint, an IP list, a specific URI pattern — these are all snapshots of how an attacker behaved at one point in time. They're useful for the next 24 hours, sometimes less. Dynamic protections (managed rule groups that self-update, rate-based rules, behavioral challenges) are what actually carry weight over time. We had been under-using them.

Block your own monitoring once, and you'll never forget to check again. If we had skipped Count mode, the first metric to go red after deploying the Block rule would have been our own synthetic checks, and we would naturally have assumed the site was actually down. The detection-and-response confusion alone would have cost us at least an hour before anyone thought to look at WAF logs.

The DDoS attack itself, in the end, was the easy part. AWS Shield absorbed most of it. The harder part was responding to the response — making sure that whatever we put in place to "prevent it next time" didn't quietly become a bigger outage than the attack ever was.