Im a spider so what

12/17/2023

Negative lookbehinds are not supported by many regex engines (RE2, Go's regexp, Hyperscan). My motivation for avoiding this PCRE-specific construct is that I am working on an experimental WAF that uses CRS, but uses Hyperscan and Go's regexp package rather than PCRE. I'm hoping to be able to keep using vanilla CRS, which is why I'm hoping to get this accepted upstream. Others who are doing similar experiments with non-PCRE regex engines would benefit too.

The goal of the original regex was to disallow a semicolon, except when it is preceded by a string that makes it a well-known HTML entity, such as &auml; or &acirc;. The approach of my solution is to write alternatives for each possible prefix that is not &uml, &circ, etc. I have described the approach in detail here, and posted a Python script I used to generate the final regex. This is a tough tradeoff in human readability of the rule. If anyone has any ideas for a more readable approach, I'd be very eager to hear them.

One alternative approach I've explored, but failed at, is to split the rule into multiple chained rules. The first rule in the chain could catch something like (&+)?. The second rule in the chain could then be a negative check, something like SecRule TX:1 (&uml|&circ|.)". Unfortunately this fails when the input has multiple semicolons. The first semicolon may be preceded by &auml, so the second chained rule is only evaluated against this TX:1, and subsequent semicolons, which may contain the attack, are ignored.

Yea, I totally get it if CRS doesn't want to merge this PR. I know it doesn't have any short-term benefit for anyone besides me. The reason I thought I'd share it anyway is that it may have a long-term benefit, as it paves the way for CRS on future WAF engines that strictly avoid PCRE (like the experiment I'm currently prototyping).

I wish there was a way to achieve this without such a long, complex, and slow regex. I feel like there may be a way with chained rules, but I just haven't been able to find one.

Thinking of it as a 900% perf penalty is, I feel, kind of blunt. It's probably closer to 0.13%, and only on very specific requests. No request will be 9x slower as a whole, but on those specific multipart/form-data POST requests that also contain a file name field, the one operation that scans the name and file name fields, which used to take ModSecurity around 0.001633 ms, is now 9x slower at around 0.01467 ms. On my computer with Nginx+ModSecurity I ran a couple of thousand runs of a simple 520-byte multipart/form-data POST non-keepalive request with a file field. I reran with the new regex, and it averaged 5.171 ms, surprisingly lower but close enough to be measurement error.

They achieve |-alternatives without backtracking, purely via state machines I think. I couldn't find pcretest, but found pcre2test in Ubuntu.
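To make the "alternatives for each possible prefix" idea more concrete, here is a minimal sketch in Python. It is not the script mentioned in the discussion, and every name in it (`build_without_lookbehind`, `build_with_lookbehind`, `WORDS`) is made up for illustration. It assumes a shortened, case-sensitive entity list of just `&uml` and `&circ`, whereas the real rule covers many more. The idea is to walk backwards from the semicolon through a trie of the reversed entity names, and to emit one alternative for every point where the input can "fall off" the trie.

```python
import re

_END = object()  # sentinel marking the end of a complete word in the trie


def build_without_lookbehind(words, target=";"):
    """Build a lookbehind-free pattern that matches `target` only when it is
    not immediately preceded by any string in `words`.

    Assumes no word is empty; all names here are illustrative.
    """
    if not all(words):
        raise ValueError("empty words are not supported")

    # Trie over the *reversed* words, so we can walk backwards from `target`.
    trie = {}
    for word in words:
        node = trie
        for ch in reversed(word):
            node = node.setdefault(ch, {})
        node[_END] = True

    alternatives = []

    def walk(node, suffix):
        # A complete word sits right before `target`: this is the allowed
        # case, so emit nothing for it (or for anything longer).
        if _END in node:
            return
        children = sorted(ch for ch in node if ch is not _END)
        if children:
            # The character before `suffix` must be absent (start of input)
            # or must not continue towards any listed word.
            excluded = "".join(re.escape(ch) for ch in children)
            prefix = f"(?:^|[^{excluded}])"
        else:
            prefix = ""  # only reachable when `words` is empty
        alternatives.append(prefix + re.escape(suffix) + re.escape(target))
        for ch in children:
            walk(node[ch], ch + suffix)

    walk(trie, "")
    return "|".join(alternatives)


def build_with_lookbehind(words, target=";"):
    """Reference pattern using negative lookbehinds, for comparison only."""
    guards = "".join(f"(?<!{re.escape(word)})" for word in words)
    return guards + re.escape(target)


WORDS = ["&uml", "&circ"]  # shortened stand-in for the real entity list
plain = re.compile(build_without_lookbehind(WORDS))
reference = re.compile(build_with_lookbehind(WORDS))

for sample in ["a;b", "&uml;", "x&circ;", "&uml;;", "circ;"]:
    print(repr(sample), bool(plain.search(sample)), bool(reference.search(sample)))
```

The generated pattern only uses `^`, negated character classes, and `|`, so it should also be acceptable to engines without lookbehind support. The number of alternatives grows with the total length of the entity names, which is where the readability and speed cost discussed here comes from.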
