Closed Bug 838588 Opened 12 years ago Closed 9 years ago

REGEXP incorrectly working on large strings (> 999997)

Categories

(Core :: JavaScript Engine, defect)

x86
All
defect
Not set
normal

RESOLVED WORKSFORME

People

(Reporter: register, Unassigned)

References

Details

(Keywords: dataloss, regression, testcase)

Attachments

(4 files)

User Agent: Mozilla/5.0 (X11; Linux i686; rv:18.0) Gecko/20100101 Firefox/18.0
Build ID: 20130109164812

Steps to reproduce:
Use the regexp /(?:\"([^\"]*)\")/g on the data from the attachment, e.g. simply try it on http://www.pcre.ru/eval

Actual results:
No matches.

Expected results:
Any other browser (e.g. Chrome) matches the parts in double quotes (").
Also try this regexp on the same data: /(?:\'((?:\\.|[^\'])*?)\')/g
Assignee: nobody → general
Component: Untriaged → JavaScript Engine
Product: Firefox → Core
I can't reproduce this using Firefox 19 beta. The regexp finds 3 matches in the attachment, as expected:
1. Load https://bugzilla.mozilla.org/attachment.cgi?id=710672
2. In the web console, run document.querySelector("pre").innerHTML.match(/(?:\"([^\"]*)\")/g).length
Result: 3

The regexp in comment 1 returns no matches, but it returns none in Chrome either.
Screenshot of comment 1's regexp in Chrome.
The regexp above is part of TinyMCE's larger regexp, which parses an HTML tag into its parts. I found that in Firefox it fails on large data:base64 images, while Chrome and Opera work fine. Drag and drop an image of about 1 MB or more into the TinyMCE window in Firefox, then call tinyMce.triggerSave(): when the error occurs, the IMG in the TEXTAREA will not contain 'src=' at all. With small images everything works fine.

Comment 1's regexp in action: see the screenshot in the attachment.

Full TinyMCE regexp:
([\w:\-]+)(?:\s*=\s*(?:(?:\"((?:\\.|[^\"])*)\")|(?:\'((?:\\.|[^\'])*?)\')|([^>\s]+)))?
The regex in...
- comment 1 doesn't match and shouldn't.
- comment 2 does match and should.
- comment 4 probably ought to be /([-\w:]+)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^>\s]+)))?/g - which does match.

But the original, however horrible, ought to match too. And you can see it eventually does if you start reducing the size of the base64 data. In Scratchpad:

    var s = document.querySelector("pre").innerHTML;
    while (!/([\w:\-]+)(?:\s*=\s*(?:(?:\"((?:\\.|[^\"])*)\")|(?:\'((?:\\.|[^\'])*?)\')|([^>\s]+)))?/.test(s)) {
      // chop 500 chars off the start of the src attr:
      s = s.replace(/(src=").{500}/g, "$1");
    }
    alert(s.length); // 999645

I attach a test case to show how regexes with some combinations of grouping, quantifiers and '$' fail to match strings above about a million characters.
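For anyone without the attachment to hand, the Scratchpad loop above can be approximated with synthetic data. The following is only a sketch: makeImgTag and firstFailingLength are hypothetical helper names, the filler payload of "A" characters merely stands in for real base64 image data, and the regexp is just the double-quoted branch of the TinyMCE regexp quoted earlier.

```javascript
// Hypothetical helper: build an <img> tag whose src attribute is padded with
// filler characters, standing in for a large base64-encoded image.
function makeImgTag(payloadLength) {
  return '<img src="data:image/png;base64,' + "A".repeat(payloadLength) + '">';
}

// Hypothetical helper: return the first payload length at which `re` fails to
// match the synthetic tag, or null if it matches at every length tried.
function firstFailingLength(re, lengths) {
  for (const n of lengths) {
    re.lastIndex = 0; // only matters if `re` carries the g or y flag
    if (!re.test(makeImgTag(n))) return n;
  }
  return null;
}

// Double-quoted branch of the TinyMCE regexp; any well-formed tag built by
// makeImgTag contains a quoted value, so a failure here is an engine problem.
const quotedValue = /\"((?:\\.|[^\"])*)\"/;

// Small lengths only, as a smoke test of the helpers themselves.
console.log(firstFailingLength(quotedValue, [1000, 10000]));
```

To probe the behaviour reported here, pass payload lengths on either side of one million, for example firstFailingLength(quotedValue, [999000, 1001000]). A null result means the regexp matched at every length tried.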
That's interesting. This test case is based on the one in comment 6, but displays the results (colour-coded) in an HTML table so you can easily see the string lengths at which the various regexps fail. Testing on current nightly, I get red 'false' results in rows 2, 3, 7, 8 and 10.
Attachment #720221 - Attachment mime type: text/plain → text/html
Regression range using the testcase in comment 7:
Last good nightly: 2010-08-13
First bad nightly: 2010-08-14
Pushlog: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=d5e211bdd793&tochange=656d99ca089c

In the 2010-08-13 nightly the regexps in rows 8 and 10 throw an exception rather than incorrectly returning false, but the others all return the expected result.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: regression, testcase
OS: Linux → All
Summary: REGEXP incorrectly working on large strings → REGEXP incorrectly working on large strings (> 999997)
Version: 18 Branch → Trunk
This is due to an artificial limit (|matchLimit = 1000000|) in: http://hg.mozilla.org/mozilla-central/file/7e729e2c3822/js/src/yarr/Yarr.h#l51
So it's a "bug" in YARR (WebKit/JavaScriptCore). As such, Safari fails the same way.
Relevant bug: https://bugs.webkit.org/show_bug.cgi?id=25071
As this *silently* causes matches to be empty (e.g. http://jsfiddle.net/JKb2A/1/ and in TinyMCE I suppose), this is bound to cause data loss in real-world web apps. So, Luke, any chance of getting this prioritized so that it will at the very least throw an exception or something?
Flags: needinfo?(luke)
Keywords: dataloss
Yes, it seems like we should throw an "allocation size overflow" error if we hit some internal limit like this. I'm not familiar with the regexp code, though; perhaps Naveed could help get someone on this?
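To illustrate why throwing matters, here is a toy model of a matcher with a step budget. It is not YARR's algorithm and makes no claim about how YARR counts work; findQuoted, its throwOnLimit option and the error message are all invented for this sketch. Only the value 1000000 comes from the Yarr.h line cited in this bug.

```javascript
// Toy model only, not YARR's code: find the first double-quoted run in `s`,
// charging one "step" per character examined after the opening quote.
const MATCH_LIMIT = 1000000; // value quoted from Yarr.h in this bug

function findQuoted(s, { throwOnLimit }) {
  const start = s.indexOf('"');
  if (start === -1) return null; // genuine "no match"
  let steps = 0;
  for (let i = start + 1; i < s.length; i++) {
    if (++steps > MATCH_LIMIT) {
      // Hypothetical error; the comment above suggests reporting the limit.
      if (throwOnLimit) throw new RangeError("regexp match limit exceeded");
      return null; // silent failure: looks exactly like "no match"
    }
    if (s[i] === '"') return s.slice(start + 1, i);
  }
  return null; // opening quote was never closed
}
```

With throwOnLimit set to false, a caller cannot tell an exhausted budget from a string that really has no quoted value, which is how an attribute can vanish without any error. With it set to true, the same input fails loudly and the caller can react.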
Flags: needinfo?(luke)
Assignee: general → nobody
No longer reproducible, probably fixed when we replaced YARR with irregexp (bug 976446). Resolving as WFM.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME