Closed
Bug 838588
Opened 12 years ago
Closed 9 years ago
REGEXP incorrectly working on large strings (> 999997)
Categories
(Core :: JavaScript Engine, defect)
RESOLVED
WORKSFORME
People
(Reporter: register, Unassigned)
References
Details
(Keywords: dataloss, regression, testcase)
Attachments
(4 files)
User Agent: Mozilla/5.0 (X11; Linux i686; rv:18.0) Gecko/20100101 Firefox/18.0
Build ID: 20130109164812
Steps to reproduce:
Use the regexp /(?:\"([^\"]*)\")/g in any way
on the data from the attachment,
e.g. simply try it on http://www.pcre.ru/eval
Actual results:
No match is found.
Expected results:
Any other browser (e.g. Chrome) matches the part in double quotes (").
Reporter
Comment 1•12 years ago
Also try this regexp on the same data:
/(?:\'((?:\\.|[^\'])*?)\')/g
Updated•12 years ago
Assignee: nobody → general
Component: Untriaged → JavaScript Engine
Product: Firefox → Core
I can't reproduce this using Firefox 19 beta. The regexp finds 3 matches in the attachment, as expected:
1. Load https://bugzilla.mozilla.org/attachment.cgi?id=710672
2. In the web console, run
document.querySelector("pre").innerHTML.match(/(?:\"([^\"]*)\")/g).length
Result: 3
The regexp in comment 1 returns no matches, but it does in Chrome too.
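A minimal sketch of why the two patterns can disagree on the same data: the pattern from the description targets double-quoted values, while the one from comment 1 targets single-quoted values. The `sample` string below is a hypothetical stand-in and is not the attachment, which is far larger.

```javascript
// Hypothetical sample; the real attachment is a much larger data: URI.
const sample = '<img src="data:image/png;base64,AAAA" alt="x">';

// Pattern from the bug description: matches double-quoted values.
const doubleQuoted = /(?:\"([^\"]*)\")/g;

// Pattern from comment 1: matches single-quoted values only.
const singleQuoted = /(?:\'((?:\\.|[^\'])*?)\')/g;

const doubleMatches = sample.match(doubleQuoted);
const singleMatches = sample.match(singleQuoted);

console.log(doubleMatches); // the two double-quoted attribute values
console.log(singleMatches); // null, because the sample has no single quotes
```

With input that contains only double-quoted attributes, an empty result from the comment 1 pattern is expected behaviour rather than a symptom of the bug.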
Reporter
Comment 4•12 years ago
The above regexp is part of TinyMCE's large regexp that parses an HTML tag into parts. I found that in Firefox it fails on large data:base64 images, while Chrome and Opera work fine.
Just drag and drop an image of about 1 MB or more into the TinyMCE window in Firefox and then call tinyMce.triggerSave(). When the error occurs, the IMG in the TEXTAREA does not contain 'src=' at all. With small images everything works fine.
The regexp from comment 1 in action: see the screenshot in the attachment.
The full TinyMCE regexp:
([\w:\-]+)(?:\s*=\s*(?:(?:\"((?:\\.|[^\"])*)\")|(?:\'((?:\\.|[^\'])*?)\')|([^>\s]+)))?
The regex in...
- comment 1 doesn't match and shouldn't.
- comment 2 does match and should.
- comment 4 probably ought to be
/([-\w:]+)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^>\s]+)))?/g
which does match. But the original, however horrible, ought to match too. And you can see it eventually does if you start reducing the size of the base64 data. In Scratchpad:
var s = document.querySelector("pre").innerHTML;
while (!/([\w:\-]+)(?:\s*=\s*(?:(?:\"((?:\\.|[^\"])*)\")|(?:\'((?:\\.|[^\'])*?)\')|([^>\s]+)))?/.test(s)) {
// chop 500 chars off the start of the src attr:
s = s.replace(/(src=").{500}/g, "$1");
}
alert(s.length); // 999645
I attach a test case to show how regexes with some combinations of grouping, quantifiers and '$' fail to match strings above about a million characters.
That's interesting. This test case is based on the one in comment 6, but displays the results (colour-coded) in an HTML table so you can easily see the string lengths at which the various regexps fail.
Testing on current nightly, I get red 'false' results in rows 2, 3, 7, 8 and 10.
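The attached test cases are not reproduced here, but the idea of probing string lengths until a regexp stops matching can be sketched as follows. `firstFailingLength` and its parameters are hypothetical names introduced for illustration; the patterns and lengths to probe are left to the caller.

```javascript
// Hypothetical helper: returns the first probed length at which `regex`
// fails to match a string built by `makeString`, or null if every probe
// matches. An exception is reported separately, since older builds threw
// instead of returning false.
function firstFailingLength(regex, makeString, lengths) {
  for (const n of lengths) {
    let matched;
    try {
      regex.lastIndex = 0; // keep /g and /y regexps from carrying state over
      matched = regex.test(makeString(n));
    } catch (e) {
      return { length: n, outcome: "threw" };
    }
    if (!matched) {
      return { length: n, outcome: "false" };
    }
  }
  return null;
}

// Small-scale usage: this pattern legitimately stops matching at length 6.
const probe = firstFailingLength(/^a{1,5}$/, (n) => "a".repeat(n), [1, 3, 5, 6, 7]);
console.log(probe);
```

To investigate the limit discussed in this bug, one would pass lengths on either side of one million characters together with a pattern that ought to match at every length.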
Attachment #720221 -
Attachment mime type: text/plain → text/html
Regression range using the testcase in comment 7 is
Last good nightly: 2010-08-13
First bad nightly: 2010-08-14
Pushlog:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=d5e211bdd793&tochange=656d99ca089c
In the 2010-08-13 nightly the regexps in rows 8 and 10 throw an exception rather than incorrectly return false, but the others all return the expected result.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: regression, testcase
OS: Linux → All
Summary: REGEXP incorrectly working on large strings → REGEXP incorrectly working on large strings (> 999997)
Version: 18 Branch → Trunk
Comment 10•12 years ago
This is due to an artificial limit (|matchLimit = 1000000|) in:
http://hg.mozilla.org/mozilla-central/file/7e729e2c3822/js/src/yarr/Yarr.h#l51
So it's a "bug" in YARR (WebKit/JavaScriptCore). As such, Safari fails in the same way.
Relevant bugs:
https://bugs.webkit.org/show_bug.cgi?id=25071
Comment 11•12 years ago
As this *silently* causes matches to be empty (e.g. http://jsfiddle.net/JKb2A/1/ and in TinyMCE I suppose), this is bound to cause dataloss in real world webapps.
So, Luke, any chance to get this prioritized so that it will at the very least throw an exception or something?
Flags: needinfo?(luke)
Keywords: dataloss
See Also: → https://bugs.webkit.org/show_bug.cgi?id=25071
Comment 12•12 years ago
Yes, it seems like we should throw an "allocation size overflow" error if we hit some internal limit like this. I'm not familiar with the regexp code, though; perhaps Naveed could help get someone on this?
Flags: needinfo?(luke)
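Until the engine reports the condition itself, a page could fail loudly rather than silently lose data. The following is only a sketch of a page-side guard, not the engine fix proposed above: `guardedMatch` and `ASSUMED_LIMIT` are hypothetical names, and using string length as a proxy for the internal match limit is an assumption based on the lengths reported in this bug.

```javascript
// Assumption: failures begin near one million characters, per this bug.
// The engine's real limit applies to matching work, not directly to length.
const ASSUMED_LIMIT = 999000;

// Hypothetical wrapper: refuses to run a regexp on input long enough that
// an affected engine might silently return "no match".
function guardedMatch(str, regex, limit = ASSUMED_LIMIT) {
  if (str.length >= limit) {
    throw new RangeError(
      "input of " + str.length + " chars exceeds the assumed regexp-safe limit of " + limit
    );
  }
  return str.match(regex);
}

// Short input goes straight through to String.prototype.match.
console.log(guardedMatch('src="abc"', /"([^"]*)"/g));
```

The trade-off is that the guard also rejects long input on engines that would handle it correctly, so the limit should be chosen with the affected browsers in mind.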
Assignee
Updated•11 years ago
Assignee: general → nobody
Comment 13•9 years ago
No longer reproducible, probably fixed when we replaced YARR with irregexp (bug 976446). Resolving as WFM.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME