David Bethel Posted January 14, 2012

Has anyone come up with a wcmatch search for characters that HTML considers special, i.e. &amp; for & (ampersand)? Just so you don't get warnings when validating a web page:

    (setq s1 "Just You & Me")
    (wcmatch s1 ..... )

would return T and/or replace the ampersand in the string with "&amp;".

http://www.w3schools.com/tags/ref_entities.asp

-David
Lee Mac Posted January 14, 2012

This is what I've used in the past:

    ;; Replaces every occurrence of 'old' with 'new' in 'string'
    (defun _StringSubst ( new old string / l i )
        (setq l (strlen new)
              i 0
        )
        (while (setq i (vl-string-search old string i))
            (setq string (vl-string-subst new old string i)
                  i      (+ i l) ; resume the search after the inserted text
            )
        )
        string
    )

    ;; The ampersand pair comes first, so that the '&' of the
    ;; entities added afterwards is not itself replaced
    (defun _ReplaceEntRefs ( string )
        (foreach pair
           '(
                ("&amp;"  . "&")
                ("&lt;"   . "<")
                ("&gt;"   . ">")
                ("&apos;" . "'")
                ("&quot;" . "\"")
            )
            (setq string (_StringSubst (car pair) (cdr pair) string))
        )
        string
    )
Lee Mac Posted January 14, 2012

Or, in pure Vanilla:

    (defun _ReplaceEntRefs ( str / out sub )
        (setq out "")
        (if (wcmatch str "*&*,*<*,*>*,*'*,*\"*")
            (repeat (strlen str)
                ;; take the first character and shorten the remaining string
                (setq sub (substr str 1 1)
                      str (substr str 2)
                )
                (setq out
                    (strcat out
                        (cond
                            (   (cdr
                                    (assoc sub
                                       '(
                                            ("&"  . "&amp;")
                                            ("<"  . "&lt;")
                                            (">"  . "&gt;")
                                            ("'"  . "&apos;")
                                            ("\"" . "&quot;")
                                        )
                                    )
                                )
                            )
                            (   sub   )
                        )
                    )
                )
            )
            str
        )
    )

And another Visual:

    ;; Works on ASCII codes: 38 = &, 60 = <, 62 = >, 39 = ', 34 = "
    (defun _ReplaceEntRefs ( str )
        (if (wcmatch str "*&*,*<*,*>*,*'*,*\"*")
            (vl-list->string
                (apply 'append
                    (mapcar
                        (function
                            (lambda ( c )
                                (cond
                                    (   (cdr
                                            (assoc c
                                               '(
                                                    (38 38 97 109 112 59)
                                                    (60 38 108 116 59)
                                                    (62 38 103 116 59)
                                                    (39 38 97 112 111 115 59)
                                                    (34 38 113 117 111 116 59)
                                                )
                                            )
                                        )
                                    )
                                    (   (list c)   )
                                )
                            )
                        )
                        (vl-string->list str)
                    )
                )
            )
            str
        )
    )
Lee Mac Posted January 14, 2012

A quick speed test:

    _$ (setq s "'<a&b&c>'\"")
    "'<a&b&c>'\""
    _$ (repeat 5 (setq s (strcat s s)))
    "'<a&b&c>'\"'<a&b&c>'\"'<a&b&c>'\"'<a&b&c>'\" ... "'<a&b&c>'\""
    _$ (strlen s)
    320

Numbering the functions in the order they are posted:

    _$ (Benchmark '((_ReplaceEntRefs1 s) (_ReplaceEntRefs2 s) (_ReplaceEntRefs3 s)))
    Benchmarking ...............Elapsed milliseconds / relative speed for 4096 iteration(s):

        (_REPLACEENTREFS3 S).....1965 / 4.41 <fastest>
        (_REPLACEENTREFS1 S).....2667 / 3.25
        (_REPLACEENTREFS2 S).....8658 / 1 <slowest>
David Bethel Posted January 14, 2012

Thanks Lee,

I guess it begs to ask, 'What If' &amp; is already in the string:

    (setq nl "Just You &amp; Me")
    (prin1 (_ReplaceEntRefs nl))

Thanks again!

-David
Lee Mac Posted January 15, 2012

"I guess it begs to ask, 'What If' &amp; is already in the string"

At that stage I would be inclined to utilise Regular Expressions to perform the replacement, since a negative lookahead ('?!') can be used in the ampersand pattern string:

    ;; Add Character Entity References - Lee Mac
    ;; Replaces HTML Special Characters with their character entity reference equivalents
    (defun LM:AddCharEntRefs ( str / err regex )
        (if (setq regex (vlax-get-or-create-object "VBScript.RegExp"))
            (progn
                (setq err
                    (vl-catch-all-apply
                        (function
                            (lambda ( )
                                (vlax-put-property regex 'global     :vlax-true)
                                (vlax-put-property regex 'ignorecase :vlax-false)
                                (vlax-put-property regex 'multiline  :vlax-true)
                                (foreach pair
                                   '(
                                        ;; '&' only when it does not already start an entity
                                        ("&(?!amp;|lt;|gt;|apos;|quot;)" . "&amp;")
                                        ("<"  . "&lt;")
                                        (">"  . "&gt;")
                                        ("'"  . "&apos;")
                                        ("\"" . "&quot;")
                                    )
                                    (vlax-put-property regex 'pattern (car pair))
                                    (setq str (vlax-invoke regex 'replace str (cdr pair)))
                                )
                            )
                        )
                    )
                )
                (vlax-release-object regex)
                (if (vl-catch-all-error-p err)
                    (prompt (vl-catch-all-error-message err))
                    err
                )
            )
        )
    )

    _$ (LM:AddCharEntRefs "&quot;a & b &amp;& <>\"")
    "&quot;a &amp; b &amp;&amp; &lt;&gt;&quot;"

I've also renamed the function to better describe the purpose it serves: we are adding the character entity references, not replacing them, so the naming of my original functions may have been misleading.

Lee
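As a quick check of David's 'What If' case, the calls below are a minimal sketch that assumes the LM:AddCharEntRefs function from the post above is already loaded; the variable names once and twice are only illustrative. Because the lookahead skips an ampersand that already starts one of the five entities, running the function a second time should leave the string unchanged.

```lisp
;; Assumes LM:AddCharEntRefs (posted above) has been loaded.
;; 'once' and 'twice' are hypothetical variable names.
(setq once  (LM:AddCharEntRefs "Just You & Me"))   ; "Just You &amp; Me"
(setq twice (LM:AddCharEntRefs once))              ; "Just You &amp; Me"

;; The second pass finds no bare ampersand, so both results match
(= once twice)                                     ; T
```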
David Bethel Posted January 15, 2012

Lee, thanks again!

Another possibility could be:

    (if (and (not (wcmatch str "*&amp;*")) (wcmatch str "*&*,*<*,*>*,*'*,*\"*"))

or

    (if (and (wcmatch str "*~*&amp;*") (wcmatch str "*&*,*<*,*>*,*'*,*\"*"))

I was never very good at using wcmatch and the brackets [ ] testing. I guessed that it would be useful here, but it doesn't look to be.

-David
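For reference, here is a small sketch of the bracket form of wcmatch next to the comma-separated form used earlier in the thread. The sample strings are only illustrative. A bracketed set matches any single character it contains, so one pattern covers all five special characters; note that wcmatch only reports whether a match exists, it does not locate or replace anything.

```lisp
;; Bracket form: T if the string contains any one of & < > ' "
(wcmatch "Just You & Me"   "*[&<>'\"]*")            ; returns T
(wcmatch "Just You and Me" "*[&<>'\"]*")            ; returns nil

;; Equivalent comma-separated form, one alternative per character
(wcmatch "Just You & Me"   "*&*,*<*,*>*,*'*,*\"*")  ; returns T
```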
Lee Mac Posted January 15, 2012

"Another possibility could be..."

Not quite, since for a string:

    "a &amp; b & c"

the bare '&' would not be replaced; also, you would need to check for '&quot;', '&lt;', '&gt;'.

But your suggestion of using brackets in the wcmatch expression offers an improvement to my above function:

    ;; Add Character Entity References - Lee Mac
    ;; Replaces HTML Special Characters with their character entity reference equivalents
    (defun LM:AddCharEntRefs ( str / err regex )
        ;; only create the RegExp object if a special character is present
        (if (wcmatch str "*[&<>'\"]*")
            (if (setq regex (vlax-get-or-create-object "VBScript.RegExp"))
                (progn
                    (setq err
                        (vl-catch-all-apply
                            (function
                                (lambda ( )
                                    (vlax-put-property regex 'global     :vlax-true)
                                    (vlax-put-property regex 'ignorecase :vlax-false)
                                    (vlax-put-property regex 'multiline  :vlax-true)
                                    (foreach pair
                                       '(
                                            ("&(?!amp;|lt;|gt;|apos;|quot;)" . "&amp;")
                                            ("<"  . "&lt;")
                                            (">"  . "&gt;")
                                            ("'"  . "&apos;")
                                            ("\"" . "&quot;")
                                        )
                                        (vlax-put-property regex 'pattern (car pair))
                                        (setq str (vlax-invoke regex 'replace str (cdr pair)))
                                    )
                                )
                            )
                        )
                    )
                    (vlax-release-object regex)
                    (if (vl-catch-all-error-p err)
                        (prompt (vl-catch-all-error-message err))
                        err
                    )
                )
            )
            str
        )
    )
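If the VBScript.RegExp object is not available, the same lookahead idea can be approximated in Vanilla AutoLISP by testing the remainder of the string with wcmatch each time an ampersand is met. The following is only a sketch under that assumption: the name _AddCharEntRefsVanilla is hypothetical and not from the thread, and like the regex pattern it recognises only the five named entities (numeric references such as &#38; are not detected).

```lisp
;; Hypothetical Vanilla variant, not from the original posts.
;; Walks the string one character at a time; an '&' is left alone
;; when the remaining text already starts with a known entity.
(defun _AddCharEntRefsVanilla ( str / out sub )
    (setq out "")
    (while (/= "" str)
        (setq sub (substr str 1 1))
        (setq out
            (strcat out
                (cond
                    (   (= "&" sub)
                        ;; str still begins with this '&', so test its head
                        (if (wcmatch str "&amp;*,&lt;*,&gt;*,&apos;*,&quot;*")
                            "&"
                            "&amp;"
                        )
                    )
                    (   (cdr
                            (assoc sub
                               '(
                                    ("<"  . "&lt;")
                                    (">"  . "&gt;")
                                    ("'"  . "&apos;")
                                    ("\"" . "&quot;")
                                )
                            )
                        )
                    )
                    (   sub   )
                )
            )
        )
        (setq str (substr str 2)) ; drop the character just processed
    )
    out
)

;; Usage: the existing &amp; is kept, the bare & and < are converted
(_AddCharEntRefsVanilla "a & b &amp; <") ; "a &amp; b &amp; &lt;"
```

Building the result one character at a time with strcat is simple but, going by the benchmark earlier in the thread, likely to be the slowest approach on long strings.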