wcmatch for html special characters


Posted

Has anyone come up with a wcmatch search for characters that HTML considers special?

i.e. &amp; in place of & for an ampersand?

Just so you don't get a warning when validating a web page:

 

(setq s1 "Just You & Me")

(wcmatch s1  ..... )

would return T and / or replace the ampersand in the string with "&amp;"

 

 

http://www.w3schools.com/tags/ref_entities.asp

 

 

-David

Posted

This is what I've used in the past:

 

(defun _StringSubst ( new old string / l i )
   (setq l (strlen new)
         i 0
   )
   (while (setq i (vl-string-search old string i))
       (setq string (vl-string-subst new old string i) i (+ i l))
   )
   string
)

(defun _ReplaceEntRefs ( string )
   (foreach pair
      '(
           ("&amp;"  .  "&")
           ("&lt;"   .  "<")
           ("&gt;"   .  ">")
           ("&apos;" .  "'")
           ("&quot;" . "\"")
       )
       (setq string (_StringSubst (car pair) (cdr pair) string))
   )
   string
)

Posted

Or, in pure Vanilla:

 

(defun _ReplaceEntRefs ( str / out sub )
   (setq out "")
   (if (wcmatch str "*&*,*<*,*>*,*'*,*\"*")
       (repeat (strlen str)
           (setq sub (substr str 1 1)
                 str (substr str 2)
           )
           (setq out
               (strcat out
                   (cond
                       (
                           (cdr
                               (assoc sub
                                  '(
                                       ("&"  . "&amp;")
                                       ("<"  . "&lt;")
                                       (">"  . "&gt;")
                                       ("'"  . "&apos;")
                                       ("\"" . "&quot;")
                                   )
                               )
                           )
                       )
                       (   sub   )
                   )
               )
           )
       )
       str
   )
)

 

And another Visual:

 

(defun _ReplaceEntRefs ( str )
   (if (wcmatch str "*&*,*<*,*>*,*'*,*\"*")
       (vl-list->string
           (apply 'append
               (mapcar
                   (function
                       (lambda ( c )
                           (cond
                               (
                                   (cdr
                                       (assoc c
                                          '(
                                               (38 38  97 109 112 59)
                                               (60 38 108 116 59)
                                               (62 38 103 116 59)
                                               (39 38  97 112 111 115 59)
                                               (34 38 113 117 111 116 59)
                                           )
                                       )
                                   )
                               )
                               (   (list c)   )
                           )
                       )
                   )
                   (vl-string->list str)
               )
           )
       )
       str
   )
)

Posted

A quick speed test:

 

_$ (setq s "'<a&b&c>'\"")
"'<a&b&c>'\""
_$ (repeat 5 (setq s (strcat s s)))
"'<a&b&c>'\"'<a&b&c>'\"'<a&b&c>'\"'<a&b&c>'\"   ...   "'<a&b&c>'\""
_$ (strlen s)
320

 

Numbering the functions in the order they are posted:

 

_$ (Benchmark '((_ReplaceEntRefs1 s) (_ReplaceEntRefs2 s) (_ReplaceEntRefs3 s)))
Benchmarking ...............Elapsed milliseconds / relative speed for 4096 iteration(s):

   (_REPLACEENTREFS3 S).....1965 / 4.41 <fastest>
   (_REPLACEENTREFS1 S).....2667 / 3.25
   (_REPLACEENTREFS2 S).....8658 / 1 <slowest>

Posted

Thanks Lee,

 

 

I guess it begs to ask, 'What If' &amp; is already in the string:

(setq nl "Just You &amp; Me")

(prin1 (_ReplaceEntRefs nl))

 

Thanks again! -David

Posted
I guess it begs to ask, 'What If' &amp; is already in the string

 

At that stage I would be inclined to utilise Regular Expressions to perform the replacement, since a negative-lookahead ('?!') can be used in the ampersand pattern string:

 

;; Add Character Entity References  -  Lee Mac
;; Replaces HTML Special Characters with their character entity reference equivalents

(defun LM:AddCharEntRefs ( str / err regex )
   (if (setq regex (vlax-get-or-create-object "VBScript.RegExp"))
       (progn
           (setq err
               (vl-catch-all-apply
                   (function
                       (lambda ( )
                           (vlax-put-property regex 'global     :vlax-true)
                           (vlax-put-property regex 'ignorecase :vlax-false)
                           (vlax-put-property regex 'multiline  :vlax-true)
                           (foreach pair
                              '(
                                   ("&(?!amp;|lt;|gt;|apos;|quot;)" . "&amp;")
                                   ("<"  . "&lt;")
                                   (">"  . "&gt;")
                                   ("'"  . "&apos;")
                                   ("\"" . "&quot;")
                               )
                               (vlax-put-property regex 'pattern (car pair))
                               (setq str (vlax-invoke regex 'replace str (cdr pair)))
                           )
                       )
                   )
               )
           )
           (vlax-release-object regex)
           (if (vl-catch-all-error-p err)
               (prompt (vl-catch-all-error-message err))
               err
           )
       )
   )
)

_$ (LM:AddCharEntRefs "&quot;a & b && <>\"")
"&quot;a &amp; b &amp;&amp; &lt;&gt;&quot;"

I've also renamed the function to better describe the purpose it is serving, since we are adding the character entity references, not replacing them, so the naming of my original functions may have been misleading.

 

Lee

Posted

Lee, thanks again!

 

Another possibility could be:

 

(if (and (not (wcmatch str "*&amp;*"))
         (wcmatch str "*&*,*<*,*>*,*'*,*\"*"))

or

(if (and (wcmatch str "*~*&amp;*")
         (wcmatch str "*&*,*<*,*>*,*'*,*\"*"))

 

I was never very good at using wcmatch with the bracket [ ] tests. I was guessing they would be useful here, but it doesn't look like they are. -David

Posted
Another possibility could be...

 

Not quite, since for a string:

"a & b &amp; c"

The '&' would not be replaced; also, you would need to check for the other entity references, e.g. '&quot;', '&lt;', '&gt;'. ;)
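
The point can be checked at the console. This is only a minimal sketch: the test string is an assumed example holding one raw ampersand and one existing entity reference, and the guard is the first of the two suggested above.

```lisp
;; Illustrative input (an assumption): one raw "&" and one existing "&amp;"
(setq str "a & b &amp; c")

;; The suggested guard: convert only when no "&amp;" occurs anywhere
(and (not (wcmatch str "*&amp;*"))
     (wcmatch str "*&*,*<*,*>*,*'*,*\"*")
)
;; Evaluates to nil: "&amp;" is found somewhere in the string, so the
;; whole string is skipped and the raw "&" is never converted.
```

The guard tests the string as a whole, whereas the decision has to be made separately for each ampersand.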

 

But your suggestion of using brackets in the wcmatch expression offers an improvement to my above function:

 

;; Add Character Entity References  -  Lee Mac
;; Replaces HTML Special Characters with their character entity reference equivalents

(defun LM:AddCharEntRefs ( str / err regex )
   (if (wcmatch str "*[&<>'\"]*")
       (if (setq regex (vlax-get-or-create-object "VBScript.RegExp"))
           (progn
               (setq err
                   (vl-catch-all-apply
                       (function
                           (lambda ( )
                               (vlax-put-property regex 'global     :vlax-true)
                               (vlax-put-property regex 'ignorecase :vlax-false)
                               (vlax-put-property regex 'multiline  :vlax-true)
                               (foreach pair
                                  '(
                                       ("&(?!amp;|lt;|gt;|apos;|quot;)" . "&amp;")
                                       ("<"  . "&lt;")
                                       (">"  . "&gt;")
                                       ("'"  . "&apos;")
                                       ("\"" . "&quot;")
                                   )
                                   (vlax-put-property regex 'pattern (car pair))
                                   (setq str (vlax-invoke regex 'replace str (cdr pair)))
                               )
                           )
                       )
                   )
               )
               (vlax-release-object regex)
               (if (vl-catch-all-error-p err)
                   (prompt (vl-catch-all-error-message err))
                   err
               )
           )
       )
       str
   )
)
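
Where the VBScript.RegExp object is not available, the same "ampersand not followed by a known entity name" test can be approximated in Vanilla AutoLISP by applying wcmatch to the remainder of the string at each ampersand. This is only a sketch and not one of the functions posted above: the name AddCharEntRefsVanilla is hypothetical, and only the five entity references already listed are treated as existing references.

```lisp
;; Hypothetical Vanilla sketch: add character entity references,
;; leaving an "&" alone when it already starts one of five known references.
(defun AddCharEntRefsVanilla ( str / chr out )
    (setq out "")
    (while (/= "" str)
        (setq chr (substr str 1 1))
        (setq out
            (strcat out
                (cond
                    (   (= "&" chr)
                        ;; str still begins at this "&", so test what follows it
                        (if (wcmatch str "&amp;*,&lt;*,&gt;*,&apos;*,&quot;*")
                            "&"     ; already part of an entity reference
                            "&amp;" ; raw ampersand
                        )
                    )
                    (   (= "<"  chr) "&lt;")
                    (   (= ">"  chr) "&gt;")
                    (   (= "'"  chr) "&apos;")
                    (   (= "\"" chr) "&quot;")
                    (   chr   )
                )
            )
        )
        (setq str (substr str 2)) ; drop the character just processed
    )
    out
)
```

Since wcmatch is case-sensitive, this matches the 'ignorecase' setting used in the regex version. It walks the string one character at a time, so it is likely to be slower on long strings, in line with the earlier speed test of the Vanilla approach.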

Posted

Good catch! Thanks! -David
