
Fastest method to compare list of substrings against strings and return unique output?



Posted (edited)

On a high level:

 

I have a fixed list of substrings, and I need to check each incoming string to see whether it contains one of them. I cannot modify the strings being checked. Each string only has to match one substring, and there will be no overlap.

 

For example, given the substrings dog, quartz, wizard, and zebra, the code should return the following output:

 

"The quick brown fox jumps over a lazy dog." ->  compares against "Dog" -> return "1"

 

"Sphinx of black quartz, judge my vow." -> compares against "Quartz" -> return "2"

 

"The five boxing wizards jump quickly." -> compares against "Wizard" -> return "3"

 

"How vexingly quick daft zebras jump!" ->  compares against "Zebra" -> return "4"

 

"By Jove, my quick study of lexicography won a prize!" -> fails to find a matching substring -> return "5"

 

 

Right now, I am doing this in what I consider to be a very poor manner:

 

(defun tagmap (input)
  ;; Upper-case the input so the comparison is case-insensitive.
  (setq input (strcase input))
  (cond
    ;; vl-string-search returns the match position, or nil when absent.
    ((vl-string-search "TEST1" input)
      (princ "1"))
    ((vl-string-search "TEST2" input)
      (princ "2"))

	[...]

    ((vl-string-search "TESTN" input)
      (princ "N"))
    (T (princ "String not found"))
  )
)

 

In the future this list could be greatly expanded, so I am looking for a more scalable solution.  Does anyone have a faster method? 

Edited by TemporaryCAD
Clarity
Posted

Why do you need to return a number?

Posted
1 minute ago, ronjonp said:

Why do you need to return a number?

 

It doesn't have to be a number; it's intended to be a string.

 

What is happening is that I have a large (>15,000) palette of blocks that I am generating part numbers from. I check the block name against a descriptor and then return the proper prefix for the part number. The rest of the part number is generated from block attributes.

 

For example, given blocks named "#10 machine screw" and "8-32UNC screw", I can handle both by searching for the substring "screw" and returning the prefix "SC", to which the rest of the part number is appended.

 

This is just an example but essentially the exact use case.  
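
The prefix lookup described above could be sketched roughly as follows. This is only an illustration under assumptions: the table name `*prefix-map*`, the function name `block-prefix`, and the "WASHER" entry are hypothetical and do not come from the original code. Unlike the COND version, it returns the prefix instead of printing it, so the caller can concatenate the rest of the part number.

```lisp
;; Hypothetical lookup table of (SUBSTRING . PREFIX) pairs.
;; Keys are upper-case because the block name is upper-cased before searching.
(setq *prefix-map*
  '(("SCREW"  . "SC")
    ("WASHER" . "WA")   ; assumed entry, for illustration only
   )
)

;; Returns the prefix of the first key found in NAME, or nil if none match.
(defun block-prefix (name / uname)
  (setq uname (strcase name))
  (vl-some
    (function
      (lambda (pair)
        ;; vl-string-search returns a position (non-nil) when the key is found.
        (if (vl-string-search (car pair) uname) (cdr pair))
      )
    )
    *prefix-map*
  )
)

(block-prefix "#10 machine screw") ; returns "SC"
(block-prefix "8-32UNC screw")     ; returns "SC"
(block-prefix "hex nut")           ; returns nil
```

Adding a new descriptor then means adding one pair to the table rather than another COND branch.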

Posted (edited)

I'm partial to WCMATCH but TBH not entirely sure of what you're trying to accomplish.

;; Each entry pairs a substring with the value to print on a match.
(foreach pat '(("dog" "perfect people") ("quartz" "sparkly") ("wizard" "gandalf") ("zebra" "stripey horse"))
  (foreach str '("The quick brown fox jumps over a lazy dog."
                 "Sphinx of black quartz, judge my vow."
                 "The five boxing wizards jump quickly."
                 "How vexingly quick daft zebras jump!"
                 "By Jove, my quick study of lexicography won a prize!"
                )
    ;; Upper-case both sides for a case-insensitive "contains" test.
    (if (wcmatch (strcase str) (strcat "*" (strcase (car pat)) "*"))
      (print (cadr pat))
    )
  )
)

 

Edited by ronjonp
Posted (edited)

Since you're looking to expand this in the future, it sounds like the list will be updated regularly. In my workplace, if I had to do something like this, I'd normally put the list in its own function at the top of the program so that I don't have to scroll down a thousand lines just to find it. This is probably what you're looking for in your case:

 

(defun your_list ( / )
    '(
        ("TEST1" . "1")
        ("TEST2" . "2")
        ("TEST3" . "3")
        ("TEST4" . "4")
        ;; .... to the end of eternity
    )
)

(defun tagmap (str)
    (setq str (strcase str))
    (if
        (not
            (vl-some
                (function
                    (lambda (x)
                        (if (vl-string-search (car x) str) (princ (cdr x)))
                    )
                )
                (your_list)
            )
        )
        (princ "\nString not found.")
    )
)

 

As ronjonp suggested, wcmatch is also one way of comparing strings, but I'd opt for vl-string-search because it matches the substring literally. If I were using wcmatch, I'd have to escape any wildcard characters in the substrings for robust results, so I'd also need something like LM:escapewildcards.
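
To illustrate the escaping concern: in a wcmatch pattern, characters such as # (any digit) and * are wildcards, and a backquote makes the following character literal. The helper below is a minimal sketch with a hypothetical name, `escape-wc`; it is a stand-in for illustration and not LM:escapewildcards itself.

```lisp
;; Hypothetical helper: prefixes every wcmatch special character in STR
;; with a backquote so that it is matched literally.
(defun escape-wc (str / out ch i)
  (setq out "" i 1)
  (while (<= i (strlen str))
    (setq ch (substr str i 1))
    (setq out
      (strcat out
        (if (member ch '("#" "@" "." "*" "?" "~" "[" "]" "-" "," "`"))
          (strcat "`" ch)
          ch
        )
      )
    )
    (setq i (1+ i))
  )
  out
)

(escape-wc "#10")                                                ; returns "`#10"
(wcmatch "110 SCREW" "*#10*")                                    ; T, # matches the digit 1
(wcmatch "110 SCREW" (strcat "*" (escape-wc "#10") "*"))         ; nil
(wcmatch "#10 MACHINE SCREW" (strcat "*" (escape-wc "#10") "*")) ; T
```

With vl-string-search none of this is needed, since the search text is never interpreted as a pattern.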

Edited by Jonathan Handojo
Posted

Not loading Visual LISP, so pure vanilla LISP used to help speed.

Does anyone know if this is still relevant? 
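
For comparison, a substring test that avoids the vl- functions entirely might look like the sketch below. The name `str-contains` is hypothetical, and whether this is actually faster than vl-string-search is not established here; it would need to be timed on real data.

```lisp
;; Hypothetical helper using only core AutoLISP functions.
;; Returns the 1-based position of SUB in STR, or nil if SUB is not found.
(defun str-contains (sub str / n m i found)
  (setq n (strlen sub)
        m (strlen str)
        i 1
  )
  ;; Slide a window of length N across STR until it equals SUB.
  (while (and (not found) (<= (+ i n -1) m))
    (if (= (substr str i n) sub)
      (setq found i)
      (setq i (1+ i))
    )
  )
  found
)

(str-contains "SCREW" "#10 MACHINE SCREW") ; returns 13
(str-contains "BOLT" "#10 MACHINE SCREW")  ; returns nil
```

As with the other versions, upper-case both arguments with strcase first if the comparison should be case-insensitive.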
