Jump to content

Extract common prefix and suffix from a list of strings


Jonathan Handojo

Recommended Posts

Hi fellow coders,

 

It seems like it's simple to achieve, but I feel like any approach I'm taking is an inefficient method, so I thought I'd bring it up here.

 

Does anyone have a function that can analyse a list of strings and take out common prefix and suffix?

I want the function to return a list of three items:

 

First item is a string denoting the prefix

Second item is a list of the remaining strings that are uncommon

Third item is a string denoting the suffix.

 

 

For example, if the argument list ("Level 1 Floor Plan" "Level 2 Floor Plan" "Level 2b Floor Plan" "Level 3 Floor Plan") is supplied, the function returns:
("Level " ("1" "2" "2b" "3") " Floor Plan")

 

It'll be nice if the function can work cast-insensitive and I can then titlecase it.

If the list supplied only has one string, then just return the string itself (I can do this bit).

 

Thanks,

Jonathan Handojo

Link to comment
Share on other sites

(defun foo ( strlst / findpre itm pre suf )

  (defun findpre ( strlst k / str )
    (if (null itm)
      (setq itm (car strlst))
    )
    (setq str (substr itm 1 k))
    (if (vl-every '(lambda ( x ) (wcmatch x (strcat str "*"))) strlst)
      (findpre strlst (1+ k))
      (substr str 1 (1- (strlen str)))
    )
  )

  (setq pre (findpre strlst 1))
  (setq itm nil)
  (setq strlst (mapcar 'vl-list->string (mapcar 'reverse (mapcar 'vl-string->list strlst))))
  (setq suf (vl-list->string (reverse (vl-string->list (findpre strlst 1)))))
  (setq strlst (mapcar 'vl-list->string (mapcar 'reverse (mapcar 'vl-string->list strlst))))
  (setq strlst (mapcar '(lambda ( x ) (vl-string-trim pre (vl-string-trim suf x))) strlst))
  (list pre strlst suf)
)

;;; (foo '("Level 1 Floor Plan" "Level 2 Floor Plan" "Level 2b Floor Plan" "Level 3 Floor Plan"))
;;; ("Level " ("1" "2" "2b" "3") " Floor Plan")

HTH. M.R.

  • Like 1
Link to comment
Share on other sites

Hey, thanks for that Marko. I'll tweak it some few bits so that it won't crash if:

 

  • All items are identical
  • There's only one item in the list
Link to comment
Share on other sites

How about:

(defun foo ( l / a b )
    (setq a (bar l)
          b (bar (mapcar '(lambda ( x ) (vl-list->string (reverse (vl-string->list x)))) l))
    )
    (list (substr (car l) 1 a)
          (mapcar '(lambda ( x ) (substr x (1+ a) (- (strlen x) a b))) l)
          (substr (car l) (- (strlen (car l)) b -1))
    )
)
(defun bar ( l )
    (apply 'min (mapcar '(lambda ( s ) (vl-string-mismatch (car l) s)) (cdr l)))
)
_$ (foo '("Level 1 Floor Plan" "Level 2 Floor Plan" "Level 2b Floor Plan" "Level 3 Floor Plan"))
("Level " ("1" "2" "2b" "3") " Floor Plan")

If you want case-insensitivity, simply include the ignore-case argument for the vl-string-mismatch function.

Edited by Lee Mac
  • Like 2
Link to comment
Share on other sites

Yes, Lee, yours is better... (I am not sure if (vl-string-trim) would trim more characters than it should...

If I may revise my version :

(defun foo ( strlst / findpre itm pre suf a b )

  (defun findpre ( strlst k / str )
    (if (null itm)
      (setq itm (car strlst))
    )
    (setq str (substr itm 1 k))
    (if (vl-every '(lambda ( x ) (wcmatch x (strcat str "*"))) strlst)
      (findpre strlst (1+ k))
      (substr str 1 (1- (strlen str)))
    )
  )

  (setq pre (findpre strlst 1))
  (setq itm nil)
  (setq strlst (mapcar 'vl-list->string (mapcar 'reverse (mapcar 'vl-string->list strlst))))
  (setq suf (vl-list->string (reverse (vl-string->list (findpre strlst 1)))))
  (setq a (strlen pre))
  (setq b (strlen suf))
  (setq strlst (mapcar 'vl-list->string (mapcar 'reverse (mapcar 'vl-string->list strlst))))
  (setq strlst (mapcar '(lambda ( x ) (substr x (1+ a) (- (strlen x) a b))) strlst)) ;;; This line is better (Thanks Lee)
  (list pre strlst suf)
)

;;; (foo '("Level 1 Floor Plan" "Level 2 Floor Plan" "Level 2b Floor Plan" "Level 3 Floor Plan"))
;;; ("Level " ("1" "2" "2b" "3") " Floor Plan")

 

Edited by marko_ribar
  • Like 1
Link to comment
Share on other sites

Ha, if I had known about vl-string-mismatch, I wouldn't have posted in this forum. Good to learn new functions everyday. Thanks Lee.

Link to comment
Share on other sites

One question for you Lee (and all other professional coders): In your function, you declare the function 'bar' outside the main function. Is there any reason for doing this as opposed to declaring the function 'bar' local within the function? It's not like the function 'bar' will be used in many cases. (If it's widely used, then it makes sense and I'll name the function that will less likely conflict with other existing functions.)

Link to comment
Share on other sites

18 minutes ago, Jonathan Handojo said:

One question for you Lee (and all other professional coders): In your function, you declare the function 'bar' outside the main function. Is there any reason for doing this as opposed to declaring the function 'bar' local within the function? It's not like the function 'bar' will be used in many cases. (If it's widely used, then it makes sense and I'll name the function that will less likely conflict with other existing functions.)

 

Providing that the functions are appropriately named when used in an application, defining the supporting function outside of the function in which it is called is more efficient, as if the supporting function is defined locally within the calling function, the supporting function is needlessly redefined for every evaluation of the calling function.

Edited by Lee Mac
  • Like 1
Link to comment
Share on other sites

And a quick benchmark :)

(setq l nil)
(repeat	88
  (setq	l
	 (cons '("Level 1 Floor Plan" "Level 2 Floor Plan" "Level 2b Floor Plan" "Level 3 Floor Plan")
	       l
	 )
  )
)
(setq l (apply 'append l))
(benchmark '((foo l) (foolocal l)))
 
;;;Benchmarking .............Elapsed milliseconds / relative speed for 1024 iteration(s):
;;;    (FOO L)..........1625 / 1.51 <fastest>
;;;    (FOOLOCAL L).....2453 / 1.00 <slowest>
;;;
;;; 
;;;; 4 forms loaded from #<editor "<Untitled-0> loading...">
;;;_$ 

 

Link to comment
Share on other sites

Heres another one -

(defun CommonPrefSuf ( sL / sLx pfx sfx )
  (setq sLx (mapcar (function vl-string->list) sL))
  (setq pfx "")
  (vl-every (function (lambda (x) (if (apply '= x) (setq pfx (strcat pfx (chr (car x)))))))
    (apply 'mapcar (cons 'list sLx))
  )
  (setq sfx "")
  (vl-every (function (lambda (x) (if (apply '= x) (setq sfx (strcat (chr (car x)) sfx)))))
    (apply 'mapcar (cons 'list (mapcar 'reverse sLx)))
  ) 
  (list pfx (mapcar (function (lambda (s) (vl-string-right-trim sfx (vl-string-left-trim pfx s)))) sL) sfx)
); defun 
_$ (CommonPrefSuf '("Level 1 Floor Plan" "Level 2 Floor Plan" "Level 2b Floor Plan" "Level 3 Floor Plan"))
("Level " ("1" "2" "2b" "3") " Floor Plan")

 

Link to comment
Share on other sites

On 11/12/2020 at 7:55 AM, Lee Mac said:

Providing that the functions are appropriately named when used in an application, defining the supporting function outside of the function in which it is called is more efficient, as if the supporting function is defined locally within the calling function, the supporting function is needlessly redefined for every evaluation of the calling function.

 

I suppose that makes sense. I've created hundreds of LISP routines and functions just like you have, and I too tend to do just that on functions that I use widely. (the ones that I mark with my initials) As I code more and more, I then seem to stumble as to how I can make a revision to a function that carries the same name across all files. For instance, when one realises a mistake and makes a revision on a function that happens to exist on multiple other LISP files, there's a good chance that you'll miss revising on one of the other files. The more routines you have, the more chances you'll have of missing it. And if you save the function on a separate LISP file by itself where all others will have to call from that file, sure that works fine but all your programs will fail if that file gets corrupted or missing. I suppose for that reason, I tend to avoid putting the supporting function outside of the main function because you'll only have more global variables that you need to take care of.

 

In my workplace, as opposed to having to put every single LISP code on the startup, I used your LM:findfile function to load every single LISP routines found under that directory and put that single one on the startup. 

Link to comment
Share on other sites

On 11/13/2020 at 12:41 AM, Lee Mac said:

Good idea @Grrr, but be careful with the use of the vl-string-*-trim functions...


_$ (CommonPrefSuf '("abc" "aabcc"))
("a" ("b" "b") "c")

 

 

Thanks for pointing that out, Lee! I mean I'm kinda aware of this problem that vl-string-*-trim functions have, because this is like the 3rd remark I get from you about this,

but I think its a useful pointer for Jonathan.

I feel that my double vl-every suggestion could be compacted into one iteration with the use of this inversed matrix technique (apply 'mapcar (cons 'list sLx)),

but don't have the brainpower to figure it out.

 


 

11 hours ago, Jonathan Handojo said:

I tend to avoid putting the supporting function outside of the main function because you'll only have more global variables that you need to take care of

 

May I involve with my coding style politics? :

It relies on scoping subfunctions and depending on the case giving proper comment descriptions - describing the algorithm or just the expected inputs and the output or both.

Depending on the subfunction's size and its use in other subfunctions I may comment it or double declare it and localise it in other subfunctions.

For example I would always localise and perform multiple declarations and localisations for subfoos like this, (affording the lack of description and properly naming):

(defun stringp ( s ) (eq 'STR (type s)) )
(defun fill_list_box ( k L ) (start_list k) (mapcar 'add_list L) (end_list) L)

I would always declare subfunctions like this outside of my main (I call them "main subfunctions"), and properly describe them  :

; Description of BigComplexSubfoo1:
; • what the algorithm does [optional if it relies on the return]
; • what args are expected
; • what it returns [optional if it relies on the algorithm]
(defun BigComplexSubfoo1 (args / stringp fill_list_box)
  
  (defun stringp ( s ) (eq 'STR (type s)) )
  (defun fill_list_box ( k L ) (start_list k) (mapcar 'add_list L) (end_list) L)
  
); defun BigComplexSubfoo1


; Description of BigComplexSubfoo2:
; • what the algorithm does [optional if it relies on the return]
; • what args are expected
; • what it returns [optional if it relies on the algorithm]
(defun BigComplexSubfoo2 (args / stringp fill_list_box)
  
  (defun stringp ( s ) (eq 'STR (type s)) )
  (defun fill_list_box ( k L ) (start_list k) (mapcar 'add_list L) (end_list) L)
  
); defun BigComplexSubfoo2

Within the main subfunctions body may be going some complex recursive algorithm calling functions with names like 'f1', 'f2' and 'f3', but as long it is encapsulated and the whole thing works properly (bug-free) I wouldn't bother on checking this subfunction's body.

 

The main function obviously would use the calls of the other main subfunctions, would have its own algorithm with its own localised subfunctions, so for me It would look like:

; Description of Main 
; • what the algorithm does
; • what BigComplexSubfoo-s it requires
(defun C:Main ( / stringp fill_list_box )

  (defun stringp ( s ) (eq 'STR (type s)) )
  (defun fill_list_box ( k L ) (start_list k) (mapcar 'add_list L) (end_list) L)
  
  (cond 
    ( (not (or BigComplexSubfoo1 BigComplexSubfoo2)) )
    (t 
      ; ... 
    ); t 
  ); cond 
); defun 

But saying it again: thats my own coding style, which is not the most efficient, whereas the priority is spaghetti-free (no external dependencies) -

everything is scoped as much as possible: within the *.lsp file, within the main subfunctions or the main call function.

Advantages are that you can easily copy-paste single main subfunctions to reuse in a new project

and easily trace which function is dependent on which, and no function-body investigations.

Disatvantages are that your programs will work significantly slower, because of the redundant subfoo declarations

(but I don't see it as a big problem as long you have in mind the approximate amount of iterations).

 

Alternative styles I've seen are extract-out as much subfunctions as possible and prefixing their names such as in Lee's block counter program,

or in Michael Puckett's style of encapsulating everything within the main subfunction, and calling it in the end:

(defun _Main ( l / foo bar )
  ; Description of _Main ...
  
  (defun foo ( l / a b )
    ; ...
  )
  (defun bar ( l )
    ; ...
  )
  
  (_Main l)
); defun 

 

Edited by Grrr
Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Unfortunately, your content contains terms that we do not allow. Please edit your content to remove the highlighted words below.
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...