Efficient way to isolate duplicates on a large list

2025-10-07T15:12:33Z

Hello everyone,

I'm looking for an efficient way to extract all duplicates from a given list. I have a solution that works in theory but is inefficient in practice because of the length of the list I want to analyse (approximately 25 000 elements).

The list I'm working with is a list of lists. Each sublist describes a Covadis block (Covadis is topography software) in the format (MAT, ALT, ALTI, (X, Y, Z), ename of the block). MAT, ALT and ALTI are attribute values of the block and (X, Y, Z) is its insertion point. I want to isolate all the blocks that share the same MAT attribute and save them in a separate list.

I have come up with the solution in the code below, but it is much too slow and not optimised at all.

; lst = general list of block characteristics
; sublst format (MAT, ALT, ALTI, (X, Y, Z), ename)

(while (setq sublst (car lst))

  (setq lst_same_mat (vl-remove nil (mapcar '(lambda(p) (if (equal (car p) (car sublst)) p nil)) lst)))

  (setq lst_same_mat_save (cons lst_same_mat lst_same_mat_save))

  (setq lst (REMOVE_LST1_FROM_LST2 lst_same_mat lst))
)
;;;;
;lst_gen : list from which certain elements are to be removed
;lst_suppr : list of elements to remove from lst_gen
(defun REMOVE_LST1_FROM_LST2 (lst_suppr lst_gen / ele_suppr lst_modif n)

  (setq n 0)
  (setq lst_modif lst_gen)
  (while (setq ele_suppr (nth n lst_suppr))
    (setq lst_modif (vl-remove ele_suppr lst_modif))
    (setq n (1+ n))
  )
  lst_modif
)

Would anyone have an idea of how to speed this up for large lists?

Thanks and best regards,

Jacques

2025-10-07T15:27:19Z

Try Lee Mac's "List Duplicates" functions at the following link:

https://www.lee-mac.com/uniqueduplicate.html#listdupes

2025-10-07T15:36:52Z

I'm unsure if you're looking to group the items by the MAT value or something else (sample data and the expected result would help in this regard), but I might suggest manipulating the data into groups using the MAT value as the key, i.e.

(defun foo ( lst / ass rtn )
    (foreach itm lst
        (if (setq ass (assoc (car itm) rtn))
            (setq rtn (subst (vl-list* (car ass) itm (cdr ass)) ass rtn))
            (setq rtn (cons (list (car itm) itm) rtn))
        )
    )
)
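
If only the duplicated MAT values are of interest, the grouped result can be filtered down to the groups that hold two or more items. The following is a minimal sketch of that idea: the names group-by-key and dup-groups are hypothetical, the grouping logic simply mirrors foo above, and it has not been timed against a 25 000-element list.

```lisp
;; Hypothetical helper: groups items by their first element (the MAT value).
;; Returns a list of (key item item ...) groups, same logic as foo above.
(defun group-by-key ( lst / ass rtn )
    (foreach itm lst
        (if (setq ass (assoc (car itm) rtn))
            ;; key already seen: push the item onto its existing group
            (setq rtn (subst (vl-list* (car ass) itm (cdr ass)) ass rtn))
            ;; new key: start a group containing just this item
            (setq rtn (cons (list (car itm) itm) rtn))
        )
    )
    rtn
)

;; Hypothetical helper: keeps only the groups with two or more items,
;; i.e. the keys that occur more than once in the input list.
(defun dup-groups ( lst )
    (vl-remove-if '(lambda ( grp ) (null (cddr grp))) (group-by-key lst))
)
```

_$ (dup-groups '(("abc" 1) ("def" 2) ("abc" 3) ("xyz" 4) ("abc" 5)))
(("abc" ("abc" 5) ("abc" 3) ("abc" 1)))

Each returned group starts with the shared key, followed by the matching items in reverse order of appearance.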

2025-10-07T16:10:04Z

Here's another example. In the function below, "unq" will contain the items whose MAT value is unique, and "dup" will contain those whose MAT value occurs more than once:

(defun foo ( lst / dup itm key len unq )
    (while (setq itm (car lst))
        (setq key (car itm)
              len (length dup)
              lst (vl-remove-if (function (lambda ( x ) (if (= key (car x)) (setq dup (cons x dup))))) (cdr lst))
        )
        (if (< len (length dup))
            (setq dup (cons itm dup))
            (setq unq (cons itm unq))
        )
    )
    (list unq dup)
)

Though it's probably more efficient to use the lists directly in your code rather than the above function returning a list of lists - alternatively, output parameters may be used, e.g.:

(defun foo ( lst unq-out dup-out / dup itm key len unq )
    (while (setq itm (car lst))
        (setq key (car itm)
              len (length dup)
              lst (vl-remove-if (function (lambda ( x ) (if (= key (car x)) (setq dup (cons x dup))))) (cdr lst))
        )
        (if (< len (length dup))
            (setq dup (cons itm dup))
            (setq unq (cons itm unq))
        )
    )
    (set unq-out unq)
    (set dup-out dup)
    nil
)

_$ (foo '(("abc" 1) ("def" 2) ("abc" 3) ("xyz" 4) ("abc" 5)) 'unq-sym 'dup-sym)
nil
_$ unq-sym
(("xyz" 4) ("def" 2))
_$ dup-sym
(("abc" 1) ("abc" 5) ("abc" 3))

Note that the element order is not preserved with the above.
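
If the original order matters, one possible variation is to count the occurrences of each key in a first pass and then filter the untouched input list in a second pass. This is only a sketch under that assumption: the name split-by-count is hypothetical, and the assoc lookups make its cost grow with the number of distinct MAT values, so it would need testing on the real data.

```lisp
;; Hypothetical order-preserving variant.
;; Returns (unq dup), both in the same order as the input list.
(defun split-by-count ( lst / ass cnt )
    ;; First pass: build an association list of (key . occurrences).
    (foreach itm lst
        (if (setq ass (assoc (car itm) cnt))
            (setq cnt (subst (cons (car ass) (1+ (cdr ass))) ass cnt))
            (setq cnt (cons (cons (car itm) 1) cnt))
        )
    )
    ;; Second pass: split the original list by occurrence count.
    (list
        (vl-remove-if     '(lambda ( x ) (< 1 (cdr (assoc (car x) cnt)))) lst)
        (vl-remove-if-not '(lambda ( x ) (< 1 (cdr (assoc (car x) cnt)))) lst)
    )
)
```

_$ (split-by-count '(("abc" 1) ("def" 2) ("abc" 3) ("xyz" 4) ("abc" 5)))
((("def" 2) ("xyz" 4)) (("abc" 1) ("abc" 3) ("abc" 5)))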

Edited 1 hour ago by Lee Mac
