Jump to content

Efficient way to isolate duplicates on a large list


Recommended Posts

Posted

Hello everyone,

 

I'm looking for a solution to efficiently extract from a given list all duplicates. I've got a solution that works in theory but that is ineficient in practice due to the length of the list I want to analyse (approximately 25 000 elements).

The list I'm working with is a list of lists. All individual lists decribe a Covadis bloc (topography software) with the format (MAT, ALT, ALTI, (X, Y, Z), ename of the bloc). MAT, ALT and ALTI are attributes value of the bloc and (X, Y, Z) is the insert coordinates. I want to isolate all the blocs that have the same MAT attribute and save them on a separate list.

 

I have come with the solution described in the code below but it's much to slow and absolutely not opimised.

 

; lst = general list with bloc caracteristics
; sublst format (MAT, ALT, ALTI, (X, Y, Z), ename)

(while (setq sublst (car lst))

  (setq lst_same_mat (vl-remove nil (mapcar '(lambda(p) (if (equal (car p) (car sublst)) p nil)) lst)))

  (setq lst_same_mat_save (cons lst_same_mat lst_same_mat_save))

  (setq lst (REMOVE_LST1FROM_LST2 lst_same_mat lst_same_mat_save)
)
;;;;
;lst_gen : list we want to suppress certain element from
;lst_suppr : list of elements to be suppressed in lst_gen
(defun REMOVE_LST1_FROM_LST2 (lst_suppr lst_gen / ele_suppr lst_modif n)

  (setq n 0)
  (setq lst_modif lst_gen)
  (while (setq ele_suppr (nth n lst_suppr))
    (setq lst_modif (vl-remove ele_suppr lst_modif))
    (setq n (1+ n))
  )
  lst_modif
)

 

Would anyone have an idea on how to speed this up for large list ?

 

Thanks and best regards,

 

Jacques

Posted

I'm unsure if you're looking to group the items by the MAT value or something else (sample data and the expected result would help in this regard), but I might suggest manipulating the data into groups using the MAT value as the key, i.e.

(defun foo ( lst / ass rtn )
    (foreach itm lst
        (if (setq ass (assoc (car itm) rtn))
            (setq rtn (subst (vl-list* (car ass) itm (cdr ass)) ass rtn))
            (setq rtn (cons (list (car itm) itm) rtn))
        )
    )
)

 

Posted (edited)

Here's another example - for the below function, "unq" will contain the items with unique MAT value, and "dup" will contain those for which there is more than one MAT value:

(defun foo ( lst / dup key len unq )
    (while (setq itm (car lst))
        (setq key (car itm)
              len (length dup)
              lst (vl-remove-if (function (lambda ( x ) (if (= key (car x)) (setq dup (cons x dup))))) (cdr lst))
        )
        (if (< len (length dup))
            (setq dup (cons itm dup))
            (setq unq (cons itm unq))
        )
    )
    (list unq dup)
)

 

Though it's probably more efficient to use the lists directly in your code rather than the above function returning a list of lists - alternatively, output parameters may be used, e.g.:

(defun foo ( lst unq-out dup-out / dup key len unq )
    (while (setq itm (car lst))
        (setq key (car itm)
              len (length dup)
              lst (vl-remove-if (function (lambda ( x ) (if (= key (car x)) (setq dup (cons x dup))))) (cdr lst))
        )
        (if (< len (length dup))
            (setq dup (cons itm dup))
            (setq unq (cons itm unq))
        )
    )
    (set unq-out unq)
    (set dup-out dup)
    nil
)
_$ (foo '(("abc" 1) ("def" 2) ("abc" 3) ("xyz" 4) ("abc" 5)) 'unq-sym 'dup-sym)
nil
_$ unq-sym
(("xyz" 4) ("def" 2))
_$ dup-sym
(("abc" 1) ("abc" 5) ("abc" 3))

 

Note that the element order is not preserved with the above.

Edited by Lee Mac
Posted

I asked a similar question for a task. It groups common items not sure if it is useful as no data to test on.

; Thanks to Dexus for groupby sep 2025

(defun _groupBy (fun lst0 / itm old rtn)
  (while lst0
    (setq itm (fun (car lst0))
          rtn
            (if (setq old (assoc itm rtn))
              (subst (cons itm (cons (car lst0) (cdr old))) old rtn)
              (cons (cons itm (list (car lst0))) rtn)
            )
          lst0 (cdr lst0)
   )
  )
  (mapcar 'cdr rtn)
)

(setq lst (vl-sort lst '(lambda (x y) (< (car x)(car y)))))
(setq lst2 (_groupBy   (lambda (e) (car e)) lst))

 

Posted

Thank you all for your answers !

 

I was able to do what I wanted to achieve thanks to you. The function that worked for me was the first one submitted by LeeMac. I only had to had the line below to get exactly the list I wanted : 

 

(defun foo ( lst / ass rtn )
    (foreach itm lst
        (if (setq ass (assoc (car itm) rtn))
            (setq rtn (subst (vl-list* (car ass) itm (cdr ass)) ass rtn))
            (setq rtn (cons (list (car itm) itm) rtn))
        )
    )
)

; lst_test = my list with the structure described in my post i.e lst_test = (lst1 lst2 ... lstn) with lstx = (MAT, ALT, COD, ...)

(setq lst_test_sort (foo lst_test))
(setq lst_test_dup (vl-remove nil (mapcar '(lambda(p) (if (> (length p) 2) (cdr p) nil)) lst_test_sort)))

 

I also tried the other function supplied by LeeMac but it was much slower than the first one. I was not able (patient enough) to make it work on a list with approx 25 000 elements.

 

Thanks to BIGAL also for the reply. I must confess that I did not have the time to test the proposed solution. I will have a look at it as soon as I can manage some spare time.

 

I really appreciate the time you took to answer me. Best regards,

 

Jacques

Posted

You might find the following marginally faster -

(defun foo ( lst / ass rtn )
    (foreach itm lst
        (if (setq ass (assoc (car itm) rtn))
            (setq rtn (subst (vl-list* (car ass) itm (cdr ass)) ass rtn))
            (setq rtn (cons (list (car itm) itm) rtn))
        )
    )
    (mapcar 'cdr (vl-remove-if-not 'cddr rtn))
)

 

Posted

Thanks !

 

I will look into that :)

 

Jacques

Posted
On 10/7/2025 at 5:12 PM, jbreard said:

Hello everyone,

 

I'm looking for a solution to efficiently extract from a given list all duplicates. I've got a solution that works in theory but that is ineficient in practice due to the length of the list I want to analyse (approximately 25 000 elements).

The list I'm working with is a list of lists. All individual lists decribe a Covadis bloc (topography software) with the format (MAT, ALT, ALTI, (X, Y, Z), ename of the bloc). MAT, ALT and ALTI are attributes value of the bloc and (X, Y, Z) is the insert coordinates. I want to isolate all the blocs that have the same MAT attribute and save them on a separate list.

 

I have come with the solution described in the code below but it's much to slow and absolutely not opimised.

 

; lst = general list with bloc caracteristics
; sublst format (MAT, ALT, ALTI, (X, Y, Z), ename)

(while (setq sublst (car lst))

  (setq lst_same_mat (vl-remove nil (mapcar '(lambda(p) (if (equal (car p) (car sublst)) p nil)) lst)))

  (setq lst_same_mat_save (cons lst_same_mat lst_same_mat_save))

  (setq lst (REMOVE_LST1FROM_LST2 lst_same_mat lst_same_mat_save)
)
;;;;
;lst_gen : list we want to suppress certain element from
;lst_suppr : list of elements to be suppressed in lst_gen
(defun REMOVE_LST1_FROM_LST2 (lst_suppr lst_gen / ele_suppr lst_modif n)

  (setq n 0)
  (setq lst_modif lst_gen)
  (while (setq ele_suppr (nth n lst_suppr))
    (setq lst_modif (vl-remove ele_suppr lst_modif))
    (setq n (1+ n))
  )
  lst_modif
)

 

Would anyone have an idea on how to speed this up for large list ?

 

Thanks and best regards,

 

Jacques

Hi

Is the goal of fetching the duplicate lists to remove them from the main list?
This is what your code appears to read

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Unfortunately, your content contains terms that we do not allow. Please edit your content to remove the highlighted words below.
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...