  1. #1
    mit (Full Member)
    Operating System: MS Windows
    Using: Map 3D 2015
    Join Date: Aug 2015
    Location: Laos
    Posts: 33

    How to read UTF-8 Encoding?


    Hello everyone

    I'm trying to write code that reads data from a .txt file and fills an object data table in AutoCAD Map, but the text shows up as ????????.

    Could you please help me?

    Attachments: test.txt, Chanthaboury.dwg, fill_OD_table.lsp

  2. #2
    hanhphuc (Super Member)
    Using: AutoCAD 2007
    Join Date: Apr 2013
    Location: Happy Garden
    Posts: 749

    Quote (originally posted by mit):
    Hello everyone

    I'm trying to write code that reads data from a .txt file and fills an object data table in ...
    Not sure, but give this a try. Sorry if it's not a good solution.
    Code:
    ;; Can't display "ດິນບຸກຄົນ"?
    ;;;(setq remark (vk_ReadTextStream "C:/test.txt" "UTF-8"))
    
    ;; Alternative: manually copy the text from the text file, then paste it
    (setq remark (getstring T "\nPaste your text here -> ")) ; T allows spaces in the input
    ;"\U+0E94\U+0EB4\U+0E99\U+0E9A\U+0EB8\U+0E81\U+0E84\U+0EBB\U+0E99"
    
    ;; or paste it into the multiline editor dialog
    (setq remark (lisped "paste here "))
    _$ ( apply 'equal "hp" "happy" "hạnh phúc" "ハッピー" "幸福" "행복" )
    ; error: too many arguments

  3. #3
    mit (Full Member)

    Thanks, hanhphuc. I will try it.

  4. #4
    hanhphuc (Super Member)

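    Some Asian text can be shown with the normal open function, even though it only supports ANSI, if the file starts with the byte pair FE FF (hex), i.e. 254 255 decimal. That pair is the Unicode byte-order mark; Notepad writes it in little-endian order, FF FE.

    Save your test.txt as Unicode.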

    Code:
    ;; path = full path of test.txt saved as Unicode
    (if (setq f (open path "r"))
      (progn (setq ret (read-line f)) ;<--test only the 1st line
             (close f)))
    Code:
    (defun foo ( str ) ; read unicode - test version 
    ;;;hanhphuc 17.04.2018
      (apply 'strcat
    	(mapcar
    	  	''( ( x ) (apply 'strcat (vl-list* (chr 92) "U+" (mapcar ''( (x / $)  (setq $ ( LM:dec->base x 16))
    						  (if (or (< x 10) (=(strlen $)1)) (strcat "0" $) $) )
    					    (reverse x)
    					)
    			   )
    	  	     )
    	     	   )
    		
    		(
    		 '( ( f ) (f (vl-remove-if
    			      '(lambda (x) (vl-some '(lambda (y)
    						       (= x y)
    						       )
    						   '( 254 255 ))
    					      )
    				(vl-string->list str)
    			      )
    			   )
       	   	   )
    	 	 '( ( l ) (if l (cons (list (car l)(cadr l))
    	       			 (f (cddr l)))
    			   )
    	  	  )
    		)	
    	)
      )
    )
    
    ;; Decimal to Base  -  Lee Mac
    ;; Converts a decimal number to another base.
    ;; n - [int] decimal integer
    ;; b - [int] non-zero positive integer base
    ;; Returns: [str] Representation of decimal in specified base
    
    (defun LM:dec->base ( n b )
        (if (< n b)
            (chr (+ n (if (< n 10) 48 55)))
            (strcat (LM:dec->base (/ n b) b) (LM:dec->base (rem n b) b))
        )
    )
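If you want to check these numbers outside AutoCAD, here's a rough Python transcription of the LM:dec->base recursion above. Python is only used as a cross-check here (AutoCAD doesn't run it), and the name dec_to_base is my own.

```python
def dec_to_base(n: int, b: int) -> str:
    """Python transcription of LM:dec->base (digits 0-9, then A-Z; no zero padding)."""
    if n < b:
        # chr(48) is "0"; for 10 <= n, n + 55 gives "A", "B", ...
        return chr(n + (48 if n < 10 else 55))
    # same recursion as the LISP version: leading digits first, then the last digit
    return dec_to_base(n // b, b) + dec_to_base(n % b, b)


if __name__ == "__main__":
    print(dec_to_base(255, 16))  # FF
    print(dec_to_base(13, 16))   # D (unpadded)
    print(dec_to_base(230, 2))   # 11100110
```

The results are not zero-padded, which is why foo adds a leading "0" to single-digit hex values before building each \U+XXXX code.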
    Test:
    Code:
    (alert (foo ret))
    ;; should display ດິນບຸກຄົນ ?
    ;; (foo ret) returns
    ;; "\U+0E94\U+0EB4\U+0E99\U+0E9A\U+0EB8\U+0E81\U+0E84\U+0EBB\U+0E99"
    Try the above and see whether it works for your language.
    Otherwise, plan B: read the UTF-8 file through an FSO stream. That is more stable, but it is harder to group the 1 to 4 bytes that make up each character. I'll look into it if I have some time.
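For comparison, here's a Python sketch of what foo does with a file saved as "Unicode" in Notepad, which I'm assuming is UTF-16 little-endian with a leading FF FE byte-order mark: take the bytes two at a time, swap each pair (the (reverse x) step), and write each pair as \U+XXXX. One deliberate difference: this sketch drops only the leading BOM, whereas foo removes every byte equal to 254 or 255, which would also eat a legitimate byte inside a character such as U+00FF. The function name is hypothetical.

```python
def utf16le_to_escapes(data: bytes) -> str:
    """Convert UTF-16 LE bytes to AutoCAD-style \\U+XXXX codes (a sketch of foo).

    Assumptions: little-endian byte order, optional FF FE BOM at the start,
    no characters above U+FFFF (surrogate pairs are not combined).
    """
    if data[:2] == b"\xff\xfe":  # strip only the leading byte-order mark
        data = data[2:]
    codes = []
    for i in range(0, len(data) - 1, 2):
        low, high = data[i], data[i + 1]  # little-endian: low byte comes first
        codes.append("\\U+%02X%02X" % (high, low))  # swap the pair, 2 hex digits each
    return "".join(codes)


if __name__ == "__main__":
    lao = "\u0e94\u0eb4\u0e99\u0e9a\u0eb8\u0e81\u0e84\u0ebb\u0e99"  # ດິນບຸກຄົນ
    raw = b"\xff\xfe" + lao.encode("utf-16-le")  # bytes of a Notepad "Unicode" file
    print(utf16le_to_escapes(raw))
    # \U+0E94\U+0EB4\U+0E99\U+0E9A\U+0EB8\U+0E81\U+0E84\U+0EBB\U+0E99
```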

  5. #5
    mit (Full Member)

    Great!!!
    It works.

    Thank you very much, hanhphuc,
    and thanks to Lee Mac for his code.

  6. #6
    hanhphuc (Super Member)

    Quote (originally posted by mit):
    Great!!!
    It works.

    Thank you very much, hanhphuc,
    and thanks to Lee Mac for his code.
    You're welcome. I hope you'll write the code yourself next time.

    Here are my UTF-8 functions. They may be useful in future if you run into issues with the previous Unicode method. Try them, and good luck.
    Code:
    ;Reference, post#138 
    ;https://stackoverflow.com/questions/643694/what-is-the-difference-between-utf-8-and-unicode
    
    (defun UTF8->unicode ( l / ls 8b d2 foo) ; encode UTF-8 to unicode
    ;;;hanhphuc 17.04.2018 
      (setq	8b '((s) (while (< (strlen s) 8) (setq s (strcat "0" s))) s) 
     	d2 '((str) ;split the bit string into 8-character chunks
     		 (if (> (strlen str) 0)
       		 (cons (substr str 1 8) (d2 (setq str (substr str 9 ))))
        		)
     	     )
    	foo '(($ / pos i) ; base2 to decimal 
      		(setq i 0)
      		(+ (cond ((while (and (> (strlen $) 0) (setq pos (vl-string-search "1" $)))
    	     	 		(setq 	$ (substr $ (+ 2 pos))
    		   			i (+ i (expt 2 (strlen $)))
    		   		 )
    	      		     )
    	    		  )
    	   		(0)
    	   	      )
         		   (atoi $)
         	     	  )
    	     	)
      	ls (mapcar ''((x / $) 
    		      (setq $ (LM:dec->base (foo x) 16))
    		      (if
    		       (= (strlen $) 1)
    		       (strcat "0" $)
    		       $
    		       )
    		      )
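    		   (d2
    		     ;; left-pad the payload bits to a multiple of 8 so d2 cuts them on
    		     ;; byte boundaries (2-byte UTF-8 = 11 bits, 4-byte = 21 bits)
    		     ((lambda (s) (while (/= 0 (rem (strlen s) 8)) (setq s (strcat "0" s))) s)
    		       (apply 'strcat
    			      (mapcar ''((a x) (substr (8b a) (- 9 x) x))
    				      l
    				      (cdr (assoc (length l) '((1 . (7)) (2 . (5 6)) (3 . (4 6 6)) (4 . (3 6 6 6)))))
    				      )
    			      )
    		       )
    		     ) 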
    		   ) 
    	)
      (apply 'strcat
    	 (vl-list* "\\U"
    		  (if (> (length ls) 1)
    		 "+"
    		 "+00")
    	       ls
    	       )
    	 )
      )
    
    
    (defun U8:bytes (l / x ls)
      ;hanhphuc 17.04.2018
      ;UTF-8 split the bytes 
      (setq x (car l))
      (if l
        (cons (vl-remove nil (cond	((<= 0 x 191)
    		 (setq ls (list x)
    		       l  (cdr l)
    		       )
    		 ls
    		 )
    		((<= 192 x 223)
    		 (setq ls (list x (cadr l))
    		       l  (cddr l)
    		       )
    		 ls
    		 )
    		((<= 224 x 239)
    		 (setq ls (list x (cadr l) (caddr l))
    		       l  (cdddr l)
    		       )
    		 ls
    		 )
    		((<= 240 x 247)
    		 (setq ls (list x (cadr l) (caddr l) (cadddr l))
    		       l  (cddddr l)
    		       )
    		 ls
    		 )
    		)
    	    )
    	  (U8:bytes l)
    	  )
        ) 
      )
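Here's the same byte-grouping idea as U8:bytes, sketched in Python so you can check the groups in step 3 outside AutoCAD. The lead-byte ranges are the ones U8:bytes uses. One difference I'd flag from reading the LISP: U8:bytes has no branch for lead bytes 248 to 255 (its cond returns nil without advancing the list), so this sketch simply takes such bytes one at a time. That is my own choice, not part of the original, and the name split_utf8 is hypothetical.

```python
def split_utf8(data: bytes) -> list:
    """Group UTF-8 bytes into one list per character, using the U8:bytes ranges."""
    groups, i = [], 0
    while i < len(data):
        lead = data[i]
        if lead <= 191:    # 0-127 ASCII; 128-191 (stray continuation bytes) also taken singly
            size = 1
        elif lead <= 223:  # 110xxxxx -> 2-byte sequence
            size = 2
        elif lead <= 239:  # 1110xxxx -> 3-byte sequence
            size = 3
        elif lead <= 247:  # 11110xxx -> 4-byte sequence
            size = 4
        else:              # 248-255: not handled by U8:bytes; taken singly here
            size = 1
        groups.append(list(data[i:i + size]))  # a truncated tail just gives a shorter group
        i += size
    return groups


if __name__ == "__main__":
    # BOM, "L", CR LF, 汉 (E6 B1 89), ú (C3 BA)
    print(split_utf8(bytes([239, 187, 191, 76, 13, 10, 230, 177, 137, 195, 186])))
    # [[239, 187, 191], [76], [13], [10], [230, 177, 137], [195, 186]]
```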
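    Here's the workaround.
    Step 1: read the file.
    Assume ret below is the result of reading the UTF-8 file contents as a stream. Its multi-byte characters look garbled in this string; the exact bytes are listed in step 2.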

    Code:
    (setq ret "Lee Mac & Marko Ribar\r\nHappy Birthday\r\n祝ä½*们生日快乐\r\n幸福\r\nChúc mừng sinh nháº*t\r\n"
          )

    Step 2: convert to a list of character codes
    Code:
    (setq lst (vl-string->list ret))
    
    ;Decimal 
    '(239 187  191  76   101  101  32   77	97   99	  32   38   32	 77   97   114	107  111  32   82   105	 98   97   114
         13	  10   72   97	 112  112  121	32   66	  105  114  116	 104  100  97	121  13	  10   231  165	 157  228  189
         160  228  187  172	 231  148  159	230  151  165  229  191	 171  228  185	144  13	  10   229  185	 184  231  166
         143  13   10   67	 104  195  186	99   32	  109  225  187	 171  110  103	32   115  105  110  104	 32   110  104
         225  186  173  116	 13   10
         )
    
    ;Hex 
    '("EF" "BB"   "BF"   "4C"   "65"  "65"	 "20"	"4D"   "61"   "63"   "20"   "26"   "20"	  "4D"	 "61"	"72"   "6B"
          "6F"   "20"   "52"   "69"	  "62"	 "61"	"72"   "D"    "A"    "48"   "61"   "70"	  "70"	 "79"	"20"   "42"
          "69"   "72"   "74"   "68"	  "64"	 "61"	"79"   "D"    "A"    "E7"   "A5"   "9D"	  "E4"	 "BD"	"A0"   "E4"
          "BB"   "AC"   "E7"   "94"	  "9F"	 "E6"	"97"   "A5"   "E5"   "BF"   "AB"   "E4"	  "B9"	 "90"	"D"    "A"
          "E5"   "B9"   "B8"   "E7"	  "A6"	 "8F"	"D"    "A"    "43"   "68"   "C3"   "BA"	  "63"	 "20"	"6D"   "E1"
          "BB"   "AB"   "6E"   "67"	  "20"	 "73"	"69"   "6E"   "68"   "20"   "6E"   "68"	  "E1"	 "BA"	"AD"   "74"
          "D"    "A"
          )

    Step 3: group the bytes of each character with the (U8:bytes lst) function:
    Code:
    '((239 187 191) (76) (101) (101) (32) (77) (97) (99) (32) (38) (32) (77) (97) (114) (107) (111) (32) (82) (105) (98) (97) (114)
      (13) (10) (72) (97) (112) (112) (121) (32) (66) (105) (114) (116) (104) (100) (97) (121) (13) (10) (231 165 157) (228 189 160)
      (228 187 172) (231 148 159) (230 151 165) (229 191 171) (228 185 144) (13) (10) (229 185 184) (231 166 143) (13) (10) (67) (104)
      (195 186) (99) (32) (109) (225 187 171) (110) (103) (32) (115) (105) (110) (104) (32) (110) (104) (225 186 173) (116) (13) (10))

    Step 4: convert each byte from decimal to base 2, then apply UTF8->unicode to encode the group.

    Example: 汉
    Code:
    (mapcar ''(( x ) (LM:dec->base x 2) )'(230 177 137)) ;Hex= E6 B1 89 
    (alert
    (UTF8->unicode '( "11100110""10110001""10001001" ) )
    )
    "\U+6C49"
    
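    ;you can encode any group from the step 3 list (pick it with car, last, nth ...)
    ;after converting its bytes to base 2, e.g. the group (231 165 157) = 祝
    (UTF8->unicode (mapcar ''((x) (LM:dec->base x 2)) '(231 165 157)))
    ;-> "\U+795D"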
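Step 4 can also be checked with plain integer arithmetic. This Python sketch keeps the same number of payload bits per byte as the table inside UTF8->unicode (7; 5 6; 4 6 6; 3 6 6 6) but combines them with shifts and masks instead of base-2 strings. That swap, and the function name, are mine.

```python
# Payload bits kept from each byte, by sequence length
# (same table as in UTF8->unicode: (1 . (7)) (2 . (5 6)) (3 . (4 6 6)) (4 . (3 6 6 6)))
PAYLOAD_BITS = {1: (7,), 2: (5, 6), 3: (4, 6, 6), 4: (3, 6, 6, 6)}


def utf8_group_to_escape(group: list) -> str:
    """Decode one UTF-8 byte group (e.g. [230, 177, 137]) to a \\U+XXXX string."""
    code = 0
    for byte, bits in zip(group, PAYLOAD_BITS[len(group)]):
        # shift what we have so far left, then append the low `bits` bits of this byte
        code = (code << bits) | (byte & ((1 << bits) - 1))
    return "\\U+%04X" % code


if __name__ == "__main__":
    print(utf8_group_to_escape([230, 177, 137]))  # \U+6C49 (汉)
    print(utf8_group_to_escape([195, 186]))       # \U+00FA (ú)
```

Working on integers means the payload never has to be cut on byte boundaries, which matters for 2-byte (11-bit) and 4-byte (21-bit) sequences.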
    Finally, concatenate the encoded results of all the groups into one string.

    P.S. Randomly tested with Arabic, Chinese, Hindi, Japanese, Korean, Lao, Punjabi, Russian, Tamil, Vietnamese, etc.; there are still some issues.
    Last edited by hanhphuc; 18th Apr 2018 at 03:40 pm. Reason: link added, 汉, date & syntax color

  7. #7
    mit (Full Member)

    What's wrong with this code?

    Hello,
    Could you please help me?
    What's wrong with this code?

    Code:
    (defun c:test ()
    ;Step 1: read file
    (setq ret "Lee Mac & Marko Ribar\r\nHappy Birthday\r\n祝ä½*们生日快乐\r\n幸福\r\nChúc mừng sinh nháº*t\r\n"
          )
      
    ;step 2: convert to char list
    (setq lst (vl-string->list ret))
      
    ;Step 3: (U8:bytes lst ) function to filter the bytes list 
    (setq lstt (U8:bytes lst))
    
      
    ;Step 4: convert decimal to base 2, then apply the function UTF8->unicode to encode
    (foreach txt lstt (mapcar ''(( x ) (LM:dec->base x 2) ) 'txt))
    (princ (UTF8->unicode 'txt ))
    
    )
    
    ;Reference, post#138 
    ;https://stackoverflow.com/questions/643694/what-is-the-difference-between-utf-8-and-unicode
    
    (defun UTF8->unicode ( l / ls 8b d2 foo) ; encode UTF-8 to unicode
    ;;;hanhphuc 17.04.2018 
      (setq	8b '((s) (while (< (strlen s) 8) (setq s (strcat "0" s))) s) 
     	d2 '((str) ;split the bit string into 8-character chunks
     		 (if (> (strlen str) 0)
       		 (cons (substr str 1 8) (d2 (setq str (substr str 9 ))))
        		)
     	     )
    	foo '(($ / pos i) ; base2 to decimal 
      		(setq i 0)
      		(+ (cond ((while (and (> (strlen $) 0) (setq pos (vl-string-search "1" $)))
    	     	 		(setq 	$ (substr $ (+ 2 pos))
    		   			i (+ i (expt 2 (strlen $)))
    		   		 )
    	      		     )
    	    		  )
    	   		(0)
    	   	      )
         		   (atoi $)
         	     	  )
    	     	)
      	ls (mapcar ''((x / $) 
    		      (setq $ (LM:dec->base (foo x) 16))
    		      (if
    		       (= (strlen $) 1)
    		       (strcat "0" $)
    		       $
    		       )
    		      )
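    		   (d2
    		     ;; left-pad the payload bits to a multiple of 8 so d2 cuts them on
    		     ;; byte boundaries (2-byte UTF-8 = 11 bits, 4-byte = 21 bits)
    		     ((lambda (s) (while (/= 0 (rem (strlen s) 8)) (setq s (strcat "0" s))) s)
    		       (apply 'strcat
    			      (mapcar ''((a x) (substr (8b a) (- 9 x) x))
    				      l
    				      (cdr (assoc (length l) '((1 . (7)) (2 . (5 6)) (3 . (4 6 6)) (4 . (3 6 6 6)))))
    				      )
    			      )
    		       )
    		     ) 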
    		   ) 
    	)
      (apply 'strcat
    	 (vl-list* "\\U"
    		  (if (> (length ls) 1)
    		 "+"
    		 "+00")
    	       ls
    	       )
    	 )
      )
    
    
    (defun U8:bytes (l / x ls)
      ;hanhphuc 17.04.2018
      ;UTF-8 split the bytes 
      (setq x (car l))
      (if l
        (cons (vl-remove nil (cond	((<= 0 x 191)
    		 (setq ls (list x)
    		       l  (cdr l)
    		       )
    		 ls
    		 )
    		((<= 192 x 223)
    		 (setq ls (list x (cadr l))
    		       l  (cddr l)
    		       )
    		 ls
    		 )
    		((<= 224 x 239)
    		 (setq ls (list x (cadr l) (caddr l))
    		       l  (cdddr l)
    		       )
    		 ls
    		 )
    		((<= 240 x 247)
    		 (setq ls (list x (cadr l) (caddr l) (cadddr l))
    		       l  (cddddr l)
    		       )
    		 ls
    		 )
    		)
    	    )
    	  (U8:bytes l)
    	  )
        ) 
      )
    
    (defun foo ( str ) ; read unicode - test version 
    ;;;hanhphuc 17.04.2018
      (apply 'strcat
    	(mapcar
    	  	''( ( x ) (apply 'strcat (vl-list* (chr 92) "U+" (mapcar ''( (x / $)  (setq $ ( LM:dec->base x 16))
    						  (if (or (< x 10) (=(strlen $)1)) (strcat "0" $) $) )
    					    (reverse x)
    					)
    			   )
    	  	     )
    	     	   )
    		
    		(
    		 '( ( f ) (f (vl-remove-if
    			      '(lambda (x) (vl-some '(lambda (y)
    						       (= x y)
    						       )
    						   '( 254 255 ))
    					      )
    				(vl-string->list str)
    			      )
    			   )
       	   	   )
    	 	 '( ( l ) (if l (cons (list (car l)(cadr l))
    	       			 (f (cddr l)))
    			   )
    	  	  )
    		)	
    	)
      )
    )
    
    ;; Decimal to Base  -  Lee Mac
    ;; Converts a decimal number to another base.
    ;; n - [int] decimal integer
    ;; b - [int] non-zero positive integer base
    ;; Returns: [str] Representation of decimal in specified base
    
    (defun LM:dec->base ( n b )
        (if (< n b)
            (chr (+ n (if (< n 10) 48 55)))
            (strcat (LM:dec->base (/ n b) b) (LM:dec->base (rem n b) b))
        )
    )

  8. #8
    hanhphuc (Super Member)

    Quote (originally posted by mit):
    Hello,
    Could you please help me?
    What's wrong with this code?

    Code:
    ;Step 4: convert decimal to base 2, then apply the function UTF8->unicode to encode
    (foreach txt lstt (mapcar ''(( x ) (LM:dec->base x 2) ) 'txt))
    (princ (UTF8->unicode 'txt ))
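    Good effort, no worries; we learn from mistakes. 'txt is shorthand for (quote txt), and QUOTE returns its argument without evaluating it, so 'txt gives you the symbol TXT, not the byte list stored in txt.
    Also, the mapcar result inside foreach was thrown away, and the princ line sat outside the loop, so it ran only once.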

    Code:
    (setq txt "HELLO")
    (princ 'txt) ; -> TXT
    (princ txt)  ; -> "HELLO"
    So change it to this:
    Code:
    ;Step 4: convert decimal to base 2, then apply the function UTF8->unicode to encode
    (foreach txt lstt
    (princ (UTF8->unicode (mapcar ''(( x ) (LM:dec->base x 2) ) txt )) )
    )
    (princ)
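Putting the pieces together: here's a Python sketch of the whole corrected loop (group the bytes, encode each group, concatenate), with Python's own UTF-8 decoder used as a cross-check. It is self-contained, so it repeats compact versions of the grouping and decoding steps. All names are hypothetical, the UTF-8 BOM is simply skipped, and it assumes well-formed UTF-8 input.

```python
BOM = b"\xef\xbb\xbf"
PAYLOAD_BITS = {1: (7,), 2: (5, 6), 3: (4, 6, 6), 4: (3, 6, 6, 6)}


def group_size(lead: int) -> int:
    """Sequence length from the lead byte (same ranges as U8:bytes)."""
    if lead <= 191:
        return 1
    if lead <= 223:
        return 2
    if lead <= 239:
        return 3
    return 4  # assumes well-formed input (no 248-255 lead bytes)


def utf8_to_escapes(data: bytes) -> str:
    """Encode every character of a UTF-8 byte string as \\U+XXXX and concatenate."""
    if data.startswith(BOM):  # skip the EF BB BF byte-order mark
        data = data[len(BOM):]
    out, i = [], 0
    while i < len(data):
        size = group_size(data[i])
        code = 0
        for byte, bits in zip(data[i:i + size], PAYLOAD_BITS[size]):
            code = (code << bits) | (byte & ((1 << bits) - 1))
        out.append("\\U+%04X" % code)
        i += size
    return "".join(out)


def escapes_to_text(escaped: str) -> str:
    """Turn \\U+XXXX codes back into characters (only used for the cross-check)."""
    return "".join(chr(int(h, 16)) for h in escaped.split("\\U+")[1:])


if __name__ == "__main__":
    raw = BOM + "Happy Birthday\r\n祝你们生日快乐\r\nChúc mừng sinh nhật\r\n".encode("utf-8")
    escaped = utf8_to_escapes(raw)
    print(escaped[:21])  # \U+0048\U+0061\U+0070
    # cross-check against Python's own decoder (utf-8-sig strips the BOM)
    print(escapes_to_text(escaped) == raw.decode("utf-8-sig"))  # True
```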
    Happy coding!

    P.S. Use read-char for reading a Unicode file.

  9. #9
    mit (Full Member)

    Thank you very much, hanhphuc.

  10. #10
    hanhphuc (Super Member)

    Quote (originally posted by mit):
    Thank you very much, hanhphuc.
    You're welcome.

    P.S. Sorry, there was a typo in the reference link comment: post#138 should be post#147.

Similar Threads

  1. Generate file with encoding UTF-8.
    By robierzo in forum AutoLISP, Visual LISP & DCL
    Replies: 15
    Last Post: 27th Jan 2016, 03:14 pm
  2. Replies: 2
    Last Post: 7th Feb 2014, 09:11 am
  3. Can a LISP routine text file be saved with the UNICODE encoding (from Notepad)?
    By lamensterms in forum AutoLISP, Visual LISP & DCL
    Replies: 12
    Last Post: 13th Jun 2013, 03:52 pm
  4. Suppress the read-only notice when opening read-only files?
    By DrewS in forum AutoCAD Drawing Management & Output
    Replies: 10
    Last Post: 14th Jan 2009, 10:31 am
  5. Is it possible to make a read-only sysvar a read/write var?
    By whatispunk in forum AutoCAD General
    Replies: 1
    Last Post: 11th Mar 2006, 11:49 am
