Can a LISP routine text file be saved with the UNICODE encoding (from Notepad)?


lamensterms

Hey guys,

 

I am just in the process of creating my own custom header for all my LISP routines, and some of the characters/symbols I have chosen are dependent on UNICODE encoding.

 

I'm not at all familiar with what the difference is between ANSI and UNICODE, so I was just wondering if you guys could please school me on whether or not UNICODE can be used for LISP routines - if the characters are not actually part of the routine?

 

Thanks for any help.


Unicode is a character set, and there are several ways (encodings) of saving it on a computer. The general principle is known as character encoding. What Notepad calls "Unicode" uses at least 16 bits per character. ANSI is similar to the old DOS extended ASCII code pages, both of which only use 8 bits per character (plain ASCII itself is only 7 bits). The issue is that an 8-bit code page only has 256 possibilities, and some of those are used as control codes, so you're left with a couple of hundred usable characters. That's why you find lots of special characters not available.

 

Notepad's "Unicode" option starts off with double-byte characters. This is UTF-16: each unit is 16 bits, which allows for 65536 distinct values (characters beyond that take two units). You also get UTF-32 and so on. There are also 2 major variants called little-endian and big-endian; the difference is the order of the bytes inside each of those 16-bit units. Big-endian stores the most significant byte first, i.e. how you write numbers down. Little-endian (the default on Windows) stores the least significant byte first. E.g. if one hundred twenty four is written as 124 in big-endian, you'd write it as 421 in little-endian.
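To make the byte-order point concrete, here's a minimal Python sketch (Python is only my choice for illustration, and the helper name `utf16_byte_orders` is made up). It encodes the numero sign (U+2116) both ways and shows that the same two bytes come out in opposite order.

```python
def utf16_byte_orders(ch):
    """Return the bytes of one character as (big_endian, little_endian) lists."""
    return list(ch.encode("utf-16-be")), list(ch.encode("utf-16-le"))

# U+2116 is the numero sign; in hex the code point is 21 16
big, little = utf16_byte_orders("\u2116")
print(big)     # [33, 22]  -> 0x21 then 0x16, most significant byte first
print(little)  # [22, 33]  -> 0x16 then 0x21, least significant byte first
```

Note it's whole bytes that swap places, not the individual bits inside them.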

 

Unfortunately ACad doesn't read those 16-bit Unicode files at all. But there's a 2nd option called UTF8. This is a variable-length encoding. The 1st 128 characters (codes 0 to 127) are the same as in ASCII and take a single byte each. If a byte's value is greater than 127, it means that byte is part of a multi-byte sequence (2 to 4 bytes) which encodes one character. Usually these files can at least be loaded into acad, but with strange characters I've still seen them give trouble.
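A small Python sketch of the variable length, just as an illustration (the helper name `utf8_lengths` and the three sample characters are my own picks). It shows a plain ASCII letter taking one byte, an accented letter two, and a box-drawing character three.

```python
def utf8_lengths(text):
    """Map each character to the number of bytes UTF-8 uses for it."""
    return {ch: len(ch.encode("utf-8")) for ch in text}

# 'A', e-acute (U+00E9) and the double vertical line (U+2551)
sample = "A\u00e9\u2551"
for ch, n in utf8_lengths(sample).items():
    print(repr(ch), n, list(ch.encode("utf-8")))
```

Every byte of a multi-byte sequence is above 127, so plain ASCII text (like the lisp code itself) looks identical in ANSI and UTF8.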

 

If these special characters are simply in your header, I'd advise saving as UTF8 and keeping those characters in a comment. That way acad should be able to load the file properly without issues.


As an example, I made a UTF8 file containing only this line:

(princ "Unicode worked ℅ℓ№℗™Ω℮←↑→↓↔↕↨")

Then I also saved copies of it as Unicode and as ANSI.

 

If I load the ANSI I get:

Unicode worked ?l??™?e???????"Unicode worked ?l??™?e???????"

With the Unicode version (both little and big endians):

*Cancel*
bad character read (octal): 0

And with UTF8:

Unicode worked ℅ℓ№℗™Ω℮←↑→↓↔↕↨"Unicode worked ℅ℓ№℗™Ω℮←↑→↓↔↕↨"
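My guess at why the three files behave so differently, as a rough Python sketch (the function name `encode_three_ways` is made up, and I'm assuming "ANSI" means Windows-1252 here). In a 16-bit file every plain ASCII character carries a zero byte, which would fit the "bad character read (octal): 0" message, while UTF8 has no zero bytes and ANSI simply can't store most of the symbols.

```python
def encode_three_ways(line):
    """Encode one source line as UTF-16 LE, UTF-8 and Windows-1252 ("ANSI")."""
    return {
        "utf16": line.encode("utf-16-le"),
        "utf8": line.encode("utf-8"),
        # errors="replace" substitutes '?' for anything cp1252 cannot store
        "ansi": line.encode("cp1252", errors="replace"),
    }

# Shortened version of the test line: the symbols are U+2105, U+2113, U+2116
line = '(princ "Unicode worked \u2105\u2113\u2116")'
encoded = encode_three_ways(line)

print(0 in encoded["utf16"])   # True: every ASCII character carries a zero byte
print(0 in encoded["utf8"])    # False: no zero bytes in the UTF-8 version
print(encoded["ansi"].decode("cp1252"))  # (princ "Unicode worked ???")
```

Notepad does a "best fit" when saving as ANSI (which is where the l and e in my output came from), so its result won't match Python's plain `?` replacement exactly.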


Hi irneb, thanks so much for taking the time to reply and explain the encoding.

 

I'll try creating the header using a variety of UNICODE characters, and save it to UTF8 and see which ones I can sneak through.

 

Thanks again, I'll post back with my results.


Hi again irneb,

 

Your suggestion worked great. By using the UTF8 encoding I was able to include all the UNICODE characters I needed to create a header.

 

;┌─────────────────────────┐;
;│                         │;
;├─────────────────────────┤;
;│                         │;
;├────────────┬────────────┤;
;│            │            │;
;│            │            │;
;├────────────┼────────────┤;
;│            │            │;
;├────────────┴────────────┤;
;└─────────────────────────┘;

 

Thanks again for your help.

 

Marcus


Great! Yes, those borders are an issue. They're among the characters where ANSI differs from the older DOS code pages. Back in the DOS days (http://www.theasciicode.com.ar/extended-ascii-code/box-drawing-character-ascii-code-194.html) they needed those borders because every interface was text, and these characters were the only way to lay out the user interface properly. So they added extra characters after number 127 which contained these. Unfortunately not all computers used the same set, e.g. IBM compatible ones used something called code page 437, which included these border characters. E.g. an old DOS app called Norton Commander:

Norton_Commander_5.51.png

https://en.wikipedia.org/wiki/Box-drawing_character

 

These days some people try to approximate these using -, + and |. To make life easier if you use those, try this online app: http://www.asciiflow.com/#Draw . The benefit is that it doesn't depend on any special characters, only on ones in the normal 7-bit ASCII range, which is common to ASCII, ANSI, most other 8-bit code pages and UTF8.

 

BTW, why are you using Notepad for your lisps? If at all possible, I'd advise using AutoCAD's built-in VLIDE. Or if you can't (e.g. not using AutoCAD on Windows) then get a decent code editor which can handle lisp, e.g. I like Notepad++, though there are lots of free ones to choose from (http://forums.augi.com/showthread.php?120750-Best-editor-for-LISP).


Hi irneb,

 

Thanks again for the help.

 

http://www.asciiflow.com/#Draw looks like a handy tool. Would you suggest I just stick to characters available in the ANSI encoding (instead of the border characters I have chosen above – even though they seem to work, at the moment)?

 

Haha, I only use Notepad because it was the program which opened the first routine I edited. Not out of sentiment… rather, that was the default program and I never thought to change it. I do use VLIDE sometimes (mainly when debugging)… but I just find Notepad to be a little more accessible when opening files from Windows Explorer. Sounds like I should check out Notepad++.

 

Thanks again.


You're very welcome!

Would you suggest I just stick to characters available in the ANSI encoding (instead of the border characters I have chosen above – even though they seem to work, at the moment)?
That's up to you. I'd definitely think about it if your code is supposed to be used by other people. While most editors can handle UTF8, the issue is that some may not have a font installed which displays those characters correctly.

 

With source code (of any language) it's always a good idea to stay with the lowest common denominator as far as text encoding goes. Just so there are no "yes-but" scenarios.
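If you want to check that a routine really sticks to that lowest common denominator, something like this Python sketch would do (the function name `find_non_ascii` and the sample header are made up for illustration). It lists every character above code 127 with its line and column.

```python
def find_non_ascii(text):
    """Return (line_number, column, character) for every non-ASCII character."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((line_no, col, ch))
    return hits

# Hypothetical header: line 2 uses the light vertical box character U+2502
sample = ";+-----+;\n;\u2502 hdr \u2502;\n(princ)\n"
for line_no, col, ch in find_non_ascii(sample):
    print(f"line {line_no}, col {col}: U+{ord(ch):04X}")
```

An empty result means the file will read the same whether it's opened as ASCII, ANSI or UTF8.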

 

Sounds like I should check out Notepad++
I'd definitely advise something like this. Especially with lisp it helps to have an editor which "counts" parentheses for you. In Notepad, that's my biggest pain. Formatting the code into something "readable" is also a herculean task in Notepad, while VLIDE turns it into a one-click operation, and Notepad++ formats as you type (or with some of its addons there are also auto-formatters available).

Awesome, thanks again for all your help irneb.

 

I will certainly check out Notepad++, sounds like it could almost eliminate the need for me to check my code in VLIDE (as most of my debugging results turn out to be a parentheses count error).

 

I also think I will heed your advice regarding the UNICODE characters... sticking with the stock ANSI ones seems like it could avoid a bit of potential confusion in the future.

 

Thanks again irneb, you've been a great help.


I also think I will heed your advice regarding the UNICODE characters... sticking with the stock ANSI ones seems like it could avoid a bit of potential confusion in the future.
Yes, this is probably a very good idea. I recall way back when AutoLISP was still lonesome, I had an editor that allowed me to create nicely blocked headers. I took the bait. :facepalm: Those headers look like garbage now. :( I wish I'd had someone like irneb to warn me. Good job irneb.:thumbsup:

Those headers look like garbage now.
You could always write a script (even inside AutoLisp ;)) to read those files and save them back - converting those characters into something less unreadable.

 

E.g. the double vertical line (║) is code 186 in the old DOS code page 437. If it's saved as Unicode (UTF-16 little endian), then it's the two bytes 81 and 37. Under UTF8 it's the three bytes 226 149 145. In normal ANSI (the default Windows-1252 code page, which is nearly the same as Latin-1), the closest match would be | (code 124).
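Those numbers are easy to double-check with a few lines of Python (used here purely as a calculator; `cp437` is Python's name for the old DOS code page):

```python
ch = "\u2551"  # the double vertical line

print(list(ch.encode("cp437")))      # [186]
print(list(ch.encode("utf-16-le")))  # [81, 37]
print(list(ch.encode("utf-8")))      # [226, 149, 145]
print(ord("|"))                      # 124
```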

 

Edit: Or even easier ... just open it using Notepad++. Set its code page to Encoding / Character Sets / Western European / OEM-US. Then choose Encoding / Convert to UTF-8. Then Encoding / Convert to ANSI - it will convert those characters to similar looking stuff in the default ANSI character set.


Edit: Or even easier ... just open it using Notepad++. Set its code page to Encoding / Character Sets / Western European / OEM-US. Then choose Encoding / Convert to UTF-8. Then Encoding / Convert to ANSI - it will convert those characters to similar looking stuff in the default ANSI character set.
E.g. copy-pasting the code block in post #5 into Notepad++ (while it's set to UTF8) and then converting to ANSI gives me this:
;+-------------------------+;
;¦                         ¦;
;+-------------------------¦;
;¦                         ¦;
;+-------------------------¦;
;¦            ¦            ¦;
;¦            ¦            ¦;
;+------------+------------¦;
;¦            ¦            ¦;
;+-------------------------¦;
;+-------------------------+;
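For anyone who'd rather script it than click through Notepad++, here's a rough Python sketch of the same idea. It isn't what Notepad++ does internally: the mapping is my own, the function names are made up, and it only covers the single-line box characters used in the header from post #5. It reads the bytes as code page 437 and swaps each border character for a plain ASCII stand-in.

```python
# Corner and junction characters used in the header (single-line set only)
_CORNERS = "\u250c\u2510\u2514\u2518\u251c\u2524\u252c\u2534\u253c"

def box_to_ascii(text):
    """Replace single-line box-drawing characters with +, - and |."""
    table = {ord(c): "+" for c in _CORNERS}
    table[0x2500] = "-"   # horizontal line
    table[0x2502] = "|"   # vertical line
    return text.translate(table)

def convert_cp437_header(raw_bytes):
    """Decode old DOS (code page 437) bytes and return plain ASCII bytes.

    Raises UnicodeEncodeError if characters outside the mapping remain.
    """
    return box_to_ascii(raw_bytes.decode("cp437")).encode("ascii")

# Simulate one line of an old DOS-era header
old_line = ";\u251c\u2500\u2500\u253c\u2500\u2500\u2524;".encode("cp437")
print(convert_cp437_header(old_line).decode("ascii"))  # ;+--+--+;
```

Unlike the Notepad++ result above, this uses | rather than ¦ for the verticals, so the output stays inside 7-bit ASCII.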
