
Converting text from polylines to Text or MText


Nikon


Good afternoon

After converting pdf to dwg, some texts (letters) are created as polylines.
Is it possible to programmatically replace polylines with text?
Thanks

PL - текст.dwg

Edited by Nikon

Through stand-alone AutoCAD, with no paid-for add-ons? No.

 

Simple answer.

 

I would like to be wrong, of course, and have someone say "You're wrong, here is a nice LISP", but with so many variables to look at (fonts, text sizes, italics, bold, number of line segments, alphabets, and so on) I don't think it is out there. You'd need Optical Character Recognition software.

 

I think there are line-to-text converters out there that work on PDFs.

 

 

The originator might be able to supply the original CAD file?


Only .shx texts turn into polylines.
Express Tools has a function for splitting text into lines, is there really no inverse function?


You could use OCR (Optical Character Recognition) software, either when the PDF is created (Adobe Acrobat includes this capability) or in AutoCAD Raster Design, which has an OCR option.


1 hour ago, Nikon said:

Express Tools has a function for splitting text into lines, is there really no inverse function?

 

From a PDF, you'd need to identify which lines are text. By the time the text has been converted to lines, plotted to PDF, imported into CAD, and the PDF 'exploded', they are just lines.

 

Optical character recognition is powerful enough to do this, but if you want a free piece of code, you'll struggle to find a LISP.


Newer AutoCAD versions have this function, maybe from around the 2018 version.

 

Are you on AutoCAD 2015 as shown in your information?

 

If you have a newer version, there is PDFSHXTEXT.

 

The OCR with Raster Design only does text in an image AFAIK.

 

P.S. For future reference, PDFSHXTEXT started with AutoCAD 2017.1 from what I found.

Edited by SLW210
Added information

But that isn't brilliant. A quick 13-character test found 13 letters, but 2 of those were the dots above the letter i, and 2 characters were missed. At an angle of 32 degrees it didn't convert any. The help suggests converting the characters one by one, then checking and recombining them. Not brilliant yet.

 

A LISP I part-built a while ago asked the user to select the lines that made up a word or sentence, then gave a pop-up box for the user to retype the word. The original lines were deleted and the typed text inserted, angled and sized to match the longest lines, on the assumption that these were the uprights. A couple of tweaks snapped the text to a standard size (so no 2.4876 height; it would be 2.5) and took the angle as the mean angle of the uprights. It didn't work well, so I left it on the 'come back later' pile.
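
For anyone who wants to pick that idea up, here is a minimal sketch of just the sizing and angle step, written in Python rather than LISP so the geometry can be checked outside AutoCAD. It is not the original routine. The function names, the 0.5 height step and the 5% "counts as an upright" tolerance are all assumptions made for illustration, and it assumes upright (non-italic) lettering.

```python
import math


def segment_length(seg):
    """Length of a segment given as ((x1, y1), (x2, y2))."""
    (x1, y1), (x2, y2) = seg
    return math.hypot(x2 - x1, y2 - y1)


def segment_direction(seg):
    """Undirected direction of a segment in degrees, in [0, 180)."""
    (x1, y1), (x2, y2) = seg
    return math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180.0


def estimate_text_params(segments, height_step=0.5, tolerance=0.05):
    """Guess (text height, text rotation in degrees) from loose segments.

    Assumption: the longest segments are the letter uprights, so their
    length approximates the text height and they sit at 90 degrees to
    the text baseline.
    """
    if not segments:
        raise ValueError("no segments selected")
    lengths = [segment_length(s) for s in segments]
    longest = max(lengths)
    if longest == 0:
        raise ValueError("all segments have zero length")

    # Treat anything within `tolerance` of the longest segment as an upright.
    uprights = [(s, n) for s, n in zip(segments, lengths)
                if n >= longest * (1.0 - tolerance)]

    # Snap the mean upright length to the nearest standard height step.
    mean_len = sum(n for _, n in uprights) / len(uprights)
    height = math.floor(mean_len / height_step + 0.5) * height_step

    # Average undirected directions: doubling each angle makes 0 and 180
    # degrees coincide, so a line drawn "backwards" does not skew the mean.
    sx = sum(math.cos(math.radians(2 * segment_direction(s))) for s, _ in uprights)
    sy = sum(math.sin(math.radians(2 * segment_direction(s))) for s, _ in uprights)
    upright_dir = (math.degrees(math.atan2(sy, sx)) / 2.0) % 180.0

    # The baseline is perpendicular to the uprights.
    rotation = upright_dir - 90.0
    return height, rotation
```

Called on two vertical strokes of length 2.4876 and 2.48 plus a short crossbar, this gives a height of 2.5 and a rotation of about 0. It inherits the same weakness described above: if the longest lines are not actually uprights, both numbers are wrong.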

 

 


I never once said anything about it being brilliant.

 

I do not remember the steps, but you can flatten the PDF and use Acrobat's OCR before bringing it into AutoCAD. I used to use Acrobat and Illustrator to create vectors from PDFs, etc., and before that Ghostscript, pdftotext and ImageMagick on PDFs. It's a learning curve, but up until a few years ago I did all of this on a Linux distro and used the terminal a lot.

 

pdftotext(1) (xpdfreader.com)     ImageMagick – Download

 

With a few tweaks, I usually get pretty good results. There are some settings, and you can add more fonts to match against. It is a pain in the *** that it only does horizontal text. Though, if you could get a LISP to rotate the view so the text is horizontal, it would speed things up.
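
The "rotate so the text is horizontal" step is plain geometry, so here is a rough sketch of it in Python rather than LISP. It assumes you already know (or have estimated) the angle of the text; `rotate_point` and `level_segments` are made-up names for illustration, and nothing here touches AutoCAD itself.

```python
import math


def rotate_point(point, pivot, angle_deg):
    """Rotate `point` counter-clockwise about `pivot` by `angle_deg` degrees."""
    a = math.radians(angle_deg)
    dx, dy = point[0] - pivot[0], point[1] - pivot[1]
    return (pivot[0] + dx * math.cos(a) - dy * math.sin(a),
            pivot[1] + dx * math.sin(a) + dy * math.cos(a))


def level_segments(segments, pivot, text_angle_deg):
    """Rotate segments by minus the text angle so the baseline ends up horizontal.

    Each segment is ((x1, y1), (x2, y2)); `text_angle_deg` is the angle of
    the text baseline measured counter-clockwise from the X axis.
    """
    return [tuple(rotate_point(p, pivot, -text_angle_deg) for p in seg)
            for seg in segments]
```

In a real routine the same rotation would be applied to the view or to a copy of the geometry, the conversion run, and the resulting text rotated back by the same angle about the same pivot.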

 

I never tried the OCR in Raster Design, but you could try making the lines in AutoCAD into an image and try different OCR programs, most do reasonably well on black letters on white background.

 

Fortunately for me, I usually have to make them actual text only occasionally these days. Lots more people using TTF fonts as well helps.

 

So here are the settings for setting up PDFSHXTEXT.

 

SHXTXT.png


Thanks, SLW210!

It's a bit of a long process when there are a lot of drawings and texts ...

And is there really no way to use LISP to match a set of text lines against a .shx font after converting PDF to DWG?


There is PDFSHXTEXT, which should run in a script. I suppose it could run in a batch process. The only problem I see is that the more geometry is selected, the less accurate the results, and if the text is not horizontal, the drawing needs to be rotated. Running from a script with no user input would mean selecting everything and hoping for the best.

 

You should be able to use a LISP to run the command and maybe automatically select smaller areas in the drawing.
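
As a rough idea of what "automatically select smaller areas" could look like, here is a sketch in Python (not LISP, and not tied to any AutoCAD API) that groups the bounding boxes of the imported polylines into clusters that sit close together, and returns one selection window per cluster. The box format `(xmin, ymin, xmax, ymax)`, the `gap` distance and the function names are assumptions for illustration; how the boxes are read from the drawing and how each window is passed to PDFSHXTEXT is left out.

```python
def boxes_near(a, b, gap):
    """True if boxes a and b overlap or lie within `gap` of each other."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    return (ax1 - gap <= bx2 and bx1 - gap <= ax2 and
            ay1 - gap <= by2 and by1 - gap <= ay2)


def selection_windows(boxes, gap):
    """Group nearby boxes and return one enclosing window per group.

    Uses a simple union-find; the pairwise comparison is O(n^2), which is
    fine for a sketch but slow for drawings with many thousands of polylines.
    """
    parent = list(range(len(boxes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if boxes_near(boxes[i], boxes[j], gap):
                parent[find(j)] = find(i)

    groups = {}
    for i, box in enumerate(boxes):
        groups.setdefault(find(i), []).append(box)

    windows = []
    for members in groups.values():
        windows.append((min(b[0] for b in members), min(b[1] for b in members),
                        max(b[2] for b in members), max(b[3] for b in members)))
    return sorted(windows)
```

With `gap` set to roughly one character width, letters of the same word or line should fall into one window while separate callouts stay apart, so each conversion run sees less geometry at a time.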

 

There have been some efforts to create a LISP. Need help with finding PDFSHXTEXT variable - Autodesk Community - AutoCAD

 

Way past my LISP level, plus I have a lot of work going on right now. Might be a good opportunity for you to give LISP a shot.


If all of the PDFs are simple like your example, it may work okay using AutoCAD.

 

But, I would concentrate on fixing the text in the PDF before importing to AutoCAD. Maybe check some Adobe Acrobat fora and/or research the pdftoedit, ImageMagick, Ghostscript, etc.

 

Overall, if you have a lot of them to do, you might be happier with the results.

 

On that note, I have seen PDFIMPORT scripts, LISPs, etc. So fix the text in the PDFs, then batch-create the .dwg files from them.


No, the drawings are not simple. The example just shows a part of the text from the polylines.
The drawing has a large number of callouts and specifications with .shx fonts.
Your advice is clear: fix PDF.
Maybe because of such difficulties with fonts, it is worth abandoning .shx? For all users?
(I understand that this is impossible...)


38 minutes ago, Nikon said:

Maybe because of such difficulties with fonts, it is worth abandoning .shx? For all users?
 

 

I'd agree with that but there might be times when the originator doesn't want a conversion from PDF to be easy.

 

If you have a lot to do, can you go back to the originator and ask for a DWG?


There is no way to request a DWG, and yes, often the creator does not want his drawings to be used by others...


When I say simple, I mean the font used. If everything is horizontal and pretty much all simplex, that would be an easy conversion to run as a script on a batch of drawings.

 

One other thing: you might try this, VectPDF download | SourceForge.net. I used it before AutoCAD had the PDF import function. I'm not sure whether it has had any updates in a while.

 

As for SHX fonts, you can make them comments when plotted to PDF. Acrobat can plot them as searchable with PDFMaker. (I am not sure how that comes back into AutoCAD, though.)

 

TrueType fonts can also be made non-searchable, so they are not foolproof either. (I am not sure how that comes back into AutoCAD, either.)

 

How to create selectable and searchable text in a PDF from AutoCAD (autodesk.com)

 

Quote

For TrueType fonts, do not alter the text from the original font, such as changing width (must be 1.0) or other style options.

Make sure that the Z coordinate value of the text object is zero.

If SHX fonts are used, set the PDFSHX variable to 1 (for AutoCAD 2017 and later; EPDFSHX for AutoCAD 2016). There is no AutoCAD option or feature to make SHX searchable in a PDF in AutoCAD 2015 and earlier.

 

 

Unfortunately, OCR has been used a lot more for converting image text to editable text, and I would surmise there has been much more development in that area. I had some very good OCR software that came with a scanner way back in the 80s. I have found very little on batch converting, whether in AutoCAD, Adobe or elsewhere. Acrobat may be able to batch convert; I am not sure about that.

 

I would just suggest picking a method and getting to work on them manually.


All the text is horizontal and in the simplex.shx font.
SLW210, thanks for the detailed explanations!


I have done more conversions than I want to think about. The sad thing is most of the PDFs I converted should have been supplied as .dwg, but the powers that be either didn't ensure compliance and/or it was never stipulated. Fortunately, we can usually make a call or send an email these days and get a .dwg.  (If the company is still in business.)

 

Try redoing a batch of Raster PDFs and Raster Images. 

 

No matter how much is in the drawing, I would wager you will be fine using a Script and Batch file on them in that case.

