Post by russell on Jun 26, 2009 9:04:45 GMT
First, let me congratulate David on the website, which is getting better all the time.
In Smoke Rings, ME 4354, you state that you intend to use OCR software to scan in old articles. I know this software has improved considerably since I was working on it in the seventies, but it still requires a lot of manual intervention. What is the point? Are you intending to edit the articles to bring them up to date? Or is there a copyright issue?
It would surely be far easier just to scan them as PDF files and put these on the website, retaining the original layout of the articles? This would require much less work.
Regards, Russell.
Post by davidmew on Jun 27, 2009 8:09:33 GMT
Hi there. Modern OCR software is superb. It only misses out on fractions.
I have asked to put PDFs on but have been told to do it as articles. Search engines can't index PDF files. Regards, David
Post by ripslider on Jun 28, 2009 7:57:15 GMT
David,
For info: *most* search engines can handle .PDF files just fine, and have done for the last couple of years. Google, Yahoo and Bing are all fine with them, and that gives you 98.5 percent of total searches.
A couple of others struggle, but they are fractional players.
Steve
Post by peterseager on Jun 29, 2009 19:00:32 GMT
PDF files can be graphics (i.e. scanned documents) or character based, created from Word etc. using a program such as Adobe. The latter files are definitely searchable, but graphics files are not: the search engine would need to do OCR to read them. Not impossible in this day and age, but slow. A mixture of character-based titles and scanned body text might be worth looking at.
Peter
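
For anyone wanting to check which kind of PDF they have, here is a rough sketch in Python of the distinction described above. It is only a heuristic under stated assumptions, not how any search engine actually does it: text drawn as characters needs a font, so a character-based PDF normally mentions `/Font` somewhere, while a pure scan usually contains only image objects. The function names `looks_character_based` and `classify` are made up for illustration.

```python
def looks_character_based(pdf_bytes: bytes) -> bool:
    """Crude guess: True if the PDF appears to declare at least one font."""
    # Text drawn as characters needs a font resource, so "/Font" normally
    # shows up in a character-based PDF.
    # Caveat (assumption): newer PDFs may compress these dictionaries inside
    # object streams, in which case this plain byte search will miss them.
    return b"/Font" in pdf_bytes


def classify(pdf_bytes: bytes) -> str:
    """Label raw PDF bytes as character based, scanned image, or unknown."""
    if not pdf_bytes.startswith(b"%PDF-"):
        return "not a PDF"
    if looks_character_based(pdf_bytes):
        return "character based"
    # No fonts but at least one image object: probably a scanned page.
    if b"/Image" in pdf_bytes:
        return "scanned image"
    return "unknown"
```

To try it on a real file, read the file in binary mode and pass the bytes to `classify`. A scan that has already been through OCR and carries a hidden text layer will report as character based, which is the searchable case anyway.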