Any suggestions on how best to find data in a PDF? (I can't find it simply by opening the file in Notepad.) I need to process a folder full of PDFs.
TIA
Chris
Chris
This is not easy in general and probably not possible without going outside of VFP. You're probably looking at leveraging Ghostscript somehow to parse the PDF files and dump the text out.
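For anyone wanting to try the Ghostscript route, here is a minimal sketch of one way to do it, not necessarily what Alan had in mind. It uses Ghostscript's txtwrite output device and is written in Python only for illustration; the same command line could be launched from VFP. The executable name gswin64c and the helper names gs_text_command and extract_folder are assumptions, so adjust them to your install.

```python
import subprocess
from pathlib import Path

# Assumption: "gswin64c" is the console executable on 64-bit Windows;
# on Linux/macOS it is usually "gs". Change this to match your install.
GS_EXE = "gswin64c"


def gs_text_command(pdf_path, txt_path, gs_exe=GS_EXE):
    """Build the Ghostscript command that dumps a PDF's text to txt_path."""
    return [
        gs_exe,
        "-q",                 # suppress the startup banner
        "-dNOPAUSE",          # do not pause between pages
        "-dBATCH",            # exit once the file is processed
        "-sDEVICE=txtwrite",  # Ghostscript's text extraction device
        f"-sOutputFile={txt_path}",
        str(pdf_path),
    ]


def extract_folder(folder, gs_exe=GS_EXE):
    """Run Ghostscript on every *.pdf in folder, writing a .txt beside each."""
    written = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        txt = pdf.with_suffix(".txt")
        subprocess.run(gs_text_command(pdf, txt, gs_exe), check=True)
        written.append(txt)
    return written
```

The resulting .txt files can then be read and searched with whatever tool is convenient. Scanned PDFs that contain only images will produce little or no text this way.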
Forgot Ghostscript could do that, thank you Alan ... works a treat 😊
-----Original Message----- From: ProfoxTech profoxtech-bounces@leafe.com On Behalf Of Alan Bourke Sent: Friday, January 12, 2024 11:27 AM To: profoxtech@leafe.com Subject: Re: PDF Scraping
-- Alan Bourke alanpbourke (at) fastmail (dot) fm
_______________________________________________ Post Messages to: ProFox@leafe.com Subscription Maintenance: https://mail.leafe.com/mailman/listinfo/profox OT-free version of this list: https://mail.leafe.com/mailman/listinfo/profoxtech Searchable Archive: https://leafe.com/archives This message: https://leafe.com/archives/byMID/c073fe82-ac75-47ad-8a8b-e0e69350adbf@app.fa... ** All postings, unless explicitly stated otherwise, are the opinions of the author, and do not constitute legal or medical advice. This statement is added to the messages for those lawyers who are too stupid to see the obvious.
It is really easy to do with Python.
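To make that concrete, here is a rough sketch of what the Python route might look like. Brian did not say which library he uses; this assumes the third-party pypdf package (pip install pypdf). The function names and the invoice-number pattern are made up for illustration.

```python
import re


def extract_pdf_text(pdf_path):
    """Return the text of every page in the PDF, joined with newlines.

    Assumption: the third-party pypdf package is installed.
    """
    # Imported here so the search helper below works without pypdf.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def find_values(text, pattern):
    """Return every match of a regular expression in the extracted text."""
    return re.findall(pattern, text)


# Hypothetical example: pull invoice numbers out of already-extracted text.
sample = "Invoice INV-00123 dated 12/01/2024\nSee also INV-00456"
print(find_values(sample, r"INV-\d+"))  # prints ['INV-00123', 'INV-00456']
```

For a real file you would call find_values(extract_pdf_text("some.pdf"), pattern) and loop over the folder.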
On Jan 12, 2024, at 5:47 AM, Chris Davis chrisd@actongate.co.uk wrote:
Forgot Ghostscript could do that, thank you Alan ... works a treat 😊
On Jan 12, 2024, at 21:51, Brian Erickson brian@dashley.net wrote:
It is really easy to do with Python.
Heh, I think those exact words with most posts on this list! ;-P
-- Ed Leafe
As Stephen would say, Bad Ed! 😉
From: ProfoxTech profoxtech-bounces@leafe.com On Behalf Of Ed Leafe Sent: Monday, January 15, 2024 8:18 PM To: profoxtech@leafe.com Subject: Re: PDF Scraping
Keeping mouth shut.
On Tue, Jan 16, 2024 at 10:24 AM Richard Kaye rkaye@invaluable.com wrote:
As Stephen would say, Bad Ed! 😉
Another option is the Balabolka Text Extract Utility; I have used it with success in the past.
https://www.cross-plus-a.com/btext.htm
This is the command line version, so you can run it from VFP.
Example usage:
blb2txt -f "My file.pdf" -out "My file.txt"
The program has many options; for example, you can process many files at once.
Gianni
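As a rough illustration of driving the utility over a whole folder, here is a small Python sketch that calls blb2txt once per file. It uses only the -f and -out options from the example above and does not rely on the tool's own batch options. It assumes blb2txt is on the PATH, and the helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def blb2txt_command(pdf_path, exe="blb2txt"):
    """Build a blb2txt call using only the -f and -out options shown above."""
    pdf = Path(pdf_path)
    return [exe, "-f", str(pdf), "-out", str(pdf.with_suffix(".txt"))]


def convert_folder(folder, exe="blb2txt"):
    """Call blb2txt once per *.pdf in folder; return the commands that ran."""
    commands = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        cmd = blb2txt_command(pdf, exe)
        # Passing a list lets subprocess quote file names with spaces.
        subprocess.run(cmd, check=True)
        commands.append(cmd)
    return commands
```

The same per-file loop could be written in VFP by building the command string for each file and running it from there.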
On Fri, 12 Jan 2024 12:46:50 +0000, Chris Davis chrisd@actongate.co.uk wrote:
Forgot Ghostscript could do that, thank you Alan ... works a treat 😊
Looks interesting, I will check it out ... thanks Gianni
-----Original Message----- From: ProfoxTech profoxtech-bounces@leafe.com On Behalf Of Gianni Turri Sent: Saturday, January 13, 2024 12:07 PM To: profoxtech@leafe.com Subject: Re: PDF Scraping