I have a routine that processes each character in a file. The file I am working with is over 2 million characters long. I pull it into a memory variable with FILETOSTR() and then process each character with SUBSTR(). Apparently SUBSTR() has problems dealing with a long string, as the process is painfully slow.
I suspect I would be better off using the low-level file routines to read one character at a time, but thought maybe someone knows of a way to speed up the approach I am using now.
Thanks in advance,
Joe
I've just done some rough code to check out a 200,000-byte string (too painful waiting for a 2-million-byte string to process).
The sample below shows a huge benefit from the file approach.
The only other improvement I can think of right now would be to do it in C++ or similar, where you can easily treat the string as an array and pass everything around by reference; that should be faster, but the file approach seems fast enough.
SUBSTR    200,000 bytes    - 1.822 seconds
STRTOFILE 200,000 bytes    - 0.001 seconds
STRTOFILE 10,000,000 bytes - 0.005 seconds
*- SUBSTR loop
x = REPLICATE("Y", 200000)
lnSec = SECONDS()
FOR n = 1 TO LEN(x)
   y = SUBSTR(x, n, 1)
ENDFOR
? SECONDS() - m.lnSec   && 1.822 secs

*-------------------------
*- File approach 200,000 bytes
*-------------------------
lnSec = SECONDS()
=STRTOFILE(x, "c:\temp\test.txt")
lnFile = FOPEN("c:\temp\test.txt", 2)
lnChrs = FSEEK(m.lnFile, 0, 2)
? lnChrs
=FSEEK(m.lnFile, 0, 0)
FOR lnChr = 1 TO m.lnChrs
   y = FREAD(m.lnFile, 1)
ENDFOR
=FCLOSE(m.lnFile)
? SECONDS() - m.lnSec   && 0.001 secs
*-------------------------
*- File approach 10,000,000 bytes
*-------------------------
x = REPLICATE("Y", 10000000)
lnSec = SECONDS()
=STRTOFILE(x, "c:\temp\test.txt")
lnFile = FOPEN("c:\temp\test.txt", 2)
lnChrs = FSEEK(m.lnFile, 0, 2)
? lnChrs
=FSEEK(m.lnFile, 0, 0)
FOR lnChr = 1 TO m.lnChrs
   y = FREAD(m.lnFile, 1)
ENDFOR
=FCLOSE(m.lnFile)
? SECONDS() - m.lnSec   && 0.005 secs
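For readers outside VFP, here is a rough Python sketch of the file pattern benchmarked above, assuming nothing beyond the standard library. (Python's string indexing is O(1), so it does not reproduce the SUBSTR() slowdown; the point is only to show the write-then-read-one-byte-at-a-time pattern, where OS-level buffering keeps single-byte reads cheap.)

```python
import os
import tempfile

def per_char_via_file(data: bytes) -> int:
    """Mirror of the STRTOFILE + FOPEN/FREAD loop above: dump the
    string to a temp file, then read it back one byte at a time."""
    count = 0
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:   # STRTOFILE equivalent
            f.write(data)
        with open(path, "rb") as f:      # FOPEN equivalent
            while f.read(1):             # FREAD(handle, 1) equivalent
                count += 1               # stand-in for real per-char work
    finally:
        os.remove(path)
    return count

print(per_char_via_file(b"Y" * 200_000))   # → 200000
```

The buffered `open()` means each `read(1)` usually comes out of an in-memory buffer rather than a system call, which is the same reason the FREAD loop stays fast.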
-----Original Message-----
From: ProfoxTech [mailto:profoxtech-bounces@leafe.com] On Behalf Of Joe Yoder
Sent: Saturday, 10 September 2016 3:33 PM
To: profoxtech@leafe.com
Subject: Processing a long character string one character at a time
Hi, Joe:
Sorry, but my weekend is pretty full, so I can't answer as thoroughly as I'd like to. String functions in VFP are blazingly fast, but have to be used correctly in ways that aren't always obvious.
CSV with carriage returns inside fields is not CSV, in my not-so-humble opinion. ALL DBMSes have trouble with long text fields.
IAC, check out http://fox.wikis.com/wc.dll?Wiki~StringHandling for some clues. MLINE() and _MLINE against your string variable (I know the page says it's about memo fields) is what you want, *I think*, but I don't have time to check.
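The MLINE() idea generalizes beyond VFP: if line-at-a-time processing fits the data, each character is touched once per pass instead of the string being rescanned per character. A minimal Python sketch (the per-line work here is a hypothetical stand-in):

```python
def process_lines(text: str) -> list[str]:
    # Walk the buffer line by line rather than character by character.
    out = []
    for line in text.splitlines():
        out.append(line.strip().upper())   # hypothetical per-line work
    return out

print(process_lines("abc\r\n def \r\nghi"))   # → ['ABC', 'DEF', 'GHI']
```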
On Sat, Sep 10, 2016 at 1:32 AM, Joe Yoder joe@wheypower.com wrote:
Is there a split function in VFP to take every word of the string into an array? Then you could parse each array element till done. C# example here.
string s = FromYourTextfile;
// Split string on spaces.
// This will separate all the words.
string[] words = s.Split(' ');
foreach (string word in words)
{
    // parse your word here for what you need.
}
On Sat, Sep 10, 2016 at 12:32 AM, Joe Yoder joe@wheypower.com wrote:
To split a string into words you can use something like this:
n = GETWORDCOUNT(mystring)
FOR i = 1 TO n
   ? GETWORDNUM(mystring, i)   && do something with the word
ENDFOR
Laurie
On 10 September 2016 at 16:02, Stephen Russell srussell705@gmail.com wrote:
The problem with that kind of approach is that the algorithm counts words one at a time to get to the one you want: 1, 2, 3, ... That's fine for 100 words, but unworkable for a million.
On Sat, Sep 10, 2016 at 9:06 PM, Laurie Alvey trukker41@gmail.com wrote:
I had a hunch that the SUBSTR approach becomes less and less efficient as the string length increases, so I put the following code together. Note: I commented out the one-million test, as it takes about 5 minutes on my machine. The results are impressive! - Joe
? 'One million character-by-character reads from strings of different lengths'
DO Readc WITH 10
DO Readc WITH 100
DO Readc WITH 1000
DO Readc WITH 10000
DO Readc WITH 100000
*DO Readc WITH 1000000

? 'One million character-by-character reads from a file'
DO Freadc WITH 1000000
RETURN

FUNCTION Readc
PARAMETERS m.StrLen
m.String = REPLICATE('t', m.StrLen)
m.cnt = 0
m.Start = SECONDS()
DO WHILE m.cnt < 1000000
   FOR m.x = 1 TO m.StrLen
      m.ch = SUBSTR(m.String, m.x, 1)
      m.cnt = m.cnt + 1
   ENDFOR
ENDDO
? m.StrLen, SECONDS() - m.Start
RETURN

FUNCTION Freadc
PARAMETERS m.StrLen
m.Start = SECONDS()
STRTOFILE(REPLICATE('t', m.StrLen), 'Ftest.tmp')
m.InFile = FOPEN('Ftest.tmp')
FOR m.x = 1 TO m.StrLen
   m.ch = FREAD(m.InFile, 1)
ENDFOR
FCLOSE(m.InFile)
ERASE Ftest.tmp
? m.StrLen, SECONDS() - m.Start
RETURN
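One further speedup over one-byte reads, in any language, is to pull the file in large chunks and walk each chunk in memory, cutting the number of read calls by orders of magnitude. A hedged Python sketch of the pattern (the chunk size is an arbitrary choice, and the per-character work is a stand-in):

```python
import os
import tempfile

def count_chars_chunked(path: str, chunk_size: int = 65536) -> int:
    # Read in big chunks, then iterate each chunk in memory:
    # far fewer read calls than one read per character.
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            for _ in chunk:          # stand-in for per-character work
                total += 1
    return total

fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"t" * 1_000_000)
print(count_chars_chunked(path))     # → 1000000
os.remove(path)
```

The same idea applies to the FREAD loop: FREAD(handle, 65536) and a SUBSTR loop over the small chunk keeps each SUBSTR call cheap while avoiding a million file reads.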
On Sun, Sep 11, 2016 at 11:57 AM, Ted Roche tedroche@gmail.com wrote: