I have a routine that processes each character in a file. The file I am working with is over 2 million characters long. I pull it into a memory variable with FILETOSTR() and then process each character with SUBSTR(). Apparently SUBSTR() has problems dealing with a long string, as the process is painfully slow.
I suspect I would be better off using the low-level file routines to read one character at a time, but thought maybe someone knows of a way to speed up the approach I am using now.
Thanks in advance,
Joe
I've just done some rough code to check out a 200,000-byte string (too painful waiting for a 2-million-byte string to process).
The sample below shows a huge benefit from the file approach.
The only other improvement I can think of right now would be to do it in C++ or similar, where you can easily treat the string as an array and pass everything around by reference; that should be faster, but the file approach seems fast enough.
SUBSTR    200,000 bytes    - 1.822 seconds
STRTOFILE 200,000 bytes    - 0.001 seconds
STRTOFILE 10,000,000 bytes - 0.005 seconds
*- SUBSTR loop
x = REPLICATE("Y", 200000)
lnSec = SECONDS()
FOR n = 1 TO LEN(x)
   y = SUBSTR(x, n, 1)
ENDFOR
? SECONDS() - m.lnSec   && 1.822 secs

*-------------------------
*- File approach 200,000 bytes
*-------------------------
lnSec = SECONDS()
=STRTOFILE(x, "c:\temp\test.txt")
lnFile = FOPEN("c:\temp\test.txt", 2)
lnChrs = FSEEK(m.lnFile, 0, 2)
? lnChrs
=FSEEK(m.lnFile, 0, 0)
FOR lnChr = 1 TO m.lnChrs
   y = FREAD(m.lnFile, 1)
ENDFOR
=FCLOSE(m.lnFile)
? SECONDS() - m.lnSec   && 0.001 secs
*-------------------------
*- File approach 10,000,000 bytes
*-------------------------
x = REPLICATE("Y", 10000000)
lnSec = SECONDS()
=STRTOFILE(x, "c:\temp\test.txt")
lnFile = FOPEN("c:\temp\test.txt", 2)
lnChrs = FSEEK(m.lnFile, 0, 2)
? lnChrs
=FSEEK(m.lnFile, 0, 0)
FOR lnChr = 1 TO m.lnChrs
   y = FREAD(m.lnFile, 1)
ENDFOR
=FCLOSE(m.lnFile)
? SECONDS() - m.lnSec   && 0.005 secs
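For readers outside VFP, here is a rough Python sketch of the file pattern benchmarked above, assuming nothing beyond the standard library. (Python's string indexing is O(1), so it does not reproduce the SUBSTR() slowdown; the point is only to show the write-then-read-one-byte-at-a-time pattern, where OS-level buffering keeps single-byte reads cheap.)

```python
import os
import tempfile

def per_char_via_file(data: bytes) -> int:
    """Mirror of the STRTOFILE + FOPEN/FREAD loop above: dump the
    string to a temp file, then read it back one byte at a time."""
    count = 0
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:   # STRTOFILE equivalent
            f.write(data)
        with open(path, "rb") as f:      # FOPEN equivalent
            while f.read(1):             # FREAD(handle, 1) equivalent
                count += 1               # stand-in for real per-char work
    finally:
        os.remove(path)
    return count

print(per_char_via_file(b"Y" * 200_000))   # → 200000
```

The buffered `open()` means each `read(1)` usually comes out of an in-memory buffer rather than a system call, which is the same reason the FREAD loop stays fast.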
-----Original Message-----
From: ProfoxTech [mailto:profoxtech-bounces@leafe.com] On Behalf Of Joe Yoder
Sent: Saturday, 10 September 2016 3:33 PM
To: profoxtech@leafe.com
Subject: Processing a long character string one character at a time
Hi, Joe:
Sorry, but my weekend is pretty full, so I can't answer as thoroughly as I'd like to. String functions in VFP are blazingly fast, but have to be used correctly in ways that aren't always obvious.
CSV with carriage returns inside fields is not CSV, in my not-so-humble opinion. ALL DBMSes have trouble with long text fields.
IAC, check out http://fox.wikis.com/wc.dll?Wiki~StringHandling for some clues. MLINE() and _MLINE against your string variable (I know the page says it's about memo fields) is what you want, *I think*, but I don't have time to check.
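The MLINE() idea generalizes beyond VFP: if line-at-a-time processing fits the data, each character is touched once per pass instead of the string being rescanned per character. A minimal Python sketch (the per-line work here is a hypothetical stand-in):

```python
def process_lines(text: str) -> list[str]:
    # Walk the buffer line by line rather than character by character.
    out = []
    for line in text.splitlines():
        out.append(line.strip().upper())   # hypothetical per-line work
    return out

print(process_lines("abc\r\n def \r\nghi"))   # → ['ABC', 'DEF', 'GHI']
```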
On Sat, Sep 10, 2016 at 1:32 AM, Joe Yoder joe@wheypower.com wrote:
Is there a split function in VFP to take every word of the string into an array? Then you could parse each array element till done. C# example here.
string s = FromYourTextfile;
// Split string on spaces.
// This will separate all the words.
string[] words = s.Split(' ');
foreach (string word in words)
{
    // parse your word here for what you need.
}
On Sat, Sep 10, 2016 at 12:32 AM, Joe Yoder joe@wheypower.com wrote:
To split a string into words you can use something like this:
n = GETWORDCOUNT(mystring)
FOR i = 1 TO n
   ? GETWORDNUM(mystring, i)   && do something with the word
ENDFOR
Laurie
On 10 September 2016 at 16:02, Stephen Russell srussell705@gmail.com wrote:
The problem with that kind of approach is that the algorithm counts words one at a time to get to the one you want: 1, 2, 3, ... That's fine for 100 words, but unworkable for a million.
On Sat, Sep 10, 2016 at 9:06 PM, Laurie Alvey trukker41@gmail.com wrote:
I had a hunch that the SUBSTR approach becomes less and less efficient as the string length increases, so I put the following code together. Note: I commented out the one-million test, as it takes about 5 minutes on my machine. The results are impressive! - Joe
? 'One million character-by-character reads from strings of different lengths'
DO Readc WITH 10
DO Readc WITH 100
DO Readc WITH 1000
DO Readc WITH 10000
DO Readc WITH 100000
*DO Readc WITH 1000000

? 'One million character-by-character reads from a file'
DO Freadc WITH 1000000
RETURN

FUNCTION Readc
PARAMETERS m.StrLen
m.String = REPLICATE('t', m.StrLen)
m.cnt = 0
m.Start = SECONDS()
DO WHILE m.cnt < 1000000
   FOR m.x = 1 TO m.StrLen
      m.ch = SUBSTR(m.String, m.x, 1)
      m.cnt = m.cnt + 1
   ENDFOR
ENDDO
? m.StrLen, SECONDS() - m.Start
RETURN

FUNCTION Freadc
PARAMETERS m.StrLen
m.Start = SECONDS()
STRTOFILE(REPLICATE('t', m.StrLen), 'Ftest.tmp')
m.InFile = FOPEN('Ftest.tmp')
FOR m.x = 1 TO m.StrLen
   m.ch = FREAD(m.InFile, 1)
ENDFOR
FCLOSE(m.InFile)
ERASE Ftest.tmp
? m.StrLen, SECONDS() - m.Start
RETURN
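One further speedup over one-byte reads, in any language, is to pull the file in large chunks and walk each chunk in memory, cutting the number of read calls by orders of magnitude. A hedged Python sketch of the pattern (the chunk size is an arbitrary choice, and the per-character work is a stand-in):

```python
import os
import tempfile

def count_chars_chunked(path: str, chunk_size: int = 65536) -> int:
    # Read in big chunks, then iterate each chunk in memory:
    # far fewer read calls than one read per character.
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            for _ in chunk:          # stand-in for per-character work
                total += 1
    return total

fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"t" * 1_000_000)
print(count_chars_chunked(path))     # → 1000000
os.remove(path)
```

The same idea applies to the FREAD loop: FREAD(handle, 65536) and a SUBSTR loop over the small chunk keeps each SUBSTR call cheap while avoiding a million file reads.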
On Sun, Sep 11, 2016 at 11:57 AM, Ted Roche tedroche@gmail.com wrote: