I've got a regular process that runs, basically using key information to grab data from an API and then update the local VFP database. There are maybe 64,000 records to process, and each record takes about a second to update through this process, so processing the whole group would take over 17 hours. Each record can be processed on its own; there are no relationships between them.
I don't want to start it and run the 64,000 in a row for 17+ hours. I would like to design the app to use the table, RLOCK the row I'm processing, and then UNLOCK the row when I'm done. I figure with this design I could run multiple instances of MyProgram.exe (similar to how West Wind Web Connection allows you to run multiple instances) and process the batch maybe 4x faster (if I launched 4 instances of MyProgram.exe). The basic construct would be as follows:
USE ListOfRecsToProcess IN 0 SHARED ALIAS MyList
*-- fields: ID i (PK of record to process), tProcessed t, tError t, cSession c(10)
SCAN FOR EMPTY(cSession) AND RLOCK('MyList')
   IF ProcessRecord(MyList.ID) THEN
      REPLACE tProcessed WITH DATETIME(), cSession WITH this.cSession IN MyList
   ELSE
      REPLACE tError WITH DATETIME(), cSession WITH this.cSession IN MyList
   ENDIF
ENDSCAN
Does anybody see any problems with that general design? The ProcessRecord method calls an API to get values and then updates the local VFP record accordingly.
tia, --Mike
I'm not sure I trust RLock(). I'd probably use a "Status" field that indicates whether the record is unprocessed, in process, or processed. This might be a good use case for Christof's DMult multi-threading utility https://kevinragsdale.net/category/multithreaded-visual-foxpro/.
Eric
You still have to use either RLOCK() or a SET REFRESH value of -1 to read the status field, because otherwise VFP will lie to you about the current value of that field. Unless you move away from the record, you can get the same stale value for hours in VFP. Even if you move away, the cache is treated as the truth for up to the number of seconds that SET REFRESH specifies.
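For illustration, a minimal sketch of that claim-then-process pattern might look like the following (the cStatus values, the lcSession variable, and SYS(2015) as a session id are assumptions for the example, not anyone's actual code):

lcSession = SYS(2015)                         && unique-ish id for this worker instance (assumption)
USE ListOfRecsToProcess IN 0 SHARED ALIAS MyList
SCAN FOR EMPTY(cStatus)                       && cheap prefilter; may see stale cached values
   IF RLOCK('MyList')                         && the lock forces a fresh read of the record
      IF EMPTY(MyList.cStatus)                && re-check now that we see current data
         REPLACE cStatus WITH "P", cSession WITH lcSession IN MyList   && claim it ("in process")
         UNLOCK IN MyList                     && don't hold the lock during the slow API call
         IF ProcessRecord(MyList.ID)
            REPLACE cStatus WITH "D", tProcessed WITH DATETIME() IN MyList   && done
         ELSE
            REPLACE cStatus WITH "E", tError WITH DATETIME() IN MyList       && error
         ENDIF
      ELSE
         UNLOCK IN MyList                     && another worker claimed it first
      ENDIF
   ENDIF
ENDSCAN

The REPLACEs issued after the UNLOCK rely on VFP's automatic record locking on shared tables.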
Christof
Mike,
It sounds like you should look at ParallelFox...
Bill Anderson
Hi,
64,000 records in 17 hours? That is way too long. Running the code you published would normally take a few minutes at most. I suspect most of the time is spent not in your code but in the API call that gets the data into the cursor ListOfRecsToProcess; I'd advise using Code References to find the time-consuming bottleneck. Instead of SCAN...ENDSCAN you might also try the SQL UPDATE command, which should be faster.
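For the pure bookkeeping part (not the per-record API call), a set-based SQL UPDATE might look like this sketch, assuming an lcSession variable holding the worker's session id:

UPDATE MyList ;
   SET tProcessed = DATETIME(), cSession = lcSession ;
   WHERE EMPTY(cSession)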
Koen
Also, you haven't shown the code for your ProcessRecord() procedure.
Koen,
This is a question of theory. I'm not going to lay out the entire set of procedures that get run here. It's presumptuous of you to call it "unacceptable" without knowing what actually happens behind the scenes. I could be talking about a process that does 20 different things: checking this, that, and another thing, writing here, writing there. As others mentioned, the bulk of the time is in the API call (outside of my actual process) anyway.
What is the bottleneck in your process: updating the backend data?
Can your updates or inserts be made in groups of 100 or 1,000 per pass, instead of one at a time? Or do you need a key back for additional processing?
Mike,
a) is it a Web API that you may call through an HTTP library?
b) and the parameters' values in each call to the API come from each record in the table?
Right...we're using the API documentation from the vendor, doing it according to what they say. It's not super quick but then again I don't think most people use it for processing tens of thousands of records quickly like my client's app does. Most folks are using it to just get a single carrier's information (it's a trucking app) at a time.
Depending on specifics, HTTP calls can be issued asynchronously. That would mean, if applicable to your case, that the local process would not have to wait for the API call to return in each iteration. Single-record processing would be triggered upon completion of the respective API call, without stopping the table scanning, at least not completely. For instance, you could cap the number of concurrent calls at, say, 20, or some other number that proves effective.
An example of this setup can be found in the overHere SDK (https://github.com/atlopes/overHere). Calls to the Here location platform "Geocoding Autocomplete API" are performed asynchronously. In the AutoComplete-Async demo form, as the user keeps editing an address in the textbox control, underlying calls to the platform web services are launched and the retrieved suggestions are displayed as they are fetched, without preventing the user from continuing to edit.
The overHere SDK implements a callback mechanism that has to deal with object persistence and XML over HTTP (instead of a simple XMLHttp request), but a simpler mechanism could probably be put in place for your scenario.
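A rough sketch of that idea using plain MSXML2.XMLHTTP follows (this is not the overHere implementation; BuildApiUrl(), SaveResult(), and the cap of 20 concurrent calls are placeholders/assumptions):

LOCAL loPending, loReq
loPending = CREATEOBJECT("Collection")
USE ListOfRecsToProcess IN 0 SHARED ALIAS MyList
SCAN FOR EMPTY(cSession)
   DO WHILE loPending.Count >= 20            && throttle: at most 20 calls in flight
      = HarvestFinished(loPending)
      DOEVENTS                               && let the async requests make progress
   ENDDO
   loReq = CREATEOBJECT("MSXML2.XMLHTTP")
   loReq.Open("GET", BuildApiUrl(MyList.ID), .T.)   && .T. = asynchronous
   loReq.Send()
   loPending.Add(loReq, TRANSFORM(MyList.ID))
ENDSCAN
DO WHILE loPending.Count > 0                 && wait for the stragglers
   = HarvestFinished(loPending)
   DOEVENTS
ENDDO

PROCEDURE HarvestFinished(toPending)
   LOCAL lnI
   FOR lnI = toPending.Count TO 1 STEP -1
      IF toPending.Item(lnI).readyState = 4  && 4 = request complete
         = SaveResult(toPending.GetKey(lnI), toPending.Item(lnI).responseText)
         toPending.Remove(lnI)
      ENDIF
   ENDFOR
ENDPROC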
The bottom line is yes, your approach is reasonable. I have done pretty much the same thing myself.
To sum up: RLOCK() is fine, but I think I added an extra step like SKIP -1/SKIP to force the record to be re-read (and consider the SET REFRESH setting). I don't recall the original thread, but I presume you're setting some kind of flag field to denote that the record has already been processed. Only set that after a successful RLOCK(), of course.
I would suggest trying your setup on a few hundred records, and putting a bit of logging in there to describe what happens (even simple STRTOFILE() statements). I think the benefit of additional processes might max out based on the number of cores on the machine - e.g., a 4-core machine may not benefit from running 16 .exe instances. On the other hand, if the API call is the thing taking the vast majority of the time, the local resources are mostly twiddling their thumbs, so running more instances than the core count may be beneficial.
But note that the API vendor may have their own throttle - maybe based on API key - to prevent someone from overloading their resources. And of course there are also horrible systems out there (NoSQL datastores, bad Java, constrained NodeJS, just plain bad coding), and it's hard to predict what kind of performance you'll get from them. So your logging may also help you find the API vendor's 'max performance' rate. When you hit that limit, as you imply, you won't be able to 'fix' it from your side.
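For the logging part, even a one-line append per record can tell you a lot; a sketch (the log file name and the lcSession variable are assumptions):

*-- append one line per processed record to a per-worker log file
STRTOFILE(TRANSFORM(DATETIME()) + " " + lcSession + ;
          " processed ID " + TRANSFORM(MyList.ID) + CHR(13) + CHR(10), ;
          "worker_" + ALLTRIM(lcSession) + ".log", .T.)   && .T. = append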
-Charlie
This is a classic queue pattern: make a task list of the items to be processed (perhaps breaking each item into multiple tasks), and run multiple workers against the list, updating the status as they go. As Christof says, ultimately you have to LOCK, UPDATE, UNLOCK and think about REFRESH.
64,000 records should take seconds, except for the slow parts. Focus on those: is it a slow API? Is it setup and teardown that you could cache?
Think about separating the slow parts from the fast parts, so your queue might have a slow part 1 and a fast part 2 completed by different executables. Then you could launch a dozen EXEs to process the slow parts, improving the throughput, and a few EXEs to do the fast parts.
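Launching the workers could itself be a small sketch like this (ShellExecute from the Windows API; the worker count and the "SLOWPHASE" command-line parameter are assumptions):

*-- start four worker instances of MyProgram.exe; each worker picks its own cSession id
DECLARE INTEGER ShellExecute IN shell32.dll ;
   INTEGER hWnd, STRING lpOperation, STRING lpFile, ;
   STRING lpParameters, STRING lpDirectory, INTEGER nShowCmd
FOR lnWorker = 1 TO 4
   = ShellExecute(0, "open", "MyProgram.exe", "SLOWPHASE", "", 1)   && 1 = SW_SHOWNORMAL
ENDFOR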
Thanks for the ideas. Honestly, it's the API that's the slow part of the whole mix. That's the vendor's programmers; out of my control.
How you call the API is under your control, though.
I don't have the first idea how this would be done in VFP, but in general you get around a blocking task by using some async design. The most direct approach would be multiple threads, but some languages have a callback design. One of the reasons the Go language became wildly popular over the past few years is that async is a fundamental design element of the language: https://medium.com/@gauravsingharoy/asynchronous-programming-with-go-546b96c....
-- Ed Leafe
Mike,
There are SO many ways to do these processes. VFP can launch other VFP COM objects and become multi-threaded. There's the ParallelFox project that Bill mentioned earlier, though I have never touched it. C# has the ability to multi-thread and use a VFP COM object.
Your bottleneck is the API. Does the API also have a threshold that will block you for a time? As an example of what I mean, one of the APIs I have written code to connect with only allows 5 calls a second. Making a 6th call in the same second blocks the access token for 60 seconds. It's nice enough to tell me how long I need to wait when it is blocked; testing proved that. I keep that in mind when researching new APIs to use.
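If the vendor documents a hard limit like that, a simple client-side throttle helps. A rough sketch using Tracy's 5-calls-per-second example (CallApi() is a hypothetical wrapper, and the SECONDS() rollover at midnight is ignored):

lnWindowStart = SECONDS()
lnCallsInWindow = 0
SCAN FOR EMPTY(cSession)
   IF lnCallsInWindow >= 5                  && hit the per-second cap
      DO WHILE SECONDS() - lnWindowStart < 1
         DOEVENTS                           && wait out the rest of the second
      ENDDO
   ENDIF
   IF SECONDS() - lnWindowStart >= 1        && start a new one-second window
      lnWindowStart = SECONDS()
      lnCallsInWindow = 0
   ENDIF
   = CallApi(MyList.ID)                     && hypothetical wrapper around the vendor call
   lnCallsInWindow = lnCallsInWindow + 1
ENDSCAN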
One thing I discovered with VFP when using the Winsock COM object to make calls to an API: the HTTP port remains open for a short time, and even closing the COM object doesn't release it. So after a number of calls, the ports stay open for a while. We had a customer change a registry setting to reduce the time the port stays open, and it helped with an odd error when accessing an API. I'd need to dig up the support history to find that registry key and the actual error.
Tracy
It's likely, if you can get in touch with a tech on the other end, that they could send you a CSV dump file rather than bogging down their API with a lot of slow, single-record requests.
OTOH, you might split off the acquisition task (the API calls) and result storage into an overnight process (so no one cares how long it takes) and do the speedy stuff at a later point.