The bottom line is yes, your approach is reasonable. I have done pretty much the same thing myself.
To sum up: RLOCK() is fine, but I think I also did an extra step like SKIP -1/SKIP to force a fresh read of the record (and consider the SET REFRESH setting). I don't recall the original thread, but I presume you're setting some kind of flag field to denote that the record was already processed. Only set that after a successful RLOCK(), of course.
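A minimal sketch of that claim-then-flag idea, under some assumptions: the table is already open SHARED as MyList, lcSession is a hypothetical variable holding this instance's ID, and SET REPROCESS is lowered so RLOCK() returns quickly instead of retrying indefinitely. It re-checks the flag after the lock rather than using the SKIP -1/SKIP nudge, which relies on the locked row being reread from disk; verify that on your setup.

```foxpro
* Sketch only, not tested against the real table.
* lcSession is a hypothetical per-instance ID; ProcessRecord is from the original post.
SET REPROCESS TO 1              && RLOCK() gives up quickly on a row someone else holds
SELECT MyList
SCAN FOR EMPTY(cSession)
    IF NOT RLOCK()
        LOOP                    && another instance owns this row, move on
    ENDIF
    * Re-check after the lock: another instance may have finished
    * this row between our read and our lock.
    IF NOT EMPTY(MyList.cSession)
        UNLOCK RECORD RECNO() IN MyList
        LOOP
    ENDIF
    IF ProcessRecord(MyList.ID)
        REPLACE tProcessed WITH DATETIME(), cSession WITH lcSession IN MyList
    ELSE
        REPLACE tError WITH DATETIME(), cSession WITH lcSession IN MyList
    ENDIF
    UNLOCK RECORD RECNO() IN MyList   && flag is written, release the row
ENDSCAN
```

The row stays locked for the whole API call, so other instances skip it instead of waiting on it.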
I would suggest trying your setup on a few hundred records, and putting a bit of logging in there to describe what happens (even simple STRTOFILE() statements). I think the benefit of running more processes might max out based on the number of cores on the machine - e.g. a 4-core machine may not benefit from running 16 .exe instances. On the other hand, if the API call is taking the vast majority of the time, the local resources are mostly twiddling their thumbs, so running more instances than the core count may be beneficial.
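The logging can be as small as the sketch below. LogLine and the log file name are hypothetical, and _VFP.ProcessId assumes VFP 7 or later; one file per instance avoids several .exe copies fighting over the same log.

```foxpro
* Sketch only: time one call and append a line to a per-instance log.
* LogLine and the file name are hypothetical; ProcessRecord is from the original post.
LOCAL lnStart, llOk
lnStart = SECONDS()
llOk = ProcessRecord(MyList.ID)
LogLine("ID " + TRANSFORM(MyList.ID) + IIF(llOk, " ok ", " FAILED ") + ;
    TRANSFORM(SECONDS() - lnStart) + " sec")

PROCEDURE LogLine(tcMessage)
    * One file per process, so instances never contend for the same log.
    LOCAL lcFile
    lcFile = "process_" + TRANSFORM(_VFP.ProcessId) + ".log"
    * Third argument .T. appends instead of overwriting.
    STRTOFILE(TTOC(DATETIME()) + " " + tcMessage + CHR(13) + CHR(10), lcFile, .T.)
ENDPROC
```

Comparing the per-record seconds across runs with 1, 4, 8, ... instances should show where adding instances stops helping. Note that SECONDS() resets at midnight, so a run that crosses midnight will log one odd value.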
But note that the API vendor may have their own throttle - maybe based on API key - to prevent someone from overloading their resources. And, of course, there are also horrible systems out there (NoSQL datastores, bad Java, constrained NodeJS, just plain bad coding, blah blah), and it's hard to predict what kind of performance you'll get from them. So your logging may help find the API vendor's 'max performance' rate as well. When you hit that, as you imply, you will not be able to 'fix' it from your side.
-Charlie
On 8/2/2020 7:23 PM, MB Software Solutions, LLC wrote:
Right...we're using the API documentation from the vendor, doing it according to what they say. It's not super quick but then again I don't think most people use it for processing tens of thousands of records quickly like my client's app does. Most folks are using it to just get a single carrier's information (it's a trucking app) at a time.
On 8/2/2020 8:17 AM, António Tavares Lopes wrote:
Mike,
a) is it a Web API that you may call through an HTTP library?
b) and the parameters' values in each call to the API come from each record in the table?
On Sat, Aug 1, 2020 at 6:07 PM MB Software Solutions, LLC < mbsoftwaresolutions@mbsoftwaresolutions.com> wrote:
I've got a regular process that runs, basically using key information to grab data from an API and then update the local VFP database. There are maybe 64000 records to process, and each record to update through this process takes about a second, so to process this group would take over 17 hours. Each record could be processed on its own; there are no relationships between them.
I don't want to start it and run the 64000 in a row for 17+ hours. I would like to design the app to use the table, RLOCK the row I'm processing, and then UNLOCK the row when I'm done. I figure with this design, I could run multiple instances of MyProgram.exe (similar to how WestWind Web Connection allows you to run multiple instances) to process the batch maybe 4x faster (if I launched 4 instances of MyProgram.exe). The basic construct would be as follows:
USE ListOfRecsToProcess IN 0 SHARED Alias MyList  && record is PK (to process) i, tProcessed t, tError t, cSession c(10)
SCAN FOR EMPTY(cSession) AND RLOCK('MyList')
    IF ProcessRecord(MyList.ID) THEN
        REPLACE tProcessed WITH DATETIME(), cSession WITH this.cSession IN MyList
    ELSE
        REPLACE tError WITH DATETIME(), cSession WITH this.cSession IN MyList
    ENDIF
ENDSCAN
Does anybody see any problems with that general design? The ProcessRecord method calls an API to get values and then updates the local VFP record accordingly.
tia, --Mike