Informatik, Modellbau und Privates von Georg

Changes of Simple Parallel Exec from #15 to #16

Changed lines at line 101
101: [Input filename]* (filename can be "STDIN")
102: Content=<<ENDOFCONTENT
103: real file content here (ASCII)
104: ENDOFCONTENT
105: [Result filename]+ (filename can be "STDOUT")
106: Name=resultname
107: {code}
108: * the * behind a section name means there can be _zero_ or more such sections
109: * the + behind a section name means there can be _one_ or more such sections
110: 1.1 Task completed
111: - Successful: POST \http://master/complete?sessionid=SESSIONID&ticket=TICKET
112: {code:none}
113: [Result]+
114: Name=resultname
115: Content=<<ENDOFCONTENT
116: file content here (ASCII)
117: ENDOFCONTENT
118: {code}
119: - Failed: GET \http://master/failed?sessionid=SESSIONID&ticket=TICKET
120: - Reply Fail due to wrong session id: 403 (Forbidden)
121: - Reply Otherwise: 200 OK
122: * binary content is not supported
123: 1 Implementation Details
124: 1.1 Configuration and Files
125: - Server config: TODO describe
126: - Parameter file: CSV file with cells separated by |. The header line holds the parameter names, and every following line contains one parameter set. Every line must have the same number of cells as the header line!
127: - Specification of worker: Command line: a Perl syntax string with variables for the parameters; Input files: name of the file and name of the parameter to write into it.
128: - Result specification: A result consists of a list of name-value pairs, where the name specifies the name of the particular output and the value decides where the output comes from. For example myoutput="stdout", myfileoutput="out.txt".
129: - Validation: standard implementations are provided, and a custom implementation can be provided by the user as a Perl function. A validation function gets the result of the worker and returns success or failure.
130: - Collection: standard implementations are provided, and a custom implementation can be provided by the user as a Perl function. A collection function gets the task description (number, parameter set) and the result of the worker and can do whatever it wants with it (usually it writes to a file).
131: 1.1 NFS awareness
132: The problem is that if some slaves share files via NFS or another network filesystem, it can happen that
133: different clients overwrite each other's data. Basically there are three points where this can occur:
134: 1. the client is copied
135: 1. the client fetches the worker
136: 1. the worker writes its data to a file.
137: Solutions:
138: 1. a) copy and start clients serially (very slow) b) copy just one client at a time, but start them in parallel (fast on NFS, slow otherwise)
139: 1. before fetching the worker, the client creates a .lock file. The other clients check for its existence and wait for the worker.
140: 1. every worker is started in a separate directory, given by the session id and the ticket number
141: 1.1 Error detection
142: Remote shell command (ssh) termination code:
143: [Input filename]*
144: Content=single-line file content
145: or
146: Content= <<EOT
147: multi-line file content here (ASCII)
148: EOT
149: [Result name]+
150: File=filename
151: {code}
152: * the * behind a section name means there can be _zero_ or more such sections
153: * the + behind a section name means there can be _one_ or more such sections
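The section format above can be read with a small parser. The following is a minimal Python sketch for illustration only, not the project's Perl code: the function name `parse_sections` and the sample file names are assumptions, and it assumes that single-line values may be stripped of surrounding whitespace and that the end marker stands alone on its line.

```python
import re

SECTION = re.compile(r"^\[(.+)\]$")      # e.g. [input.file]
HEREDOC = re.compile(r"^<<\s*(\S+)$")    # e.g. <<EOT


def parse_sections(text):
    """Parse '[name]' sections with 'Key=value' lines into (name, dict) pairs."""
    sections = []
    lines = iter(text.splitlines())
    for line in lines:
        header = SECTION.match(line.strip())
        if header:
            sections.append((header.group(1), {}))
            continue
        if "=" not in line or not sections:
            continue  # blank line, or text outside any section
        key, value = line.split("=", 1)
        value = value.strip()  # assumption: surrounding whitespace is not significant
        marker = HEREDOC.match(value)
        if marker:
            # Multi-line value: collect lines until the end marker stands alone.
            body = []
            for content_line in lines:
                if content_line.strip() == marker.group(1):
                    break
                body.append(content_line)
            value = "\n".join(body)
        sections[-1][1][key.strip()] = value
    return sections


# Hypothetical message with one input file and one result.
message = """[input.file]
Content= <<EOT
first line
second line
EOT
[Result]
File=result.file
"""
parsed = parse_sections(message)
```

Because the multi-line body is consumed inside the inner loop, a content line that happens to look like a section header is kept as content.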
154: 1.1 Task completed
155: - Successful: POST \http://master/completed?sessionid=SESSIONID&ticket=TICKET
156: {code:none}
157: [Result name]+
158: Content= single-line file content
159: or
160: Content= <<EOT
161: multi-line file content here (ASCII)
162: EOT
163: {code}
164: - Failed: GET \http://master/failed?sessionid=SESSIONID&ticket=TICKET
165: - Reply Fail due to wrong session id: 403 (Forbidden)
166: - Reply Otherwise: 204 (No Content)
167: * binary content is not supported
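How a client might report a finished task can be sketched as follows. This is a hedged Python illustration, not the actual client: `build_completed_request` and `describe_reply` are hypothetical names, the request is only built and never sent, and the sketch assumes the reply codes listed above apply to the completion request as well.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_completed_request(master, session_id, ticket, results):
    """Build (but do not send) the POST request that reports a finished task."""
    query = urlencode({"sessionid": session_id, "ticket": ticket})
    parts = []
    for name, content in results.items():
        parts.append(f"[{name}]")
        if "\n" in content:
            # Multi-line content; a content line equal to "EOT" would break this.
            parts.extend(["Content= <<EOT", content, "EOT"])
        else:
            parts.append(f"Content= {content}")
    body = "\n".join(parts) + "\n"
    # Only ASCII is supported, so non-ASCII content raises UnicodeEncodeError here.
    return Request(f"http://{master}/completed?{query}",
                   data=body.encode("ascii"), method="POST")


def describe_reply(status):
    """Map the documented reply codes to a short description."""
    if status == 403:
        return "rejected: wrong session id"
    if status == 204:
        return "accepted"
    return f"unexpected status {status}"


# Hypothetical session id, ticket number and result.
request = build_completed_request("master", "SESSIONID", 7, {"Result": "42"})
```

Separating request construction from sending keeps the message format easy to check without a running master.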
168: 1 Implementation Details
169: 1.1 Configuration and Files
170: - Server config: See __server.sample.conf__ in the sidebar.
171: - Task file: __tasks.csv__ : CSV file with cells separated by |. The header line holds the parameter names, and every following line contains one parameter set. Every line must have the same number of cells as the header line!
172: {code:none}
173: string|counter
174: "eins"|1
175: "zwei"|2
176: {code}
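Reading such a task file and enforcing the equal-cell-count rule could look like the Python sketch below. It is an illustration under assumptions: `read_tasks` is a hypothetical name, and treating the double quotes as CSV quoting (so they are stripped from the values) is a guess about the intended semantics.

```python
import csv
import io


def read_tasks(text):
    """Return one dict per parameter set, keyed by the names in the header line."""
    rows = list(csv.reader(io.StringIO(text), delimiter="|"))
    if not rows:
        raise ValueError("task file is empty")
    header, data = rows[0], rows[1:]
    tasks = []
    for number, row in enumerate(data, start=1):
        if len(row) != len(header):
            raise ValueError(
                f"parameter set {number} has {len(row)} cells, expected {len(header)}"
            )
        tasks.append(dict(zip(header, row)))
    return tasks


tasks = read_tasks('string|counter\n"eins"|1\n"zwei"|2\n')
```

All values are returned as strings; converting "counter" to a number is left to the caller.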
177: - Worker config: See __worker.sample.conf__ in the sidebar.
178: - Input specification: The input consists of the command line and one or more files. In the example above the parameter "counter" is passed as a command-line argument and the parameter "string" is written to the file input.file. This file is used as standard input for the worker. Other files can be specified as well, in case the worker reads them.
179: - Result specification: A result has a name and a filename from which the result value is read. In the above example one result is called "Result" and comes from the file result.file, which is the standard output of the worker. The second result is called "Output" and is read from output.file. If the worker doesn't write to this file, the result will be empty.
180: - Validation: standard implementations are provided, and a custom implementation can be provided by the user as a Perl function. A validation function gets the result of the worker and returns success or failure. (See Validate.pm)
181: - Collection: standard implementations are provided, and a custom implementation can be provided by the user as a Perl function. A collection function gets the task description (number, parameter set) and the result of the worker and can do whatever it wants with it (usually it writes to a file). (See Collect.pm)
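The division of labour between validation and collection can be illustrated with a short sketch. The real hooks are Perl functions (Validate.pm, Collect.pm) whose signatures are not shown here, so the Python below is only an analogy: `validate_non_empty`, `make_collector` and the tab-separated output layout are all assumptions.

```python
import io


def validate_non_empty(result):
    """Report success only if every named result has non-whitespace content."""
    return bool(result) and all(value.strip() for value in result.values())


def make_collector(out):
    """Return a collection function that appends one line per result to `out`."""
    def collect(number, parameters, result):
        params = ",".join(f"{key}={value}" for key, value in parameters.items())
        for name, value in result.items():
            out.write(f"{number}\t{params}\t{name}\t{value}\n")
    return collect


# Hypothetical task 1 with the parameter set from the task file example.
out = io.StringIO()
collect = make_collector(out)
good = {"Result": "1", "Output": "done"}
bad = {"Result": "1", "Output": ""}
if validate_non_empty(good):
    collect(1, {"string": "eins", "counter": "1"}, good)
collected = out.getvalue()
```

Here an empty "Output" is treated as a failure; a worker that legitimately produces empty results would need a different validation function.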
182: 1.1 Server
183: - The server is implemented in Perl.
184: - Perl has no reasonable way to use shared memory across multiple threads. Since the program is written using Perl objects, and objects can't be shared, I decided to make a __serial__ implementation first.
185: - That means only one request can be answered at a time.
186: - Consequences: slow start if there are many slaves; less suitable for very small tasks with large input/response data
187: 1.1 NFS awareness
188: The problem is that if some slaves share files via NFS or another network filesystem, it can happen that
189: different clients overwrite each other's data. Basically there are three points where this can occur:
190: 1. the client is copied
191: 1. the client fetches the worker
192: 1. the worker writes its data to a file.
193: Solutions:
194: 1. a) copy and start clients serially (very slow) (current implementation) b) copy just one client at a time, but start them in parallel (fast on NFS, slow otherwise)
195: 1. before fetching the worker, the client creates a .lock file. The other clients check for its existence and wait for the worker.
196: 1. every worker is started in a separate directory, given by the session id and the ticket number
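The lock-file and per-task-directory ideas above can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, and the `.done` marker that tells waiting clients the worker has arrived is an assumption, since the text only mentions the .lock file. Whether exclusive creation is atomic on a network filesystem depends on the NFS version and client, so that would need checking on the target systems.

```python
import os
import tempfile
import time


def task_directory(base, session_id, ticket):
    """Create and return a private working directory for one task."""
    path = os.path.join(base, str(session_id), str(ticket))
    os.makedirs(path, exist_ok=True)
    return path


def fetch_once(directory, fetch, poll_seconds=0.5):
    """Let one client fetch the worker while the others wait for it.

    Returns True if this call did the fetching, False if it only waited.
    """
    lock = os.path.join(directory, ".lock")
    done = os.path.join(directory, ".done")  # assumed completion marker
    try:
        # O_EXCL makes creation fail if another client already holds the lock.
        fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        while not os.path.exists(done):
            time.sleep(poll_seconds)  # waits forever if the fetching client dies
        return False
    os.close(fd)
    fetch()
    open(done, "w").close()
    return True


# Two clients sharing one directory: only the first one fetches.
fetched = []
with tempfile.TemporaryDirectory() as shared:
    first = fetch_once(shared, lambda: fetched.append("worker"))
    second = fetch_once(shared, lambda: fetched.append("worker"))
    workdir = task_directory(shared, "SESSIONID", 7)
    workdir_exists = os.path.isdir(workdir)
```

A production version would also need a timeout and stale-lock cleanup, which are omitted here.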
197: 1.1 Error detection
198: * No client could be started
199: * Swarm: Remote shell command (ssh) termination code:
