Informatik, Modellbau und Privates von Georg

Changes of Simple Parallel Exec from #2 to #3

Changed lines at line 11
11: - SSH (public key authorisation)
12: 1.1 Features & Configurability
13: - list of computer names or IPs
14: - command-line pattern with placeholders for variables
15: - validation (return value and/or existence of a file)
16: - list of all parameter sets to be processed
17: - simple task assignment: when a computer finishes a task, give it the next one from the list
18: - error detection (see next section)
19: - if an error occurs while executing a task, give the task to the next machine and mark the failed machine as dead
20: - collecting rules: a) stdout processing (plain concatenation; blockwise with parameters) b) file processing (plain concatenation, with parameters)
21: 1.1 Error detection
22: The following errors need to be detected:
23: - error during connection/authentication
24: - machine dead (for whatever reason)
25: - connection broken
26: - program terminated without success
27: 1 Implementation Details
28: 1.1 Configuration and Files
29: - Cluster file: list of computers, one per line
30: - Parameter file: CSV file with cells separated by |. The first line holds the parameter names; every following line contains one parameter set. Every line must have the same number of cells as the header line!
31: - Command-line pattern: a Perl-syntax string with variables for the parameters
32: - Validation function: receives the parameter set and the result of the program and returns whether the run succeeded
33: - Collection function: receives the parameter set and the result of the program and can do anything with it (usually write it to a file)
34: 1.1 Error detection
35: Remote shell command (ssh) exit code:
36: - 0 => Success: the program executed successfully.
37: - otherwise => Failure. Possible reasons: connection failed, program not found, or program terminated without success.
38: To check the connection and authentication, run:
39: {code:shell}ssh host echo{code}
40: - return 0 (success): The connection is OK and the machine is alive, so the program was either not found or did not terminate with 0. In either case we assume that this parameter set is somehow bad and skip it.
41: - otherwise: mark this machine as dead and reschedule the parameter set.
42: - SSH (public key authentication)
43: - permanent network connection
44: 1.1 Features & Configurability
45: - one master computer
46: - list of slave computers (names or IPs)
47: - command-line pattern with placeholders for variables
48: - validation (return value and/or existence of a file)
49: - list of all parameter sets to be processed
50: - simple task assignment: when a computer finishes a task, give it the next one from the list
51: - error detection (see next section)
52: - if an error occurs while executing a task, give the task to the next machine and mark the failed machine as dead
53: - collecting rules: a) stdout processing (plain concatenation; blockwise with parameters) b) file processing (plain concatenation, with parameters)
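
The task assignment and rescheduling rules in the list above can be sketched as follows. This is only an illustration, not the tool's actual code: the function `run_all`, the `execute(host, task)` callback and its three return values ("ok", "bad_task", "dead") are assumptions made for this example. One thread per machine pulls the next task from a shared queue; a task that hit a dead machine goes back into the queue for the remaining machines.

```python
import queue
import threading


def run_all(hosts, tasks, execute):
    """Distribute tasks over hosts; execute(host, task) is a hypothetical
    callback returning "ok", "bad_task" or "dead"."""
    pending = queue.Queue()
    for task in tasks:
        pending.put(task)
    done, skipped, dead = [], [], []
    lock = threading.Lock()

    def worker(host):
        while True:
            try:
                task = pending.get_nowait()
            except queue.Empty:
                return                      # nothing left for this machine
            result = execute(host, task)
            with lock:
                if result == "ok":
                    done.append(task)
                elif result == "bad_task":
                    skipped.append(task)    # bad parameter set: skip it
                else:
                    dead.append(host)       # mark the machine as dead ...
                    pending.put(task)       # ... and reschedule the task
                    return

    live = list(hosts)
    # Repeat in rounds: a task re-queued by a dying machine may arrive after
    # the other workers have already seen an empty queue.
    while live and not pending.empty():
        threads = [threading.Thread(target=worker, args=(h,)) for h in live]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        live = [h for h in live if h not in dead]

    unprocessed = []
    while not pending.empty():              # only non-empty if every host died
        unprocessed.append(pending.get_nowait())
    return {"done": done, "skipped": skipped, "dead": dead,
            "unprocessed": unprocessed}


def fake_execute(host, task):
    # Stand-in for the real ssh call, for demonstration only.
    if host == "b":
        return "dead"
    return "bad_task" if task == "bad" else "ok"


result = run_all(["a", "b"], ["t1", "bad", "t2"], fake_execute)
print(sorted(result["done"]), result["skipped"], result["unprocessed"])
```

If every machine dies, the remaining tasks are returned under "unprocessed" instead of being lost.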
54: 1.1 Error detection
55: The following errors need to be detected:
56: - error during connection/authentication
57: - machine dead (for whatever reason)
58: - connection broken
59: - program terminated without success
60: 1 Implementation Details
61: 1.1 Configuration and Files
62: - Cluster file: list of computers, one per line
63: - Parameter file: CSV file with cells separated by |. The first line holds the parameter names; every following line contains one parameter set. Every line must have the same number of cells as the header line!
64: - Command-line pattern: a Perl-syntax string with variables for the parameters
65: - Validation function: receives the parameter set and the result of the program and returns whether the run succeeded
66: - Collection function: receives the parameter set and the result of the program and can do anything with it (usually write it to a file)
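
To make the parameter file and the command-line pattern more concrete, here is a minimal sketch in Python. The notes above plan a Perl-syntax pattern; Python's `string.Template` is swapped in here because it uses the same `$name` placeholder style. The function names, the example program `./simulate` and the parameters `n` and `seed` are made up for illustration.

```python
import csv
import io
import shlex
from string import Template


def read_parameter_sets(text):
    """Parse a '|'-separated parameter file into a list of dicts.

    The first line holds the parameter names; every following line is one
    parameter set and must have as many cells as the header line.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="|"))
    if not rows:
        raise ValueError("parameter file is empty")
    header = rows[0]
    sets = []
    for line_no, cells in enumerate(rows[1:], start=2):
        if not cells:                       # ignore blank lines
            continue
        if len(cells) != len(header):
            raise ValueError(
                f"line {line_no}: expected {len(header)} cells, got {len(cells)}"
            )
        sets.append(dict(zip(header, cells)))
    return sets


def build_command(pattern, params):
    """Fill the $name placeholders of the pattern with one parameter set.

    Values are shell-quoted so that spaces or special characters in a cell
    cannot break the remote command line.
    """
    quoted = {name: shlex.quote(value) for name, value in params.items()}
    return Template(pattern).substitute(quoted)


example_file = "n|seed\n10|1\n20|2\n"
for params in read_parameter_sets(example_file):
    print(build_command("./simulate --size $n --seed $seed", params))
```

A line with too few or too many cells raises a `ValueError` that names the offending line, which enforces the "same number of cells" rule before any task is started.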
67: 1.1 Error detection
68: Remote shell command (ssh) exit code:
69: - 0 => Success: the program executed successfully.
70: - otherwise => Failure. Possible reasons: connection failed, program not found, or program terminated without success.
71: To check the connection and authentication, run:
72: {code:shell}ssh host echo{code}
73: - return 0 (success): The connection is OK and the machine is alive, so the program was either not found or did not terminate with 0. In either case we assume that this parameter set is somehow bad and skip it.
74: - otherwise: mark this machine as dead and reschedule the parameter set.
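
The two-step decision above (exit code of the task, then exit code of `ssh host echo`) could be written down like this. It is a sketch under assumptions: the names `Outcome`, `classify` and `ssh_probe` are invented for this example, and the probe is passed in as a callable so the decision logic can be tried without a real machine.

```python
import subprocess
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"
    BAD_PARAMETER_SET = "skip this parameter set"
    MACHINE_DEAD = "mark machine as dead and reschedule"


def classify(task_exit_code, probe):
    """Decide what to do after `ssh host <command>` has returned.

    probe is a callable returning the exit code of `ssh host echo`;
    it is only called when the task itself failed.
    """
    if task_exit_code == 0:
        return Outcome.SUCCESS
    if probe() == 0:
        # Connection and login work, so the program or its parameters
        # are to blame.
        return Outcome.BAD_PARAMETER_SET
    return Outcome.MACHINE_DEAD


def ssh_probe(host):
    """Run `ssh host echo` and return its exit code.

    BatchMode=yes makes ssh fail instead of asking for a password, which
    fits the public key setup assumed in these notes.
    """
    completed = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", host, "echo"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return completed.returncode


# Demonstration with fake probes instead of a real host.
print(classify(0, lambda: 0).name)
print(classify(1, lambda: 0).name)
print(classify(255, lambda: 255).name)
```

With a real machine the call would be `classify(rc, lambda: ssh_probe(host))`. OpenSSH reports its own failures with exit code 255, but a remote program may return 255 as well, so the extra probe is still needed to tell the two cases apart.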
75: One case not covered by the above procedure is a broken connection: the ssh command then simply never terminates.
76: Solution: asynchronous periodic ping
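
One possible shape of the periodic ping, as a sketch only: the child process (normally the ssh call) is watched in a loop, and whenever it has not finished within the interval, a liveness check runs. If the check fails, the process is killed so the parameter set can be rescheduled. The names `run_with_watchdog` and `host_answers_ping` are assumptions, and `ping -c 1` assumes a Unix-style ping.

```python
import subprocess
import sys


def host_answers_ping(host, timeout=5.0):
    """Return True if a single ping to host succeeds (Unix-style `ping -c`)."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def run_with_watchdog(argv, is_alive, interval=5.0):
    """Run argv and call is_alive() every `interval` seconds while it runs.

    Returns the exit code of the process, or None if the process was killed
    because is_alive() reported the machine as unreachable.
    """
    proc = subprocess.Popen(argv)
    while True:
        try:
            return proc.wait(timeout=interval)
        except subprocess.TimeoutExpired:
            if not is_alive():
                proc.kill()
                proc.wait()      # reap the killed process
                return None


# Local demonstration without ssh: a child process that exits immediately.
exit_code = run_with_watchdog([sys.executable, "-c", "pass"], lambda: True, interval=0.5)
print(exit_code)
```

For a real task the call would look like `run_with_watchdog(["ssh", host, command], lambda: host_answers_ping(host))`. A return value of None then means "connection broken": mark the machine as dead and reschedule the parameter set.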
