NOTE: this document was last updated for Pipeline version 5.1.0 and is no longer maintained. It is kept for reference only. For the latest documentation on this subject, please refer to our guide on the graphical server configuration tool.
<?xml version="1.0" encoding="UTF-8"?>
<preferences>
  <Hostname>cranium.loni.ucla.edu</Hostname>
  <TempFileLocation>/ifs/tmp/</TempFileLocation>
</preferences>
Save the file as "preferences.xml". When you launch the server, it will have the host name "cranium.loni.ucla.edu" and all temporary files will be placed in the /ifs/tmp directory. Now let's look at all the options supported by Pipeline.
<TempFileLocation>/ifs/tmp</TempFileLocation>
Pipeline will create a directory /ifs/tmp/username/timestamp and put all the working files there, where username is the user running the server and timestamp is the time at which each workflow was translated before execution. Inside each of those timestamp folders are all the intermediate files produced by executables from submitted workflows. Depending on the number of users on your server and the kind of work they do, this directory can grow very quickly.
This property is deprecated in versions above 4.5; the SecureTempFileLocation property should be used instead.
<SecureTempFileLocation>/ifs/tmp/SecureTmpDir</SecureTempFileLocation>
Pipeline will create the directory /ifs/tmp/SecureTmpDir with special permission bits, and upon workflow submission it will create the /ifs/tmp/SecureTmpDir/username/timestamp directory and put all the working files there. The special permissions make the files accessible only to the user who started the workflow.
IMPORTANT: Starting from version 5.0, it is mandatory to use this option instead of TempFileLocation when UsePrivilegeEscalation is true.
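The exact permission bits Pipeline applies are not spelled out in this guide, so the sketch below is only an illustration of the usual Unix scheme for such shared directories, an assumption rather than Pipeline's actual code: a sticky, world-writable root (like /tmp), plus per-user subdirectories private to their owners.

```python
import os
import stat
import tempfile

# Illustration of the assumed permission scheme: the shared root gets the
# sticky bit so users cannot delete each other's entries, and each per-user
# directory is readable only by its owner.
root = os.path.join(tempfile.mkdtemp(), "SecureTmpDir")
os.makedirs(root)
os.chmod(root, 0o1777)  # world-writable + sticky bit, like /tmp

user_dir = os.path.join(root, "alice", "20240101-120000")
os.makedirs(user_dir)
os.chmod(os.path.join(root, "alice"), 0o700)  # only the owner can enter

print(oct(stat.S_IMODE(os.stat(root).st_mode)))  # 0o1777
```

The sticky bit on the root is what lets many users share one temporary area without being able to remove one another's files.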
<?xml version="1.0" encoding="UTF-8"?>
<preferences>
  <Hostname>cranium.loni.ucla.edu</Hostname>
  <ServerPort>8020</ServerPort>
  <TempFileLocation>/ifs/tmp/</TempFileLocation>
</preferences>
<?xml version="1.0" encoding="UTF-8"?>
<preferences>
  <Hostname>cranium.loni.ucla.edu</Hostname>
  <ServerPort>8020</ServerPort>
  <TempFileLocation>/ifs/tmp/</TempFileLocation>
  <MaximumThreadPoolSize>620</MaximumThreadPoolSize>
</preferences>
Note that this will not reject jobs submitted by users after the limit has been reached; it will simply queue them until an execution slot becomes available. For grid setups, you should probably set the limit a little higher than the number of compute nodes available to you, because submitting to the grid takes a non-negligible amount of time, and it's best to keep your compute nodes crunching at all times.
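As an analogy for this queueing behavior (using Python's ThreadPoolExecutor, not Pipeline's actual scheduler): with a pool capped at 2 workers, submitting 5 jobs rejects none of them; the surplus simply waits for a free slot.

```python
import time
from concurrent.futures import ThreadPoolExecutor

results = []
with ThreadPoolExecutor(max_workers=2) as pool:  # the cap, like MaximumThreadPoolSize
    # Submit more jobs than there are workers; the extras are queued, not rejected.
    futures = [pool.submit(lambda n=n: (time.sleep(0.1), n)[1]) for n in range(5)]
    results = [f.result() for f in futures]

print(results)  # [0, 1, 2, 3, 4] -- every job eventually ran
```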
To enable this feature you need to 1) add the <UsePrivilegeEscalation>
preference to your preferences file with a value of "true", and 2) modify your system's sudoers file to allow the user that runs the Pipeline server to sudo as any user that will be allowed to connect to the system. How to modify the sudoers file is outside the scope of this guide, but if you want or need this feature, you probably already know how to do it. Now your preferences should look something like this:
<?xml version="1.0" encoding="UTF-8"?>
<preferences>
  <Hostname>cranium.loni.ucla.edu</Hostname>
  <ServerPort>8020</ServerPort>
  <TempFileLocation>/ifs/tmp/</TempFileLocation>
  <MaximumThreadPoolSize>620</MaximumThreadPoolSize>
  <UsePrivilegeEscalation>true</UsePrivilegeEscalation>
</preferences>
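A matching sudoers entry might look like the sketch below; the account name "pipeline" and the decision to let it run commands as any user are assumptions for illustration only, not a recommendation from this guide:

```
# Hypothetical entry: lets the (assumed) server account "pipeline"
# sudo as any user without a password. Tighten this to your site's policy.
pipeline ALL=(ALL) NOPASSWD: ALL
```

Always edit the file with visudo so the syntax is checked before it is saved; a syntax error in sudoers can lock out administrative access.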
The server library location is specified by the <ServerLibraryLocation>
element in the preferences. By default, the location is set to one of the following locations (based on OS), so you don't need to specify this preference if you're happy with it:
The Pipeline server reads the ServerLibraryLocation
directory (and all its subdirectories) and monitors it for changes and additions to any of the files while it runs.
Starting from Pipeline v5.0 there is a new preference, ServerLibrarySameDirMonitor, which allows you to specify a monitoring file or directory other than the library directory. ServerLibrarySameDirMonitor is a boolean preference which defaults to true. When it is set to false, Pipeline will look for an alternate monitoring file path, which should be declared in the ServerLibraryMonitorFile preference.
For example, consider the following preferences:
<ServerLibraryLocation>/ifs/lib</ServerLibraryLocation>
<ServerLibrarySameDirMonitor>false</ServerLibrarySameDirMonitor>
<ServerLibraryMonitorFile>/ifs/monitorFile</ServerLibraryMonitorFile>
Because ServerLibrarySameDirMonitor
is set to false, the Pipeline server will not update its library when something changes in ServerLibraryLocation
(/ifs/lib); it will only update when the ServerLibraryMonitorFile
(/ifs/monitorFile) file is modified.
Put all the module definitions that you want to make available to users into the ServerLibraryLocation
directory; when clients connect, they will obtain a copy of the library on their local system. If you add, delete, or change any of the definitions in this directory, the server will automatically see the change (no restart required) and synchronize clients again when they reconnect. Even clients connected during the change will get the new version of the server library without reconnecting.
Remember that the change must be reflected on the root directory, otherwise the server will not notice it and the Server Library files will not be updated. For example, suppose you have a pipe file at ServerLib->LONI->Modules->example.pipe and you change only example.pipe. Although the Modules directory and example.pipe get a new modification time, the ServerLib directory (the root in our case) does not, so in this case you have to change the ServerLib modification time manually.
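That manual step can be scripted; the sketch below uses a throwaway directory in place of /ifs/lib, and bumps the root's modification time the same way the `touch` command would.

```python
import os
import tempfile
import time

# Simulate a server library: a file nested deep inside the library root.
lib_root = tempfile.mkdtemp()
nested = os.path.join(lib_root, "LONI", "Modules")
os.makedirs(nested)
open(os.path.join(nested, "example.pipe"), "w").close()

before = os.stat(lib_root).st_mtime
time.sleep(1.1)            # ensure a visibly newer timestamp
os.utime(lib_root, None)   # same effect as 'touch' on the library root
after = os.stat(lib_root).st_mtime

assert after > before      # the root's mtime changed, so the server would notice
```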
After updating server library files, check the Output Stream of the server; you should see a log line like this:
Loading server library..........................DONE [1100ms]
If this line appears, the server has captured the change in the library; otherwise the library has not been updated.
<DaysToPersistStatus>
specifies the number of days workflow sessions are kept. Every 24 hours, the Pipeline server checks for and cleans up workflow sessions older than the specified number of days. The default value is 30 days. When a session is cleared, all its temporary files under the temporary directory are removed.
If <ClearOldTempFilesEnabled>
is set to true, then any temporary session directory older than two times the <DaysToPersistStatus>
value will be removed. This will not happen under normal circumstances, because the persistence database keeps track of all sessions, so no temporary directories older than <DaysToPersistStatus>
should exist. It only applies when the Pipeline server restarts after its persistence database has been manually deleted. The default is false.
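A quick worked example of the two cutoffs, assuming the 30-day default:

```python
from datetime import datetime, timedelta

days_to_persist = 30          # the DaysToPersistStatus default
now = datetime(2024, 6, 1)    # an arbitrary "current" date for the example

# Sessions older than this are cleaned up on the daily check.
session_cutoff = now - timedelta(days=days_to_persist)
# Orphaned temp dirs older than this go when ClearOldTempFilesEnabled is true.
temp_dir_cutoff = now - timedelta(days=2 * days_to_persist)

print(session_cutoff.date())   # 2024-05-02
print(temp_dir_cutoff.date())  # 2024-04-02
```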
The location of the server's log files is specified by the <LogFileLocation>
preference. To define the prefix with which the log files will be named, simply add it to the end of the directory path; a unique number denoting each log file will be appended to the file name.
<LogFileLocation>/nethome/users/pipelnv4/server/events.log</LogFileLocation>
In the above example, log files will be created in the /nethome/users/pipelnv4/server/ directory and will be named events.log.0, events.log.1, and so forth.
java -cp ./lib/hsqldb.jar org.hsqldb.Server -database.0 file:/user/foo/mydb -dbname.0 xdb
After successfully starting hsqldb, you can add the <PersistenceURL>
preference to the Pipeline server's preferences file, something like the following:
<PersistenceURL>jdbc:hsqldb:hsql://localhost/xdb</PersistenceURL>
<HTTPServerPort>
specifies the port number on which the Pipeline server provides an API for querying workflow data, including the session list, session status, and output files. It is helpful when you (or your program) want to query workflows on the Pipeline server without needing the Pipeline client. Please note: once enabled, it does not require any login authorization to see any workflows on the server. By default, this feature is not enabled on the Pipeline server.
For example, we have a preference file like this:
<?xml version="1.0" encoding="UTF-8"?>
<preferences>
  <Hostname>cerebro-rsn2.loni.ucla.edu</Hostname>
  <ServerPort>8020</ServerPort>
  <HTTPServerPort>8021</HTTPServerPort>
</preferences>
When the server is running, you can go to
http://cerebro-rsn2.loni.ucla.edu:8021/
and it shows an XML file listing all the APIs. Currently there are five functions:
getSessionsList
getSessionWorkflow
getSessionStatus
getInstanceCommand
getOutputFiles
getSessionsList
returns all the active sessions on this Pipeline server. It does not take any argument, and the query URL looks like this:
http://cerebro-rsn2.loni.ucla.edu:8021/getSessionsList
The Pipeline server returns an XML file listing all the active sessions, with their session IDs.
<sessions count="1">
  <session>cerebro-rsn2.loni.ucla.edu:8020-453da129-c81b-4473-9fc0-8fe03481e492</session>
</sessions>
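The response is plain XML, so it can be consumed without any Pipeline-specific tooling. The sketch below parses the getSessionsList response shown above; the XML is copied from this guide, so no live server is needed:

```python
import xml.etree.ElementTree as ET

# The getSessionsList response from the example above.
response = """<sessions count="1">
  <session>cerebro-rsn2.loni.ucla.edu:8020-453da129-c81b-4473-9fc0-8fe03481e492</session>
</sessions>"""

tree = ET.fromstring(response)
# Collect the session IDs, stripping the surrounding whitespace.
session_ids = [s.text.strip() for s in tree.findall("session")]
print(session_ids[0])
```

Each ID returned this way can then be fed to getSessionWorkflow or getSessionStatus as the sessionID argument.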
getSessionWorkflow
returns the workflow file (.pipe file). It takes session ID as argument. The query URL looks like this:
http://cerebro-rsn2.loni.ucla.edu:8021/getSessionWorkflow?sessionID=cerebro-rsn2.loni.ucla.edu:8020-453da129-c81b-4473-9fc0-8fe03481e492
getSessionStatus
returns the status of the workflow execution: when it started, whether it has finished, what time it finished, which nodes and instances are in the workflow, and, for each node, whether it finished successfully. The query URL looks like this:
http://cerebro-rsn2.loni.ucla.edu:8021/getSessionStatus?sessionID=cerebro-rsn2.loni.ucla.edu:8020-453da129-c81b-4473-9fc0-8fe03481e492
getInstanceCommand
returns the command of the execution. It takes session ID, node name (which can be found by calling getSessionStatus), and instance number (which can also be found by calling getSessionStatus). The query URL looks like this:
http://cerebro-rsn2.loni.ucla.edu:8021/getInstanceCommand?sessionID=cerebro-rsn2.loni.ucla.edu:8020-453da129-c81b-4473-9fc0-8fe03481e492&nodeName=BET_0&instanceNumber=0
getOutputFiles
returns the path of output files generated by the node. It takes session ID, node name, instance number, and parameter ID. The query URL looks like this:
http://cerebro-rsn2.loni.ucla.edu:8021/getOutputFiles?sessionID=cerebro-rsn2.loni.ucla.edu:8020-453da129-c81b-4473-9fc0-8fe03481e492&nodeName=BET_0&instanceNumber=0&parameterID=BET.OutputFile_0
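When building these query URLs programmatically, it is safest to let a library handle the escaping (session IDs contain ':' characters, for example). The sketch below assembles the getOutputFiles URL from the example arguments used throughout this guide:

```python
from urllib.parse import urlencode

base = "http://cerebro-rsn2.loni.ucla.edu:8021/getOutputFiles"
params = {
    "sessionID": "cerebro-rsn2.loni.ucla.edu:8020-453da129-c81b-4473-9fc0-8fe03481e492",
    "nodeName": "BET_0",
    "instanceNumber": "0",
    "parameterID": "BET.OutputFile_0",
}
# urlencode escapes reserved characters such as ':' in the session ID.
url = base + "?" + urlencode(params)
print(url)
```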
<FailoverEnabled>
indicates whether the server's failover feature is enabled. It accepts the boolean values true
or false.
By default, if this preference does not exist, Pipeline sets it to false.
<FailoverCheckInterval>
specifies the interval in milliseconds at which the secondary server pings the master server. If nothing is specified, Pipeline uses the default value of 5000.
<FailoverRetries>
specifies the number of retries before the secondary server starts as master when pings fail. If nothing is specified, Pipeline uses the default value of 3.
<FailoverAliasInterface>
specifies the name of the interface on which Pipeline will create a sub-interface for IP aliasing. If nothing is specified, Pipeline will automatically find the primary network interface and the first available sub-interface number and add the IP alias there. For example, if your primary interface is eth0 and eth0:0 and eth0:1 are busy with other IP addresses, Pipeline will use eth0:2.
WARNING: If one of the sub-interfaces already holds the IP address of the specified Hostname, Pipeline will give an error and exit.
<FailoverAliasSubInterfaceNum>
specifies the sub-interface number on which Pipeline should create the alias IP address. If nothing is specified, Pipeline will automatically find the first available sub-interface number and add the IP alias there. For example, if your primary interface is eth0 and eth0:0 and eth0:1 are busy with other IP addresses, Pipeline will use eth0:2.
WARNING: If one of the sub-interfaces already holds the IP address of the specified Hostname, Pipeline will give an error and exit.
<DirAccessControlMode>
is an integer which indicates the access-control configuration for running executables and the remote file browser. Below is a matrix of the different modes and their meanings.
Mode | Remote File Browser Access Control | Executables Access Control |
0 | Never | Never |
1 | Never | No with exceptions |
2 | Never | Yes with exceptions |
3 | No with exceptions | No with exceptions |
4 | Yes with exceptions | Yes with exceptions |
5 | Same as Shell permissions | No with exceptions |
6 | Same as Shell permissions | Yes with exceptions |
7* | Same as Shell permissions | Same as Shell permissions |
* Available starting from Pipeline version 4.2.2
Never means the Pipeline server will not apply any access-control restrictions for any user. Note that this does not affect the operating system's authentication and access control; in other words, the credentials required to connect to the Pipeline server and the rights required to execute programs are not affected by these settings.
No with exceptions means access control is disabled for all users, except that those listed in Directory Access Control Users will be restricted.
Yes with exceptions means all users will be restricted, except that those listed in Directory Access Control Users will be allowed.
Same as Shell permissions means the remote file browser acts as if the user had logged in to the server using a shell.
<DirAccessControlUsers>
is a list of users separated by commas (i.e. john,bob,mike) which indicates the conditional users. Depending on the Directory Access Control Mode, these users will be restricted or allowed.
<DirAccessControlPaths>
is a list of directories separated by commas (i.e. /usr/local,/usr/bin), which will be the only directories allowed for restricted users.
For example, if we want to let every user browse with the remote file browser as a shell would, and allow all users to execute programs without restriction except john, bob, and mike, who may only execute in /usr/local and /usr/bin, we would have these configurations:
<DirAccessControlMode>5</DirAccessControlMode>
<DirAccessControlUsers>john,bob,mike</DirAccessControlUsers>
<DirAccessControlPaths>/usr/local,/usr/bin</DirAccessControlPaths>
Another example: if we want to restrict all users to executing programs only in /usr/local and /usr/bin, but allow john, bob, and mike to run without restrictions, and let every user browse with the remote file browser as a shell would, we would have these configurations:
<DirAccessControlMode>6</DirAccessControlMode>
<DirAccessControlUsers>john,bob,mike</DirAccessControlUsers>
<DirAccessControlPaths>/usr/local,/usr/bin</DirAccessControlPaths>
<GridPluginJARFiles>
which should contain the paths to the plugin JAR file and the libraries it uses, separated by commas. For example, if you want to use the built-in DRMAA or JGDI plugins, your preferences file will look like the following:
<?xml version="1.0" encoding="UTF-8"?>
<preferences>
  <Hostname>cranium.loni.ucla.edu</Hostname>
  <ServerPort>8020</ServerPort>
  <TempFileLocation>/ifs/tmp/</TempFileLocation>
  <MaximumThreadPoolSize>620</MaximumThreadPoolSize>
  <GridPluginJARFiles>/usr/pipeline/dist/lib/plugins/JGDIPlugin.jar, /usr/pipeline/dist/lib/plugins/jgdi.jar</GridPluginJARFiles>
</preferences>
<?xml version="1.0" encoding="UTF-8"?>
<preferences>
  <Hostname>cranium.loni.ucla.edu</Hostname>
  <ServerPort>8020</ServerPort>
  <TempFileLocation>/ifs/tmp/</TempFileLocation>
  <MaximumThreadPoolSize>620</MaximumThreadPoolSize>
  <GridPluginJARFiles>/usr/pipeline/dist/lib/plugins/DRMAAPlugin.jar, /usr/pipeline/dist/lib/plugins/drmaa.jar</GridPluginJARFiles>
</preferences>
IMPORTANT: Some plugins must also be present on the class path. For example, the DRMAA plugin requires the path of drmaa.jar to be on the class path when starting the server. So to start the server with the DRMAA plugin you need:
$ java -cp .:Pipeline.jar:/usr/pipeline/dist/lib/plugins/drmaa.jar server.Main
This tag alone is not enough to have plugins enabled and ready to use; you also need to set the Grid Plugin Class tag.
<?xml version="1.0" encoding="UTF-8"?>
<preferences>
  <Hostname>cranium.loni.ucla.edu</Hostname>
  <ServerPort>8020</ServerPort>
  <TempFileLocation>/ifs/tmp/</TempFileLocation>
  <MaximumThreadPoolSize>620</MaximumThreadPoolSize>
  <GridPluginJARFiles>/usr/pipeline/dist/lib/plugins/JGDIPlugin.jar, /usr/pipeline/dist/lib/plugins/jgdi.jar</GridPluginJARFiles>
  <GridPluginClass>jgdiplugin.JGDIPlugin</GridPluginClass>
</preferences>
<?xml version="1.0" encoding="UTF-8"?>
<preferences>
  <Hostname>cranium.loni.ucla.edu</Hostname>
  <ServerPort>8020</ServerPort>
  <TempFileLocation>/ifs/tmp/</TempFileLocation>
  <MaximumThreadPoolSize>620</MaximumThreadPoolSize>
  <GridPluginJARFiles>/usr/pipeline/dist/lib/plugins/DRMAAPlugin.jar, /usr/pipeline/dist/lib/plugins/drmaa.jar</GridPluginJARFiles>
  <GridPluginClass>drmaaplugin.DRMAAPlugin</GridPluginClass>
</preferences>
<GridComplexResourceAttributes>pipeline, serverId=server1</GridComplexResourceAttributes>
The above defines two attributes: 1) pipeline, which is equal to TRUE, and 2) serverId, which is equal to server1. This tag is only the definition of the complex attributes; in order to use them, you have to add _pcomplex
to the grid engine native specification. In our case, _pcomplex
will be replaced with -l pipeline -l serverId=server1
when submitting the job to the grid.
Note that the grid manager has to be configured properly to accept jobs with the given resource attributes.
<GridMaxSubmitThreads>10</GridMaxSubmitThreads>
This example allows a maximum of 10 parallel submissions at a time.
<GridEngineNativeSpecification>
On the LONI Pipeline server we use the following native specification preference:
<GridEngineNativeSpecification>-shell y -S /bin/csh -q pipeline.q -l pipeline -N _pjob </GridEngineNativeSpecification>
By default, grid plugins are disabled; you must set Grid Plugin JAR Files and Grid Plugin Class if you want the Pipeline server to use your grid engine. The native specification you should use for your installation will vary, but if you're using an Oracle Grid Engine (previously known as Sun Grid Engine) installation and want to use the same string, you'll want to change -q pipeline.q
to reflect the submission queue (if any) that you will be using.
Optionally, you can add _pmem
and _pstack
to the GridEngineNativeSpecification
tag. _pmem
lets the user define the maximum memory per module, and _pstack
lets the user define the stack size. Both can be configured by the user with the latest Pipeline client; unless the user specifies them, the defaults set by the grid engine are used.
Starting from version 4.4, if you want to use Grid Complex Resource Attributes, you can also add _pcomplex,
which refers to the Grid Complex Resource Attributes tag.
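For instance, a native specification exposing all three placeholders could look like the sketch below; appending the placeholder tokens to the end of the string is an assumption based on the LONI example above, not a documented requirement:

```
<GridEngineNativeSpecification>-shell y -S /bin/csh -q pipeline.q -l pipeline -N _pjob _pmem _pstack _pcomplex</GridEngineNativeSpecification>
```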
<GridTotalSlots>
specifies the total number of grid slots in the cluster. This lets connected users see how busy the server is in terms of the number of jobs running on the grid and the total number of slots available.
Alternatively, you can use the <GridTotalSlotsCmd>
tag, which contains a command-line query that returns the total number of available slots for the queue. Refer to your cluster management documentation for the appropriate query. With this tag, the server queries the grid engine periodically for the latest number of available slots, updates the number automatically, and broadcasts it to clients.
<GridJobAccountingURL>jdbc:mysql://hostname/db_name</GridJobAccountingURL>
hostname is the address of the host where the ARCo database is running (i.e. arco.loni.ucla.edu) and db_name is the name of the database (i.e. cranium_db).
<GridJobAccountingUsername>username</GridJobAccountingUsername>
specifies the username used to connect to the accounting database.
<GridJobAccountingPassword>password</GridJobAccountingPassword>
specifies the password for the user given in <GridJobAccountingUsername>.
Note that this password is stored as clear text in preferences.xml, which is not secure; it is recommended to restrict other users' access to the preferences file.
Array Job # | Number of instances | Total submitted so far |
1 | 50 | 50 |
2 | 100 | 150 |
3 | 200 | 350 |
4 | 400 | 750 |
5 | 250 | 1000 |
Array Job # | Number of instances | Total submitted so far / Remaining |
1 | 50 | 50 / 718 |
2 | 100 | 150 / 618 |
3 | 200 | 350 / 418 |
4 | 418 | 768 / 0 |