3.6 SPSS Syntax File Specifications
Created and tested using SPSS 13.0 for Windows
SPSS Data Definitions cover 10 main attributes for any variable:
Name, Type, Width, Decimals, Label, Values, Missing, Columns,
Align, and Measure.
SPSS can read almost any ASCII file and deduce parameters
for some of these variable attributes. However, the remaining
attributes must be typed in by hand, which is tedious for large
datasets.
Instead, OpenClinica can generate an SPSS syntax file (*.sps)
that, together with the data file, automatically loads the data
with the proper variable definitions and attributes.
OpenClinica currently supports automated definition of Name,
Type, Width, Decimals, Label, Values, Missing, Columns, Align, and Measure.
This document describes the structure and syntax of the .sps
file.
Conceptual Mapping
The conceptual mapping of OpenClinica data element metadata to
SPSS Data Definitions is as follows:
SPSS Data Definition Metadata | OpenClinica CRF Metadata |
Name | ITEM_NAME |
Type | Mapped to DATA_TYPES |
Width | Calculated from widest value in field |
Decimals | If DATA_TYPES = Real, then calculated from most precise value in field. Else 0. |
Label | DESCRIPTION_LABEL |
Values | Generated from RESPONSE_OPTIONS_TEXT and RESPONSE_OPTIONS_VALUES |
Missing | N/A |
Columns | N/A |
Align | N/A |
Measure | N/A |
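The Width and Decimals rules in the table can be illustrated with a short Python sketch. It assumes the field's values are already available as exported strings; the function name and the `is_real` flag are hypothetical and not part of OpenClinica.

```python
def width_and_decimals(values, is_real):
    """Return (width, decimals) for one field, given its exported string values.

    Assumption: width is the length of the widest value as text, and
    decimals is the largest number of digits after the decimal point.
    """
    width = 0
    decimals = 0
    for value in values:
        text = value.strip()
        width = max(width, len(text))
        if is_real and "." in text:
            # Digits after the decimal point give this value's precision.
            decimals = max(decimals, len(text.split(".", 1)[1]))
    # Per the table, decimals stays 0 for anything other than Real.
    return width, decimals
```

For example, the values "1.5", "12.125", and "7" in a Real field give a width of 6 and 3 decimals.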
Mapping between SPSS Type and OpenClinica DATA_TYPES
SPSS Types | SPSS Syntax for Type Format | CRF DATA_TYPE |
Numeric | F | BL (Boolean), BN (BooleanNotNull), INT, Real, SET (if RESPONSE_OPTIONS_VALUES all numeric) |
Comma | | |
Dot | | |
Scientific Notation | | |
Date | | DATE |
Dollar | | |
Custom Currency | | |
String | A | ED (URL), ST (string), SET (if RESPONSE_OPTIONS_VALUES not all numeric) |
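The type mapping above can be sketched as a small Python lookup. The function and constant names are hypothetical. Because the table leaves the format for DATE blank, the sketch returns None for it rather than guessing, and it treats a SET with no response values as a string, which is an assumption.

```python
import re

# Data type codes taken from the mapping table above.
NUMERIC_TYPES = {"BL", "BN", "INT", "REAL"}
STRING_TYPES = {"ED", "ST"}

_NUMBER_RE = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)")


def _is_number(text):
    """True if text looks like a plain decimal number."""
    return _NUMBER_RE.fullmatch(str(text).strip()) is not None


def spss_format_letter(data_type, response_values=()):
    """Return the SPSS format letter ("F" or "A") for a CRF data type.

    Returns None for DATE, because the table gives no format for it.
    """
    code = data_type.upper()
    if code in NUMERIC_TYPES:
        return "F"
    if code in STRING_TYPES:
        return "A"
    if code == "SET":
        # Numeric only if every response option value is numeric.
        all_numeric = bool(response_values) and all(
            _is_number(v) for v in response_values
        )
        return "F" if all_numeric else "A"
    if code == "DATE":
        return None
    raise ValueError(f"Unknown data type: {data_type}")
```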
Mapping between SPSS Values and OpenClinica RESPONSE_OPTIONS
The VALUE LABELS command in the SPSS syntax file
maps OpenClinica RESPONSE_OPTIONS to discrete value sets in SPSS.
Only variables that have a valid RESPONSE_LABEL should appear
under the VALUE LABELS section.
Syntax
---------------------------
VALUE LABELS
  VARNAME1
    RESPONSE_OPTIONS_VALUES[0] "RESPONSE_OPTIONS_TEXT[0]"
    RESPONSE_OPTIONS_VALUES[1] "RESPONSE_OPTIONS_TEXT[1]"
    RESPONSE_OPTIONS_VALUES[2] "RESPONSE_OPTIONS_TEXT[2]" /
  VARNAME2
    RESPONSE_OPTIONS_VALUES[0] "RESPONSE_OPTIONS_TEXT[0]"
    RESPONSE_OPTIONS_VALUES[1] "RESPONSE_OPTIONS_TEXT[1]"
    RESPONSE_OPTIONS_VALUES[2] "RESPONSE_OPTIONS_TEXT[2]" /
.
---------------------------
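As an illustration of the template above, the following Python sketch builds a VALUE LABELS block from response options. The function name and the input shape (a list of name and option pairs) are assumptions made for the example. Values are written as given, following the template. Doubling an embedded double quote inside a label is also an assumption about how the label text should be escaped.

```python
def value_labels_block(variables):
    """Build a VALUE LABELS block.

    variables: list of (variable_name, [(value, text), ...]) pairs.
    Variables without response options are skipped.
    """
    lines = []
    for name, options in variables:
        if not options:
            continue  # only variables with response options get an entry
        lines.append("  " + name)
        for index, (value, text) in enumerate(options):
            escaped = str(text).replace('"', '""')  # assumed escaping rule
            # The last option of each variable is followed by a slash.
            suffix = " /" if index == len(options) - 1 else ""
            lines.append(f'    {value} "{escaped}"{suffix}')
    if not lines:
        return ""  # nothing to label, so emit no command at all
    return "\n".join(["VALUE LABELS"] + lines + ["."])
```

Calling `value_labels_block([("V11_U24_A1", [(2, "ab"), (128, "gh")])])` yields a block in the same shape as the template, ending with a slash after the last option and a period on its own line.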
Values for built-in system fields
Subject Attributes
Field | Value | Encoding |
Name | DOB | DOB |
Type | Date | ???? |
Width | | |
Decimals | 0 | 0 |
Label | Date of Birth | Date of Birth |
Values | None | |
Missing | None | |
Columns | | |
Align | | |
Measure | | |
Field | Value | Encoding |
Name | Gender | Gender |
Type | String | A |
Width | 1 | 1 |
Decimals | 0 | 0 |
Label | Gender | Gender |
Values | M, F | Gender M Male F Female / |
Missing | None | |
Columns | | |
Align | | |
Measure | | |
Event Attributes
Field | Value | Encoding |
Name | LOCATION_[EVENT HANDLE] | LOCATION_[EVENT HANDLE] |
Type | String | A |
Width | | |
Decimals | 0 | 0 |
Label | Location for Event [EVENT NAME] (EVENT HANDLE) | Location for Event [EVENT NAME] (EVENT HANDLE) |
Values | None | |
Missing | None | |
Columns | | |
Align | | |
Measure | | |
Field | Value | Encoding |
Name | STARTDATE_[EVENT HANDLE] | STARTDATE_[EVENT HANDLE] |
Type | Date | ???? |
Width | | |
Decimals | 0 | 0 |
Label | Start Date for Event [EVENT NAME] (EVENT HANDLE) | Start Date for Event [EVENT NAME] (EVENT HANDLE) |
Values | None | |
Missing | None | |
Columns | | |
Align | | |
Measure | | |
Field | Value | Encoding |
Name | EndDate_[EVENT HANDLE] | EndDate_[EVENT HANDLE] |
Type | Date | ???? |
Width | | |
Decimals | 0 | 0 |
Label | End Date for Event [EVENT NAME] (EVENT HANDLE) | End Date for Event [EVENT NAME] (EVENT HANDLE) |
Values | None | |
Missing | None | |
Columns | | |
Align | | |
Measure | | |
Variable Naming
The following rules apply to variable names:
- The name must begin with a letter. The remaining characters can
be any letter, any digit, a period, or the symbols @, #, _, or $.
- Variable names cannot end with a period.
- Variable names that end with an underscore should be avoided (to
avoid conflict with variables automatically created by some
procedures).
- The length of the name cannot exceed 64 bytes. Sixty-four bytes
typically means 64 characters in single-byte languages (for
example, English, French, German, Spanish, Italian, Hebrew,
Russian, Greek, Arabic, Thai) and 32 characters in double-byte
languages (for example, Japanese, Chinese, Korean).
- Blanks and special characters (for example, !, ?, ', and *)
cannot be used.
- Each variable name must be unique; duplication is not allowed.
- Reserved keywords cannot be used as variable names. Reserved
keywords are: ALL, AND, BY, EQ, GE, GT, LE, LT, NE, NOT, OR, TO,
WITH.
- Variable names can be defined with any mixture of upper- and
lowercase characters, and case is preserved for display purposes.
- When long variable names need to wrap onto multiple lines in
output, SPSS attempts to break the lines at underscores, periods,
and at changes from lowercase to uppercase.
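The naming rules above can be checked mechanically. The following Python sketch is one possible validator, not OpenClinica's implementation. It assumes names use ASCII letters only (so one character is one byte) and that uniqueness is compared without regard to case. A trailing underscore is only discouraged by the rules, so the sketch does not reject it. The function name is hypothetical.

```python
import re

RESERVED = {"ALL", "AND", "BY", "EQ", "GE", "GT", "LE", "LT",
            "NE", "NOT", "OR", "TO", "WITH"}

# A letter first, then letters, digits, period, or the symbols @ # _ $.
_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9.@#_$]*")


def is_valid_spss_name(name, existing=()):
    """Return True if name satisfies the variable naming rules."""
    if _NAME_RE.fullmatch(name) is None:
        return False  # bad first character, blank, or special character
    if name.endswith("."):
        return False
    if len(name.encode("utf-8")) > 64:
        return False  # the limit is in bytes, not characters
    if name.upper() in RESERVED:
        return False
    # Assumption: two names differing only in case count as duplicates.
    if name.upper() in {other.upper() for other in existing}:
        return False
    return True
```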
Rules for automatically converting an invalid SPSS variable name to a valid SPSS variable name:
- Replace any invalid character with the symbol #.
- If the first character is not a letter, the letter V is used as the first letter.
- If the last character is a period or underscore, it is replaced by #.
- If a name is longer than 64 characters, it is truncated to 64 characters.
- If this results in a non-unique name in a data file, sequential numbers
replace the letters at the end of the name. By default, the size of
the sequential number is 3.
- If a reserved keyword has been used as a variable name, sequential numbers are appended to its end.
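The conversion rules can be sketched in Python as follows. This is an illustration under several assumptions, not the OpenClinica code: the first character is replaced by V rather than having V prepended, "size 3" is read as a zero-padded three-digit number starting at 001, truncation is applied before the last-character check, reserved keywords are handled before the uniqueness check, and names are compared without regard to case. All function names are hypothetical.

```python
import re

RESERVED = {"ALL", "AND", "BY", "EQ", "GE", "GT", "LE", "LT",
            "NE", "NOT", "OR", "TO", "WITH"}
MAX_LEN = 64


def _numbered(name, n, replace_tail, width=3):
    """Attach a zero-padded counter, replacing or following the tail."""
    suffix = str(n).zfill(width)
    if replace_tail and len(name) > width:
        return name[:-width] + suffix
    return name[:MAX_LEN - width] + suffix


def to_valid_spss_name(raw, taken):
    """Convert raw into a valid, unused name and record it in taken.

    taken is a set of upper-cased names already used in the data file.
    """
    # Replace every invalid character with '#'.
    name = re.sub(r"[^A-Za-z0-9.@#_$]", "#", raw) or "V"
    # Assumption: a non-letter first character is replaced by 'V'.
    if not name[0].isalpha():
        name = "V" + name[1:]
    name = name[:MAX_LEN]
    if name[-1] in "._":
        name = name[:-1] + "#"
    # Reserved keywords get a sequential number appended.
    if name.upper() in RESERVED:
        name = _numbered(name, 1, replace_tail=False)
    # Duplicates get their trailing characters replaced by a counter.
    candidate, n = name, 0
    while candidate.upper() in taken:
        n += 1
        candidate = _numbered(name, n, replace_tail=True)
    taken.add(candidate.upper())
    return candidate
```

Passing the same `taken` set for every variable in a data file keeps the generated names unique across the whole file.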
Syntax
The syntax file uses the GET DATA command. The formal
syntax (as taken from SPSS help documentation) is as follows:
------------------------------------------
GET DATA Command Syntax
GET DATA
  /TYPE = {ODBC}
          {XLS }
          {TXT }
  /FILE = 'filename'
Subcommands for TYPE = ODBC
  /CONNECT='connection string'
  /UNENCRYPTED
  /SQL 'any select statement'
    ['select statement continued']
  /ASSUMEDSTRWIDTH={255**}
                   {n    }
Subcommands for TYPE = XLS
  [/SHEET = {INDEX**} {sheet number}]
            {NAME   } {'sheet name'}
  [/CELLRANGE = {RANGE } {'start point:end point'}]
                {FULL**}
  [/READNAMES = {on** }]
                {off  }
Subcommands for TYPE = TXT
  [/ARRANGEMENT = {FIXED      }]
                  {DELIMITED**}
  [/FIRSTCASE = {n}]
  [/DELCASE = {LINE**     }]
              {VARIABLES n}
  [/FIXCASE = n]
  [/IMPORTCASE = {ALL**    }]
                 {FIRST n  }
                 {PERCENT n}
  [/DELIMITERS = {'delimiters'}]
  [/QUALIFIER = 'qualifier']
VARIABLES subcommand for ARRANGEMENT = DELIMITED
  /VARIABLES = varname format varname format...
VARIABLES subcommand for ARRANGEMENT = FIXED
  /VARIABLES [/rec#] varname startcol-endcol format
             [/rec#] varname startcol-endcol format...
Note: For text data files, the first column is column 0, not
column 1. This is different from DATA LIST, where the first
column is column 1.
------------------------------------------
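To show how the formal syntax translates into a concrete command, the following Python sketch assembles a GET DATA command for a delimited text file. It is an illustration only: the function name, the tab delimiter default, and the choice to start reading at the second line (assuming a header row) are assumptions, and the exact set of subcommands OpenClinica emits may differ.

```python
def get_data_command(path, variables, delimiter="\\t", first_case=2):
    """Build a GET DATA command for a delimited text file.

    variables: list of (name, format) pairs, e.g. ("AGE", "F3.0").
    delimiter: delimiter as written in the syntax; the default is the
               two characters backslash and t, standing for a tab.
    """
    quoted_path = path.replace("'", "''")  # assumed escaping of quotes
    lines = [
        "GET DATA",
        "  /TYPE = TXT",
        f"  /FILE = '{quoted_path}'",
        "  /DELCASE = LINE",
        f'  /DELIMITERS = "{delimiter}"',
        "  /ARRANGEMENT = DELIMITED",
        f"  /FIRSTCASE = {first_case}",
        "  /IMPORTCASE = ALL",
        "  /VARIABLES =",
    ]
    # One "varname format" pair per line, as in the DELIMITED form.
    lines += [f"    {name} {fmt}" for name, fmt in variables]
    lines.append("  .")
    return "\n".join(lines)
```

For instance, `get_data_command("data.dat", [("SUBJ", "A10"), ("AGE", "F3.0")])` produces a command that reads SUBJ as a 10-character string and AGE as a numeric field.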
VARIABLE LABELS
  V1 "Subject Unique ID"
.
VALUE LABELS
  V11_U24_A1
    2 "ab"
    128 "gh"
    254 "xt"
    380 "ff" /
.