3.6 SPSS Syntax File Specifications


Created and tested using SPSS 13.0 for Windows

SPSS Data Definitions cover 10 main attributes for any variable: Name, Type, Width, Decimals, Label, Values, Missing, Columns, Align, and Measure.

SPSS can read almost any ASCII file and infer parameters for some of these variable attributes. The remaining attributes must be typed in by hand, which is tedious for large datasets.

Instead, OpenClinica can generate an SPSS Syntax file (*.sps) that, together with the data file, automatically loads the data with the proper variable definitions and attributes. OpenClinica currently supports automated definition of Name, Type, Width, Decimals, Label, Values, Missing, Columns, Align, and Measure.

This document describes the structure and syntax of the .sps file.

Conceptual Mapping

The conceptual mapping of OpenClinica data element metadata to SPSS Data Definitions is as follows:

 

SPSS Data Definition Metadata   OpenClinica CRF Metadata
Name                            ITEM_NAME
Type                            Mapped to DATA_TYPES
Width                           Calculated from the widest value in the field
Decimals                        If DATA_TYPES = Real, calculated from the most precise value in the field; otherwise 0
Label                           DESCRIPTION_LABEL
Values                          Generated from RESPONSE_OPTIONS_TEXT and RESPONSE_OPTIONS_VALUES
Missing                         N/A
Columns                         N/A
Align                           N/A
Measure                         N/A
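
The Width and Decimals rules in the table can be illustrated with a short Python sketch. This is not OpenClinica's implementation: the function name width_and_decimals and the idea of passing each field as a list of raw text values are assumptions made for the example.

```python
def width_and_decimals(values, is_real):
    """Return (width, decimals) for one field, given its raw text values.

    Hypothetical helper: 'values' is a list of strings as they would
    appear in the data file; 'is_real' is True when DATA_TYPES = Real.
    """
    # Ignore missing and blank entries; they carry no width information.
    cleaned = [v.strip() for v in values if v is not None and v.strip()]

    # Width: length of the widest value in the field.
    width = max((len(v) for v in cleaned), default=0)

    decimals = 0
    if is_real:
        # Decimals: digits after the decimal point in the most precise value.
        decimals = max(
            (len(v.split(".", 1)[1]) for v in cleaned if "." in v),
            default=0,
        )
    return width, decimals


print(width_and_decimals(["3.5", "12.125", "7"], is_real=True))   # (6, 3)
print(width_and_decimals(["3", "120", ""], is_real=False))        # (3, 0)
```

The two numbers would typically be combined with the type format letter into an SPSS format such as F6.3.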

 

Mapping between SPSS ‘Type’ and OpenClinica DATA_TYPES 

SPSS Type             SPSS Syntax for Type Format   CRF DATA_TYPE
Numeric               F                             BL (Boolean), BN (BooleanNotNull), INT, Real, SET (if RESPONSE_OPTIONS_VALUES are all numeric)
Comma
Dot
Scientific Notation
Date                                                DATE
Dollar
Custom Currency
String                A                             ED (URL), ST (string), SET (if RESPONSE_OPTIONS_VALUES are not all numeric)
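
As a rough illustration of the table above, the following Python sketch picks the SPSS format letter for a CRF data type. The function name spss_format_letter is hypothetical, and treating a value as "numeric" when Python's float() can parse it is an assumption. Because the table leaves the format syntax for DATE blank, the sketch returns None for it rather than guessing.

```python
NUMERIC_TYPES = {"BL", "BN", "INT", "REAL"}
STRING_TYPES = {"ED", "ST"}


def _is_number(text):
    """Assumption: 'numeric' means the text parses as a Python float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def spss_format_letter(data_type, response_values=()):
    """Return 'F', 'A', or None (format not specified in the table)."""
    code = data_type.upper()
    if code in NUMERIC_TYPES:
        return "F"
    if code in STRING_TYPES:
        return "A"
    if code == "SET":
        # Numeric only if every response option value is numeric.
        # An empty option list counts as numeric here (arbitrary choice).
        return "F" if all(_is_number(v) for v in response_values) else "A"
    if code == "DATE":
        return None  # the mapping table does not give a format for DATE
    raise ValueError("unknown DATA_TYPE: " + data_type)


print(spss_format_letter("INT"))                     # F
print(spss_format_letter("SET", ["1", "2", "3"]))    # F
print(spss_format_letter("SET", ["1", "2", "N/A"]))  # A
```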

  

Mapping between SPSS ‘Values’ and OpenClinica RESPONSE_OPTIONS 

The VALUE LABELS command in the SPSS Syntax file maps OpenClinica RESPONSE_OPTIONS to discrete value sets in SPSS. Only variables that have a valid RESPONSE_LABEL should appear in the VALUE LABELS section.

Syntax

---------------------------

VALUE LABELS
          VARNAME1
            RESPONSE_OPTIONS_VALUES[0] "RESPONSE_OPTIONS_TEXT[0]"
            RESPONSE_OPTIONS_VALUES[1] "RESPONSE_OPTIONS_TEXT[1]"
            RESPONSE_OPTIONS_VALUES[2] "RESPONSE_OPTIONS_TEXT[2]" /
          VARNAME2
            RESPONSE_OPTIONS_VALUES[0] "RESPONSE_OPTIONS_TEXT[0]"
            RESPONSE_OPTIONS_VALUES[1] "RESPONSE_OPTIONS_TEXT[1]"
            RESPONSE_OPTIONS_VALUES[2] "RESPONSE_OPTIONS_TEXT[2]" /
          .

---------------------------
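
The template above could be produced by a small generator along the following lines. This is a minimal Python sketch, not OpenClinica's code: the function name value_labels_block and its input shape (a list of variable names paired with value/text options) are assumptions. Values are written verbatim, as in the template, and doubling an embedded double quote inside a label is an assumed escaping choice.

```python
def value_labels_block(variables):
    """Build a VALUE LABELS command.

    variables: list of (varname, [(value, text), ...]) pairs.
    Variables without response options are skipped.
    """
    lines = []
    for varname, options in variables:
        if not options:
            continue
        lines.append("  " + varname)
        for i, (value, text) in enumerate(options):
            escaped = text.replace('"', '""')  # assumed escaping rule
            # A slash closes the value set of each variable.
            suffix = " /" if i == len(options) - 1 else ""
            lines.append(f'    {value} "{escaped}"{suffix}')
    if not lines:
        return ""  # nothing to label, so emit no command at all
    # The final period terminates the command.
    return "\n".join(["VALUE LABELS"] + lines + ["  ."])


print(value_labels_block([
    ("GENDER", [("1", "Male"), ("2", "Female")]),
    ("NOTES", []),
]))
```

With the sample input, only GENDER appears in the output, because NOTES has no response options.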

Values for built-in system fields 

Subject Attributes

  • Date of Birth
Field      Value           Encoding
Name       DOB             DOB
Type       Date            ????
Width
Decimals   0               0
Label      Date of Birth   Date of Birth
Values     None
Missing    None
Columns
Align
Measure
  • Gender
Field      Value     Encoding
Name       Gender    Gender
Type       String    A
Width      1         1
Decimals   0         0
Label      Gender    Gender
Values     M, F      Gender
                     M "Male"
                     F "Female" /
Missing    None
Columns
Align
Measure

 

Event Attributes

  • Event Location
Field      Value                                                Encoding
Name       LOCATION_[EVENT HANDLE]                              LOCATION_[EVENT HANDLE]
Type       String                                               A
Width
Decimals   0                                                    0
Label      Location for Event ‘[EVENT NAME]’ (EVENT HANDLE)     Location for Event ‘[EVENT NAME]’ (EVENT HANDLE)
Values     None
Missing    None
Columns
Align
Measure
  • Start Date
Field      Value                                                Encoding
Name       STARTDATE_[EVENT HANDLE]                             STARTDATE_[EVENT HANDLE]
Type       Date                                                 ????
Width
Decimals   0                                                    0
Label      Start Date for Event ‘[EVENT NAME]’ (EVENT HANDLE)   Start Date for Event ‘[EVENT NAME]’ (EVENT HANDLE)
Values     None
Missing    None
Columns
Align
Measure
  • End Date
Field      Value                                                Encoding
Name       EndDate_[EVENT HANDLE]                               EndDate_[EVENT HANDLE]
Type       Date                                                 ????
Width
Decimals   0                                                    0
Label      End Date for Event ‘[EVENT NAME]’ (EVENT HANDLE)     End Date for Event ‘[EVENT NAME]’ (EVENT HANDLE)
Values     None
Missing    None
Columns
Align
Measure

Variable Naming
The following rules apply to variable names:

- The name must begin with a letter. The remaining characters can be any letter, any digit, a period, or the symbols @, #, _, or $.
- Variable names cannot end with a period.
- Variable names that end with an underscore should be avoided (to avoid conflict with variables automatically created by some procedures).
- The length of the name cannot exceed 64 bytes. Sixty-four bytes typically means 64 characters in single-byte languages (for example, English, French, German, Spanish, Italian, Hebrew, Russian, Greek, Arabic, Thai) and 32 characters in double-byte languages (for example, Japanese, Chinese, Korean).
- Blanks and special characters (for example, !, ?, ', and *) cannot be used.
- Each variable name must be unique; duplication is not allowed.
- Reserved keywords cannot be used as variable names. Reserved keywords are: ALL, AND, BY, EQ, GE, GT, LE, LT, NE, NOT, OR, TO, WITH.
- Variable names can be defined with any mixture of upper- and lowercase characters, and case is preserved for display purposes.
- When long variable names need to wrap onto multiple lines in output, SPSS attempts to break the lines at underscores, periods, and at changes from lower case to upper case.


Rules for automatically converting an invalid SPSS variable name to a valid SPSS variable name:

- Any invalid character is replaced with the symbol #.
- If the first character is not a letter, the letter V is used as the first letter.
- If the last character is a period or an underscore, it is replaced by #.
- If a name is longer than 64 characters, it is truncated to 64 characters.
- If the result is not unique within a data file, sequential numbers replace the letters at the end of the name. By default, the size of the sequential numbers is 3.
- If a reserved keyword has been used as a variable name, sequential numbers are appended to its end.
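
The conversion rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not OpenClinica's implementation. The assumptions are: only ASCII letters count as letters, a non-letter first character is replaced by V (not prefixed with it), "size 3" means a zero-padded three-digit number starting at 001, and names are compared case-insensitively. Truncation is applied before the last-character check so that cutting a name cannot expose a trailing period or underscore. The function names are hypothetical.

```python
import re

RESERVED = {"ALL", "AND", "BY", "EQ", "GE", "GT", "LE",
            "LT", "NE", "NOT", "OR", "TO", "WITH"}
MAX_LEN = 64
SEQ_WIDTH = 3  # assumed meaning of "size of sequential numbers is 3"


def _sanitize(name):
    # Replace every invalid character with '#'.
    name = re.sub(r"[^A-Za-z0-9.@#_$]", "#", name)
    # First character must be a letter; otherwise use 'V' in its place.
    if not name or not name[0].isalpha():
        name = "V" + name[1:]
    name = name[:MAX_LEN]
    # A trailing period or underscore becomes '#'.
    if name[-1] in "._":
        name = name[:-1] + "#"
    return name


def _with_sequence(name, used, replace_tail):
    if replace_tail:
        # Duplicate name: the number replaces the last characters.
        base = name[:max(1, len(name) - SEQ_WIDTH)]
    else:
        # Reserved keyword: the number is appended.
        base = name[:MAX_LEN - SEQ_WIDTH]
    for n in range(1, 10 ** SEQ_WIDTH):
        candidate = f"{base}{n:0{SEQ_WIDTH}d}"
        if candidate.upper() not in used:
            return candidate
    raise ValueError("no free sequence number for " + name)


def to_spss_names(raw_names):
    """Convert raw item names to valid, unique SPSS variable names."""
    used, result = set(), []
    for raw in raw_names:
        name = _sanitize(raw)
        if name.upper() in RESERVED:
            name = _with_sequence(name, used, replace_tail=False)
        elif name.upper() in used:
            name = _with_sequence(name, used, replace_tail=True)
        used.add(name.upper())
        result.append(name)
    return result


print(to_spss_names(["weight (kg)", "2nd_visit", "BY",
                     "score_", "HEIGHT", "HEIGHT"]))
# ['weight##kg#', 'Vnd_visit', 'BY001', 'score#', 'HEIGHT', 'HEI001']
```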

Syntax
The syntax file uses the GET DATA command. The formal syntax (as taken from SPSS help documentation) is as follows:

------------------------------------------
GET DATA Command Syntax

GET DATA
  /TYPE = {ODBC}
          {XLS }
          {TXT }
  /FILE = 'filename'

Subcommands for TYPE = ODBC
  /CONNECT='connection string'
  /UNENCRYPTED
  /SQL 'any select statement'
       ['select statement continued']
  /ASSUMEDSTRWIDTH={255**}
                   {n    }

Subcommands for TYPE = XLS
  [/SHEET = {INDEX**} {sheet number}]
            {NAME   } {'sheet name'}
  [/CELLRANGE = {RANGE } {'start point:end point'}]
                {FULL**}
  [/READNAMES = {on** }]
                {off  }

Subcommands for TYPE = TXT
  [/ARRANGEMENT = {FIXED      }]
                  {DELIMITED**}
  [/FIRSTCASE = {n}]
  [/DELCASE = {LINE**     }]
              {VARIABLES n}
  [/FIXCASE = n]
  [/IMPORTCASE = {ALL**    }]
                 {FIRST n  }
                 {PERCENT n}
  [/DELIMITERS = {'delimiters'}]
  [/QUALIFIER = 'qualifier']

VARIABLES subcommand for ARRANGEMENT = DELIMITED
  /VARIABLES = varname format varname format...

VARIABLES subcommand for ARRANGEMENT = FIXED
  /VARIABLES [/rec#] varname startcol-endcol format
             [/rec#] varname startcol-endcol format...

Note: For text data files, the first column is column 0, not column 1. This is different from DATA LIST, where the first column is column 1.
------------------------------------------
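
To show how the TXT subcommands fit together, here is a minimal Python sketch that assembles a GET DATA command for a delimited text file. The function name get_data_command, the file name study_data.dat, the tab delimiter, and FIRSTCASE = 2 (which assumes one header line) are all assumptions for illustration; the document does not state which subcommand values OpenClinica actually writes.

```python
def get_data_command(data_file, variables, delimiter="\\t", first_case=2):
    """Build a GET DATA command for a delimited text file.

    variables: list of (varname, format) pairs, e.g. ("V1", "A10").
    delimiter: written as-is between double quotes; the default is the
               two-character sequence backslash-t.
    """
    lines = [
        "GET DATA",
        "  /TYPE = TXT",
        f"  /FILE = '{data_file}'",
        "  /DELCASE = LINE",
        f'  /DELIMITERS = "{delimiter}"',
        "  /ARRANGEMENT = DELIMITED",
        f"  /FIRSTCASE = {first_case}",
        "  /IMPORTCASE = ALL",
        "  /VARIABLES =",
    ]
    # One "varname format" pair per line, in data-file column order.
    lines += [f"    {name} {fmt}" for name, fmt in variables]
    lines.append("  .")  # the period terminates the command
    return "\n".join(lines)


print(get_data_command(
    "study_data.dat",
    [("V1", "A10"), ("GENDER", "A1"), ("WEIGHT", "F6.2")],
))
```

Each format combines the type letter with the calculated width and, for real numbers, the number of decimals.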

VARIABLE LABELS
V1 "Subject Unique ID"
.


VALUE LABELS
V11_U24_A1
2 "ab"
128 "gh"
254 "xt"
380 "ff" /
.


 
