CIIDH Data
AAAS/CIIDH database data dictionary
Version date: 2000.01.29
Current version: ATV20.1
Patrick Ball & Herbert F. Spirer
The unit of analysis for each record in this structure is VIOLATION.
Each violation was of a particular type, happened at a particular time and
place, and was committed by zero, one, or several organizational perpetrators.
The violation was committed against zero or one named (individually identified)
victim, and zero or more anonymous (unidentified) additional victims. The violation
was reported one or more times in one, two, or three source types.
Note that to count the number of times individuals suffered particular violations,
users should sum either the variable c_nmd (to count the number of NAMED
individuals) or c_tot (to count the total number of individuals, named
and anonymous).* In Stata, this
can be accomplished by using frequency weights. Other statistics programs have
similar features. To repeat: the number of records is not the same as the number
of violations.
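The Python sketch below illustrates why record counts and victim counts differ. It assumes each record has been loaded as a dictionary keyed by variable name, for example from the csv version. The three records are invented for illustration. Summing a multiplier variable this way plays the same role as a Stata frequency weight.

```python
# Invented mini-sample: each dict is one record (one violation). Only the
# two multiplier variables are shown; real records carry many more fields.
records = [
    {"c_nmd": 1, "c_tot": 1},   # one named victim
    {"c_nmd": 0, "c_tot": 12},  # twelve anonymous victims
    {"c_nmd": 1, "c_tot": 5},   # one named victim plus four anonymous
]

def weighted_count(rows, weight):
    """Sum a multiplier variable, like a Stata [fw=...] count."""
    return sum(int(row[weight]) for row in rows)

print(len(records))                      # records: 3
print(weighted_count(records, "c_nmd"))  # named victims: 2
print(weighted_count(records, "c_tot"))  # all victims: 18
```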
The dataset is available in several formats: Stata version 6 (recommended),
delimited ASCII (csv), dBase III (dbf), SPSS portable file, and SPSS for Windows.
Note that for the Stata and SPSS (Windows and portable) versions of the dataset,
the variable labels and value labels are already applied to the data. However,
for the ASCII and dbf versions, you will have to handle the labeling on your
own. Note that there are 17,423 records in this dataset, which is too large
to be imported into most spreadsheets.
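For the csv version, labels can be attached after reading the file. The sketch below is a minimal example under two assumptions:

- the file's first line is a header row of variable names;
- only the n_type labels are needed. These labels are taken from the Stata tabulation in Note 1.

The in-memory sample stands in for the real file and is invented.

```python
import csv
import io

# n_type value labels, as shown in the Stata tabulation in Note 1.
N_TYPE_LABELS = {
    23: "DM",  # disappeared, later found killed
    24: "Ds",  # disappeared
    25: "Hr",  # injured (in army attack)
    26: "Mu",  # killed
    27: "Se",  # kidnapped
    28: "To",  # tortured
}

def label_rows(lines, labels, var):
    """Yield csv rows with an added '<var>_label' column.

    Assumes the first line is a header of variable names.
    Codes missing from `labels` are labeled '?'.
    """
    for row in csv.DictReader(lines):
        code = int(row[var])
        row[var + "_label"] = labels.get(code, "?")
        yield row

# Invented two-record sample standing in for the csv file.
sample = io.StringIO("n_type,c_nmd,c_tot\n26,1,1\n24,0,3\n")
rows = list(label_rows(sample, N_TYPE_LABELS, "n_type"))
print([r["n_type_label"] for r in rows])  # ['Mu', 'Ds']
```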
The categorical variables are coded as integers. Although this is convenient
for statistical packages, it can be difficult for human beings to interpret
data coded in this way. The value labels for the integer codes are here.
The value label list includes the number of times each category appears in the
data. Note: these are frequencies of records, not of violations. To count violations,
you must use the weights in c_tot and c_nmd.
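The sketch below shows a per-category tabulation that can count either records or weighted victims, similar to Stata's `tabulate` with and without `[fw=c_tot]`. It is a minimal Python illustration over three invented records, not part of the dataset's distribution.

```python
from collections import Counter

def tabulate(rows, var, weight=None):
    """Frequency and percent per category of `var`.

    Without `weight`, each record counts once (record frequencies).
    With `weight`, that variable is summed instead (victim frequencies).
    """
    freq = Counter()
    for row in rows:
        freq[row[var]] += row[weight] if weight else 1
    total = sum(freq.values())
    return {cat: (n, round(100 * n / total, 2)) for cat, n in sorted(freq.items())}

# Invented records: n_type 26 = Mu (killed), 24 = Ds (disappeared).
records = [
    {"n_type": 26, "c_tot": 1},
    {"n_type": 26, "c_tot": 9},
    {"n_type": 24, "c_tot": 2},
]
print(tabulate(records, "n_type"))           # {24: (1, 33.33), 26: (2, 66.67)}
print(tabulate(records, "n_type", "c_tot"))  # {24: (2, 16.67), 26: (10, 83.33)}
```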
Variable list
Victim variables

| Variable name | Variable type | Value labels | Variable label |
|---|---|---|---|
| v_num | str9 | | Victim ID |
| v_sur1 | str13 | | Victim First surname |
| v_sur2 | str15 | | Victim Second surname |
| v_nam1 | str13 | | Victim First names |
| v_age | byte | | Victim Age |
| v_dob | int | | Victim date of birth |
| v_p94 | long | | Population of v_must (1994 census) |
| v_occ | byte | Yes | Victim Occupation |
| v_ind | byte | Yes | Victim Ethnic category |
| v_sex | byte | Yes | Victim Sex |
| v_eth | byte | Yes | Victim Maternal language (proxy for eth.) |
| v_must | int | Yes | Victim Municipio of birth |
Violation variables

| Variable name | Variable type | Value labels | Variable label |
|---|---|---|---|
| n_grp | int | | Number in group (killings and disappearances) |
| n_ovkl | byte | | Whether the killing was "overkill" (see Note 2 below) |
| n_mon | byte | | Month of violation |
| n_year | int | | Year of violation |
| n_dtcd | byte | Yes | Date precision (violation) |
| n_rgim | byte | Yes | Regime code (for date of violation) |
| n_p94 | long | | Population of the municipio of the violation, n_must (1994 census) |
| n_type | byte | Yes | Type of violation (see Note 1 below) |
| n_ur | byte | Yes | Violation location: rural or urban |
| n_must | int | Yes | Municipio of the violation |
| n_dpst | int | Yes | Departamento of the violation |
Perpetrator variables

| Variable name | Variable type | Value labels | Variable label |
|---|---|---|---|
| p_civ | byte | | 1=participation of civilians |
| p_arm | byte | | 1=participation of army |
| p_pac | byte | | 1=participation of PACs |
| p_pol | byte | | 1=participation of police |
| p_par | byte | | 1=participation of paramilitary groups |
| p_urn | byte | | 1=participation of URNG |
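A violation may have several perpetrators, so the p_* flags are not mutually exclusive. The Python sketch below uses invented records to show the consequence: adding up per-perpetrator victim counts double-counts joint violations. It assumes a flag that is absent from a record means 0.

```python
PERP_FLAGS = ["p_civ", "p_arm", "p_pac", "p_pol", "p_par", "p_urn"]

# Invented records: perpetrator flags plus the c_tot multiplier.
records = [
    {"p_arm": 1, "p_pac": 1, "c_tot": 10},  # army and PACs together
    {"p_arm": 1, "c_tot": 4},               # army alone
    {"c_tot": 2},                           # no perpetrator recorded
]

def victims_by_perpetrator(rows):
    """Weighted victim counts per perpetrator flag (absent flag => 0)."""
    totals = {flag: 0 for flag in PERP_FLAGS}
    for row in rows:
        for flag in PERP_FLAGS:
            if row.get(flag, 0) == 1:
                totals[flag] += row["c_tot"]
    return totals

by_perp = victims_by_perpetrator(records)
print(by_perp["p_arm"], by_perp["p_pac"])  # 14 10
print(sum(by_perp.values()))               # 24: the joint violation counted twice
print(sum(r["c_tot"] for r in records))    # 16: actual number of victims
```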
Reporting variables

| Variable name | Variable type | Value labels | Variable label |
|---|---|---|---|
| r_per | byte | | Number of times this violation was reported in the press |
| r_doc | byte | | Number of times this violation was reported in documentary sources |
| r_ent | byte | | Number of times this violation was reported in interviews with witnesses |
| r_date | int | | If r_per > 0, r_date is the date of the first press report of the violation (in the ASCII version, this is formatted as mm/dd/yyyy) |
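The Python sketch below handles r_date and adds up the three reporting counts. It makes two assumptions:

- In the Stata file, r_date is stored as an int. This is presumably a Stata daily date, meaning days since 1 January 1960, which is Stata's standard convention. The dictionary does not say so explicitly.
- In the ASCII version, the field is a mm/dd/yyyy string, as stated above.

The example record is invented.

```python
from datetime import date, datetime, timedelta

STATA_EPOCH = date(1960, 1, 1)  # Stata daily dates count days from this date

def from_stata_daily(days):
    """Convert a Stata daily date (int) to a datetime.date (assumed encoding)."""
    return STATA_EPOCH + timedelta(days=days)

def first_press_date(row):
    """Parse r_date from the ASCII version (mm/dd/yyyy).

    r_date is only defined when r_per > 0, so return None otherwise.
    """
    if row["r_per"] <= 0:
        return None
    return datetime.strptime(row["r_date"].strip(), "%m/%d/%Y").date()

def total_reports(row):
    """Times the violation was reported, across all three source types."""
    return row["r_per"] + row["r_doc"] + row["r_ent"]

# Invented record; the counts are already converted to int.
row = {"r_per": 2, "r_doc": 0, "r_ent": 1, "r_date": "03/15/1982"}
print(first_press_date(row))  # 1982-03-15
print(total_reports(row))     # 3
```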
Case (multiplier) variables

| Variable name | Variable type | Value labels | Variable label |
|---|---|---|---|
| c_nmd | byte | | 1=this violation includes a named victim |
| c_tot | int | | The total number of victims (named and anonymous) who suffered this violation |
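Each record has at most one named victim, and c_tot includes that victim. The number of anonymous victims in a record is therefore c_tot minus c_nmd. The short Python sketch below computes this and rejects values that would contradict the definitions above. The checks are an illustrative assumption, not a documented validation step.

```python
def anonymous_victims(row):
    """Anonymous victims in one record: total minus the (0 or 1) named victim."""
    c_nmd, c_tot = row["c_nmd"], row["c_tot"]
    if c_nmd not in (0, 1):
        raise ValueError("c_nmd should be 0 or 1")
    if c_tot < c_nmd:
        raise ValueError("c_tot cannot be smaller than c_nmd")
    return c_tot - c_nmd

print(anonymous_victims({"c_nmd": 1, "c_tot": 5}))   # 4
print(anonymous_victims({"c_nmd": 0, "c_tot": 12}))  # 12
```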
Note 1: the violation type codes are the following:

| Category | Meaning | Record count |
|---|---|---|
| DM | Disappeared, later found killed | 218 |
| Ds | Disappeared | 1546 |
| Hr | Injured (in Army attack) | 411 |
| Mu | Killed | 11862 |
| Se | Kidnapped | 2903 |
| To | Tortured | 483 |
| Total | | 17423 |
The important point of Note 1 is that DM is a compound category: it includes people who were disappeared and whose bodies were later found. To count disappeared people, sum c_nmd or c_tot over Ds + DM. To count killed people, sum c_nmd or c_tot over DM + Mu. To count people who were killed or disappeared, sum c_nmd or c_tot over Ds + DM + Mu, so that DM is counted only once. In Stata, you could create a new variable identifying people who were killed or disappeared with the following commands. (Note the difference between the record counts in the table above and the frequency counts using c_tot in the examples below.)
/* this creates a variable with the value and the label in one field */
. vallab n_type, g(sn_type)
/* now show the tabulation, counting anonymous victims */
. ta sn_type [fw=c_tot]
Type of |
violation | Freq. Percent Cum.
------------+-----------------------------------
23 DM | 272 0.63 0.63
24 Ds | 2759 6.41 7.04
25 Hr | 1085 2.52 9.56
26 Mu | 34210 79.43 88.99
27 Se | 3466 8.05 97.03
28 To | 1278 2.97 100.00
------------+-----------------------------------
Total | 43070 100.00
/*
we're interested in violations with n_type = 23, 24, and 26.
The new variable is created below.
*/
. ge killdis=1 if n_type==23 | n_type==24 | n_type==26
(3797 missing values generated)
. replace killdis=0 if killdis==.
(3797 real changes made)
. ta killdis [fw=c_tot]
killdis | Freq. Percent Cum.
------------+-----------------------------------
0 | 5829 13.53 13.53
1 | 37241 86.47 100.00
------------+-----------------------------------
Total | 43070 100.00
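The same grouping can be checked in Python using the weighted frequencies from the Stata tabulation above. The sketch treats "killed" and "disappeared" as sets of type codes and takes their union, so DM is counted once. This reproduces the killdis == 1 total. The record-level `killdis` function mirrors the Stata `generate`/`replace` pair.

```python
# Weighted frequencies (sums of c_tot) by violation type, copied from the
# Stata tabulation above.
FREQ_BY_TYPE = {"DM": 272, "Ds": 2759, "Hr": 1085, "Mu": 34210, "Se": 3466, "To": 1278}

KILLED = {"DM", "Mu"}
DISAPPEARED = {"DM", "Ds"}

# Integer n_type codes used for killdis: 23 = DM, 24 = Ds, 26 = Mu.
KILLDIS_CODES = {23, 24, 26}

def killdis(n_type):
    """Record-level flag: 1 if the violation is a killing or disappearance."""
    return 1 if n_type in KILLDIS_CODES else 0

def total_for(types, freq=FREQ_BY_TYPE):
    """Sum weighted frequencies over a set of type abbreviations."""
    return sum(freq[t] for t in types)

print(total_for(KILLED))                # 34482
print(total_for(DISAPPEARED))           # 3031
print(total_for(KILLED | DISAPPEARED))  # 37241, the killdis == 1 total
```

Adding the killed and disappeared totals directly (34482 + 3031 = 37513) would count the 272 DM victims twice. The set union avoids that.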
Note 2: "overkill" refers to killings carried out by methods beyond what was necessary to kill, including torturing the victim to death or burning, as well as cases in which bodies were mutilated after death.
Notes on the original data
The original data from which this dataset was generated include 19 tables linked
in a relational database collected and systematized by the International Center
for Human Rights Research in Guatemala. That full dataset, including narrative
summaries, occupies approximately 50 megabytes.
There are many variables that were not included in this output, from antemortem
information about victims of forced disappearance (color of pants when last
seen, dental or bone conditions), to specific types of torture, to data about
the perpetrators (vehicle type, weapon caliber).
It would be very complicated to put most of the excluded variables in the
dataset. For example, since each violation may have been committed by various
perpetrators, there may be various weapons that were used. If we attempt to
put the weapons data into the flat structure we are using for this published
data, we will need dozens of fields to represent each perpetrator's possible
weapon.
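The flat structure handles perpetrators by collapsing a variable-length list of perpetrators into fixed 0/1 flags. Per-perpetrator details such as weapons are dropped in the process. The Python sketch below illustrates this. The organization names in `ORG_TO_FLAG` and the input record layout are hypothetical; they are not the original relational schema.

```python
# Hypothetical mapping from perpetrator organization names (as they might
# appear in a relational perpetrator table) to the flat p_* flag columns.
ORG_TO_FLAG = {
    "civilians": "p_civ",
    "army": "p_arm",
    "PAC": "p_pac",
    "police": "p_pol",
    "paramilitary": "p_par",
    "URNG": "p_urn",
}

def flatten_perpetrators(perpetrators):
    """Collapse a perpetrator list into fixed 0/1 flags.

    Per-perpetrator fields (here, 'weapon') are discarded, which is the
    information a flat one-row-per-violation layout cannot hold compactly.
    """
    flags = {flag: 0 for flag in ORG_TO_FLAG.values()}
    for perp in perpetrators:
        flags[ORG_TO_FLAG[perp["org"]]] = 1
    return flags

# Hypothetical violation with two perpetrators, each with its own weapon.
violation_perps = [
    {"org": "army", "weapon": "rifle"},
    {"org": "PAC", "weapon": "machete"},
]
flags = flatten_perpetrators(violation_perps)
print(flags)
```

Keeping the weapons would require extra columns for every possible perpetrator, which is the complication described above.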
Most of the variables not included in this dataset are sparse. For example,
there is data on the type of weapons used in particular violations for approximately
one-third of the violations originally coded. Other variables have non-missing
data for only a few dozen records. If researchers have particular questions
about variables they would like to have included in future versions of this
dataset, we are willing to discuss their needs. If there are sufficient requests
for new variables, we may issue a new version of this dataset. A review of the
dataset's full variables is here.
Error checking
We have devoted hundreds of hours to checking the dataset to control for multiple
reports of the same incidents. Many of the victims in this dataset have the
same names and may appear to be the same person. We have reviewed every pair
of victims with the same or similar names against the narrative information
that was stored with the original data. The narrative information includes portions
of the original testimony, quotations of original newspaper or documentary accounts,
and the coders' commentary on what they found in the source materials; this
narrative information cannot be published because it includes too much data
on the original witnesses to be securely released. Whenever victims appeared
to be the same person, based on an overall analysis of the names, places and
dates of birth, types, times and places of the violations, and qualitative data
in the narrative, we combined the records. Note that we did not delete the original
records; instead we created meta-records that linked all the data pertaining
to the same person. This way we are able to report the r_* series variables,
analyzing how frequently some violations are reported relative to other violations.
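The bookkeeping side of this linking can be sketched with a union-find (disjoint-set) structure. Pairs of original reports judged to be duplicates are merged into groups, and each group acts as a meta-record. The original reports are left untouched, and counting them by source type yields r_per, r_doc and r_ent. This is a hypothetical Python illustration, not the CIIDH's actual procedure. The duplicate judgments themselves came from human review of the narratives; the report IDs and source tags below are invented.

```python
from collections import Counter

class DisjointSet:
    """Union-find over report IDs; each final set is one meta-record."""

    def __init__(self, items):
        self.parent = {item: item for item in items}

    def find(self, item):
        root = item
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[item] != root:  # path compression
            nxt = self.parent[item]
            self.parent[item] = root
            item = nxt
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

def link_reports(reports, same_pairs):
    """Group reports into meta-records and count them by source type.

    reports: {report_id: source}, source in {"per", "doc", "ent"}
    same_pairs: pairs of report IDs judged (by reviewers) to be duplicates
    """
    ds = DisjointSet(reports)
    for a, b in same_pairs:
        ds.union(a, b)
    counts = {}
    for rid, source in reports.items():
        counts.setdefault(ds.find(rid), Counter())[source] += 1
    return {meta: {f"r_{s}": c[s] for s in ("per", "doc", "ent")}
            for meta, c in counts.items()}

reports = {"A": "per", "B": "per", "C": "ent", "D": "doc"}
meta = link_reports(reports, [("A", "C"), ("B", "A")])
print(meta)
```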
* See State Violence, chapter 10, for a discussion of named and anonymous victims.