The data on killings in Kosovo are in four files. All of the files are
comma-delimited ASCII. The fields in each file are described below.
The first file is md_pub.csv. It contains 4725
records (see below for why there are more records than victims). Appendix
1 of the report
gives a full description of how this file was compiled. We have omitted
the names of the victims in order to protect both the victims' privacy and
the people who gave information to the organizations that collected
the data. The file contains records of deaths reported to have occurred during
the period 20 March 1999 - 20 June 1999; reported deaths outside that period
were not included in our analysis and so are not included in this file.
Each record represents one death or a partial death. The partial deaths
are those for which the date of death was missing. Quoting from pages 30-31
of the report, "For 204 records with no date information, a hot deck procedure
was employed to assign a date at random from a donor record that was geographically
closest to the location of the record with the missing date. Three dates were
randomly selected from the potential donors, and copies of the original record
were created with each of the sampled dates. The new records were each assigned
a weight of 0.33."
Note therefore that the total number of victims is the sum of the "weight" field (4399.67), not the number of records.
However, not every victim with an imputed date has three records with the same id.
Continuing to quote from page 31, "Some of the hot-decked dates were outside
the date range of interest to this study (20 March-22 June). Those records
(and their partial weights) were therefore excluded from the analysis."
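The hot-deck step quoted above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure actually used for the report: the `Record` type, its hypothetical `x`/`y` coordinates, the squared-distance notion of "geographically closest", and drawing the three donor dates with replacement are all our own choices. It does show why some undated victims end up with fewer than three records: copies whose donor date falls outside the study window are dropped along with their partial weight.

```python
import random
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional

# Study window as quoted from page 31 (20 March - 22 June 1999).
WINDOW_START, WINDOW_END = date(1999, 3, 20), date(1999, 6, 22)


@dataclass(frozen=True)
class Record:
    id: int
    x: float  # hypothetical coordinates, used only to find the closest donors
    y: float
    dt_kill: Optional[date]
    weight: float = 1.0


def in_window(d: date) -> bool:
    return WINDOW_START <= d <= WINDOW_END


def hot_deck(records: List[Record], n_draws: int = 3,
             weight: float = 0.33, seed: int = 0) -> List[Record]:
    """Replace each undated record with n_draws partial-weight copies whose
    dates are drawn from the geographically closest dated records."""
    rng = random.Random(seed)
    donors = [r for r in records if r.dt_kill is not None]
    out = [r for r in donors if in_window(r.dt_kill)]
    for r in records:
        if r.dt_kill is not None:
            continue
        # Squared distance to every donor; all donors tied at the minimum
        # distance form the pool of potential donors (an assumption).
        dists = [(d.x - r.x) ** 2 + (d.y - r.y) ** 2 for d in donors]
        nearest = min(dists)
        pool = [d for d, dist in zip(donors, dists) if dist == nearest]
        # Draw dates with replacement (an assumption); one copy per draw.
        for donor in rng.choices(pool, k=n_draws):
            if in_window(donor.dt_kill):
                out.append(replace(r, dt_kill=donor.dt_kill, weight=weight))
            # Otherwise the copy and its partial weight are excluded.
    return out
```

In this sketch a victim whose nearest donors all have out-of-window dates disappears entirely, while one with mixed donors keeps only some of its copies.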
md_pub.csv contains the following fields.

| Field name | Field description |
|---|---|
| id | The id of this record. Note that ids are not unique: the copies created for a record with an imputed date share the same id. |
| age | The victim's age at death. 0 denotes an infant, and -1 indicates that the age is unknown. |
| sex | M = male, F = female, U = unknown. |
| pcode | The geographic code for the village or town in which the death occurred. See the geographic dataset for more information. |
| mcode | The geographic code for the municipality in which the death occurred. See the geographic dataset for more information. |
| dt_kill | The date of death. |
| dtk2 | The date of death grouped into two-day periods; each period is labeled by its first day and also includes the following day. |
| aba | 1 = this death was reported to the ABA (see pages 18-19 of the report). |
| exh | 1 = this death was identified in an exhumation (see pages 19-20 of the report). |
| hrw | 1 = this death was reported to HRW (see pages 20-21 of the report). |
| osce | 1 = this death was reported to the OSCE (see page 21 of the report). |
| weight | 1 = record with a complete date; 0.33 = record with an imputed date. |
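As a minimal sketch, the Python below reads md_pub.csv with the standard `csv` module and applies the bookkeeping described above: it sums the weight field rather than counting rows (so a hot-decked victim is not counted up to three times), treats age -1 as unknown, and tallies weighted deaths by source flag. It assumes the file has a header row with the field names in the table and that a source flag is exactly 1 when set. The `two_day_period` helper assumes the two-day periods are counted from 20 March 1999; that matches the period labels 20mar99 and 11may99 used elsewhere on this page, but it is our reconstruction, not the report's code.

```python
import csv
from collections import defaultdict
from datetime import date, timedelta

SOURCES = ("aba", "exh", "hrw", "osce")
# Assumption: two-day periods are counted from 20 March 1999.
FIRST_PERIOD = date(1999, 3, 20)


def summarize(lines):
    """Weighted totals from md_pub.csv rows (any iterable of CSV lines)."""
    total = 0.0
    by_sex = defaultdict(float)
    by_source = defaultdict(float)
    unknown_age = 0.0
    for row in csv.DictReader(lines):
        w = float(row["weight"])   # 1 for dated records, 0.33 for imputed copies
        total += w
        by_sex[row["sex"]] += w    # M, F or U
        for s in SOURCES:
            if row[s].strip() == "1":
                by_source[s] += w
        if float(row["age"]) < 0:  # -1 = unknown; 0 is an infant, not missing
            unknown_age += w
    return total, dict(by_sex), dict(by_source), unknown_age


def two_day_period(d: date) -> date:
    """First day of the two-day period containing d; the period also
    covers the following day."""
    offset = (d - FIRST_PERIOD).days
    return FIRST_PERIOD + timedelta(days=offset - offset % 2)
```

For the full file, `with open("md_pub.csv", newline="") as f: total, by_sex, by_source, unknown_age = summarize(f)` should give a total close to 4399.67 if these layout assumptions hold.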
The remaining three files contain our estimates, using the data in md_pub.csv
and following the procedures described in Appendix 2 of our report.
Over time
dtk2_oth.csv contains data estimated by two-day
periods. These data underlie (for example) Figure 2 (page 6), and the
regressions over time presented in Figure 19 (page 58), first and third columns.
Note: these data have been corrected as described in the 15 November 2002 corrigendum.
| Field name | Field description |
|---|---|
| dtk2 | As above. |
| modelspec | The model used to estimate the total deaths for this period, in standard log-linear notation. See Appendix 2, section 3.5 and following. This value is empty when it was impossible to estimate any model for this period (e.g., 11may99). |
| nsum | The total estimated deaths for this two-day period. When modelspec is empty, this value is simply the number of reported deaths. The cell counts from which this was estimated can be computed from the raw data in md_pub.csv. |
| sd | The estimated standard error of nsum, as described in Appendix 2, page 40, of the report. |
| lvcnt | The estimated total number of people leaving home during this two-day period. See the description of migration data, and Policy or Panic. |
| bomb | The number of NATO airstrikes in this period. See the description of other data, and Section 5, pp. 8-13 of the report. |
| bomblag | The number of reported NATO airstrikes in the previous two-day period (missing for 20mar99). |
| klaB | The number of reported KLA exchanges of fire with Serb authorities. See the description of other data, and pp. 11-12 of the report. |
| klaBlag | The value of klaB in the previous two-day period. |
| klaK | The number of reported Serb casualties caused by interactions with the KLA. See the description of other data, and pp. 11-12 of the report. |
| klaKlag | The value of klaK in the previous two-day period. |
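The remark that the cell counts behind nsum can be computed from the raw data can be illustrated with a short sketch. Assuming md_pub.csv has the header layout described above, the code groups records by dtk2 and by their pattern of source flags (for example "1010" means reported to the ABA and HRW only), summing the weight field; whether the report's cell counts carried the 0.33 weights in exactly this way is our assumption. The sum over the observed cells is the number of reported deaths, which is what nsum holds when modelspec is empty. The `lag` helper is a hypothetical illustration of how bomblag, klaBlag and klaKlag take the previous period's value.

```python
import csv
from collections import defaultdict

SOURCES = ("aba", "exh", "hrw", "osce")  # order of the digits in a pattern


def cell_counts(lines):
    """Map each dtk2 period to {capture pattern: summed weight}."""
    cells = defaultdict(lambda: defaultdict(float))
    for row in csv.DictReader(lines):
        pattern = "".join("1" if row[s].strip() == "1" else "0" for s in SOURCES)
        cells[row["dtk2"]][pattern] += float(row["weight"])
    return {period: dict(c) for period, c in cells.items()}


def reported_deaths(period_cells):
    """Total over the observed cells: the fallback nsum when modelspec is empty."""
    return sum(period_cells.values())


def lag(values):
    """Previous period's value; None for the first period (cf. bomblag on 20mar99)."""
    return [None] + list(values[:-1])
```

Fitting the log-linear models themselves (Appendix 2) is beyond this sketch; it only prepares the per-period tables they start from.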
Over region and six-day period
rgwk6_oth.csv contains data estimated by
region and six-day periods. These data underlie (for example) Figure 12 (page 53).
Note: these data have been corrected as described in the 15 November
2002 corrigendum.
Over region and two-day period
rgdtk2est_oth.csv contains data estimated
by region and two-day periods. These data underlie (for example) Figures
4-7 (pages 9-10), and the regressions over time presented in Figure 19 (page
58), second and fourth columns. Note: these data have been corrected
as described in the 15 November
2002 corrigendum.
Last updated: 1 November 2002 11:30 PB