
drawsample

Langdoc

sample size

classification to use

include the following document types
bibliographical
comparative
dictionary
grammar
minimal
overview
sociolinguistic
socling
text
wordlist
bibliography
dialectology
ethnographic
grammar_sketch
new_testament
phonology
sociolinguistics
specific_feature
word_list

only include works from the following macroareas
Middle America
South America
Australia
Papua
North America
Africa
Eurasia

only include works in language
eng
rus
cmn
spa
nld
fin
deu
ita
por
tur
vie
include references where language is unclear: no / yes (recommended)

only include works newer than

You can draw a random sample here. Select the sample size and the classification to use, then hit "submit". You will receive a list of languages relevant to your research. The sample is stratified both genetically and areally. Drawing the sample can take several minutes, so please be patient.
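
The idea of a genetically and areally stratified sample can be illustrated with a short Python sketch. It assumes a hypothetical input of (language, family, macroarea) triples and is not the algorithm Langdoc actually uses. It draws at most one language per family and cycles through the macroareas, so that each pass adds one new family per area.

```python
import random
from collections import defaultdict


def stratified_sample(languages, size, seed=None):
    """Draw at most one language per family, cycling through macroareas.

    `languages` is a list of (language, family, macroarea) tuples; this
    input shape is hypothetical and chosen for illustration only.
    """
    rng = random.Random(seed)
    pools = defaultdict(set)      # macroarea -> families present there
    members = defaultdict(list)   # (family, macroarea) -> languages
    for lang, family, area in languages:
        pools[area].add(family)
        members[(family, area)].append(lang)

    # Fix a random order of families inside each macroarea.
    order = {}
    for area, families in pools.items():
        families = sorted(families)
        rng.shuffle(families)
        order[area] = families

    sample, used = [], set()
    areas = sorted(order)
    # Round-robin over macroareas: each pass takes one new family per area.
    while len(sample) < size and any(order.values()):
        for area in areas:
            queue = order[area]
            # Skip families already drawn through another macroarea.
            while queue and queue[-1] in used:
                queue.pop()
            if queue and len(sample) < size:
                family = queue.pop()
                used.add(family)
                sample.append(rng.choice(members[(family, area)]))
    return sample
```

Cycling through areas is only one simple way to spread a sample areally; this sketch does not weight areas by how many families they contain.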

On the left side, you can configure the parameters of your sample. Some of the parameters are self-explanatory, while others are more obscure. Sample size is the number of languages you want to include in your sample. The languages will come from a maximally diverse set of families. To determine such a set, we have to rely on a classification. The default, "Glottolog 2012", is a sensible classification, but specialists may want to adopt a different one. Document types are the kinds of documents your research will draw on. If you are interested in words, dictionaries will be more useful than grammars, whereas morphosyntacticians may have exactly the opposite preference. When the sample is determined, language families without any documentation of the relevant type are excluded from the outset.
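
The exclusion of undocumented families can be pictured as a filter that runs before any sampling takes place. In this sketch, the function name `documented_families` and both input shapes (a language-to-family mapping and a list of (language, document type) pairs) are hypothetical; the type names are taken from the document-type list on this page.

```python
def documented_families(lang_to_family, documents, wanted_types):
    """Return the families that have at least one document of a wanted type.

    `lang_to_family` maps a language id to its family id; `documents` is a
    list of (language id, document type) pairs. Both shapes are hypothetical.
    """
    wanted = set(wanted_types)
    # Languages with at least one document of a selected type.
    documented = {lang for lang, doctype in documents if doctype in wanted}
    # A family qualifies as soon as one of its languages is documented.
    return {lang_to_family[lang] for lang in documented if lang in lang_to_family}
```

A lexicon-oriented study would pass types such as `dictionary` and `wordlist`, a morphosyntactic one `grammar` and `grammar_sketch`, and so end up with different sets of eligible families.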

There are a number of minor parameters to tweak when drawing the sample. All of them have sensible defaults and can be left untouched unless you have very special requirements. Percentage of isolates is intended to keep your sample from being flooded with one-member families; by default, it is set to 10%. Maximum number of members for isolates allows you to redefine isolates so as to include small families as well. With the default value of 1, only one-member families count as isolates; by raising the value to 4 or 5, you can have small families treated as isolates too. You can force the sampling procedure to fail if there are not enough relevant documents by checking "abort if insufficient number of documents".
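
The two isolate parameters might interact roughly as follows. This is a sketch under stated assumptions, not Langdoc's implementation: `select_families` is a hypothetical name, the percentage is treated as an upper bound on isolates (rounded down), and the abort option is not modelled.

```python
import random


def select_families(family_sizes, sample_size, isolate_pct=10,
                    isolate_max_members=1, seed=None):
    """Pick families for the sample while capping the share of isolates.

    `family_sizes` maps a family id to its number of member languages.
    Families with at most `isolate_max_members` members count as isolates,
    and at most `isolate_pct` percent of the sample may be isolates.
    """
    rng = random.Random(seed)
    isolates = sorted(f for f, n in family_sizes.items() if n <= isolate_max_members)
    others = sorted(f for f, n in family_sizes.items() if n > isolate_max_members)
    rng.shuffle(isolates)
    rng.shuffle(others)

    # Upper bound on isolates, rounded down (10% of 10 slots -> 1 isolate).
    isolate_cap = int(sample_size * isolate_pct / 100)
    chosen_isolates = isolates[:isolate_cap]
    # Fill the remaining slots with larger families, as far as they go.
    chosen_others = others[:sample_size - len(chosen_isolates)]
    return chosen_others + chosen_isolates
```

Raising `isolate_max_members` moves small families into the capped group, so fewer of them can enter the sample. If fewer families than requested are available, the sketch simply returns a shorter list.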

By default, the sampling procedure controls for area. You can deselect this if you prefer a sample that is not stratified for area. Highest node is useful if you want to draw a sample from a subtree of a language family, e.g. Indo-Iranian. In that case, retrieve the ID of the relevant node from the languoid description and enter it into the field. Only subnodes of that node will then be considered.

Stratification level is a way to ensure that large language families are not underrepresented. This is done by sampling on the basis of genera rather than phyla, and it is only useful if you choose the classification "Dryer 2005". See Matthew Dryer's work on sampling for the rationale behind using genera.

Special algorithms are needed when the sample size exceeds the number of language families. In these cases, some language families will be represented by more than one member. The default option, "Random", selects the families that receive extra representation at random. From these families, additional subfamilies will be included. "Size" chooses the largest language families as providers of additional languages. "Diversity Value" is a different technique, which takes the internal structure of a family into account. The reader is referred to Bakker et al. (1993, 1998) for further information about the diversity value method.
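
The "Random" and "Size" options for extra representation can be sketched as a way of handing out surplus slots once every family already has one. The function `allocate_extra_slots` is hypothetical, and "Size" is interpreted here as giving each extra slot to the family with the most members not yet allocated, which is an assumption. The sketch only decides how many languages each family contributes, not which subfamilies they come from, and it does not implement the Diversity Value method.

```python
import random


def allocate_extra_slots(family_sizes, sample_size, method="random", seed=None):
    """Return a dict mapping each family to the number of languages to draw.

    Every family gets one slot; the surplus (sample_size minus the number
    of families) is distributed according to `method`.
    """
    if sample_size < len(family_sizes):
        raise ValueError("sample is smaller than the number of families")
    rng = random.Random(seed)
    alloc = {f: 1 for f in family_sizes}
    surplus = sample_size - len(family_sizes)
    while surplus > 0:
        # Only families with unallocated members can take another slot.
        eligible = [f for f in sorted(family_sizes) if alloc[f] < family_sizes[f]]
        if not eligible:
            break  # every language is already in the sample
        if method == "random":
            family = rng.choice(eligible)
        elif method == "size":
            # Family with the most members not yet allocated (ties by name).
            family = max(eligible, key=lambda f: (family_sizes[f] - alloc[f], f))
        else:
            raise ValueError(f"unknown method: {method}")
        alloc[family] += 1
        surplus -= 1
    return alloc
```

With `method="size"`, the large families absorb the surplus, while `method="random"` spreads it over whichever families the random generator picks.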

The sample-drawing algorithm only includes works published since 1800, because older works are normally few and far between and often difficult to interpret. If you like, you can adjust the time frame of your research by changing this value to a period more to your taste.
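
Together with the publication-year cutoff, the filters on this page (document types, macroareas, language of the work, and references whose language is unclear) can be read as a single predicate over bibliographic references. The `Reference` record and the `keep_reference` function are hypothetical, and the year cutoff is assumed to be inclusive; the defaults mirror the settings described here (1800, and including references whose language is unclear).

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Reference:
    # Hypothetical record shape for one bibliographic reference.
    year: int
    doctypes: FrozenSet[str]
    macroareas: FrozenSet[str]
    written_in: Optional[str]  # ISO 639-3 code, or None if unclear


def keep_reference(ref, doctypes, macroareas, languages,
                   include_unclear=True, min_year=1800):
    """Return True if a reference passes every filter on the form."""
    if ref.year < min_year:             # cutoff assumed to be inclusive
        return False
    if not ref.doctypes & doctypes:     # needs at least one selected type
        return False
    if not ref.macroareas & macroareas:
        return False
    if ref.written_in is None:          # language of the work is unclear
        return include_unclear
    return ref.written_in in languages
```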