I am looking at a YAML file for cassandra-stress:
# Keyspace name and create CQL
#
keyspace: stressexample
keyspace_definition: |
  CREATE KEYSPACE stressexample WITH replication = {'class': 'NetworkTopologyStrategy', 'AWS_VPC_US_WEST_2': '2'};
#
# Table name and create CQL
#
table: eventsrawtest
table_definition: |
  CREATE TABLE eventsrawtest (
        host text,
        bucket_time text,
        service text,
        time timestamp,
        metric double,
        state text,
        PRIMARY KEY ((host, bucket_time, service), time)
  ) WITH CLUSTERING ORDER BY (time DESC)
#
# Meta information for generating data
#
columnspec:
  - name: host
    size: fixed(32) # In chars, no. of chars of UUID
    population: uniform(1..600) # We have about 600 hosts with equal events per host
  - name: bucket_time
    size: fixed(18)
    population: uniform(1..288) # 288 potential buckets
  - name: service
    size: uniform(10..100)
    population: uniform(1000..2000) # 1000 - 2000 metrics per host
  - name: time
    cluster: fixed(15)
  - name: state
    size: fixed(4)
#
# Specs for insert queries
#
insert:
  partitions: fixed(1) # 1 partition per batch
  batchtype: UNLOGGED # use unlogged batches
  select: fixed(10)/10 # no chance of skipping a row when generating inserts
#
# Read queries to run against the schema
#
queries:
  pull-for-rollup:
    cql: select * from eventsrawtest where host = ? and service = ? and bucket_time = ?
    fields: samerow # pick selection values from same row in partition
  get-a-value:
    cql: select * from eventsrawtest where host = ? and service = ? and bucket_time = ? and time = ?
    fields: samerow # pick selection values from same row in partition
I found this file on the internet and I don't quite understand how it works.
First of all, I don't understand columnspec. For the partition key columns host, bucket_time, and service, it says:
population: uniform(1..600) # We have about 600 hosts with equal events per host
population: uniform(1..288) # 288 potential buckets
population: uniform(1000..2000) # 1000 - 2000 metrics per host
Does that mean that I will have at most 600*288*2000 partitions? In other words, once the stress test is done, is 600*288*2000 the maximum number of partitions I will see in the table? And will the maximum number of rows I see if I do "select count(*) from table" be 600*288*2000*15?
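To make the arithmetic I am asking about concrete, here is a rough sketch of the numbers. It assumes, and this is exactly the part I am unsure of, that each population range bounds the number of distinct values for that column. The variable names are my own. I compute the service count two ways: using the 2000 figure from the YAML comment, and using the number of distinct integers in the range 1000..2000.

```python
# Assumed distinct values per partition-key column (my reading of the YAML,
# not something I have verified against cassandra-stress itself).
hosts = 600                            # population: uniform(1..600)
buckets = 288                          # population: uniform(1..288)
services_by_comment = 2000             # upper figure from the "1000 - 2000" comment
services_by_range = 2000 - 1000 + 1    # distinct integers in uniform(1000..2000)
rows_per_partition = 15                # cluster: fixed(15) on the time column

# Upper bounds on partitions under each assumption.
max_partitions_by_comment = hosts * buckets * services_by_comment
max_partitions_by_range = hosts * buckets * services_by_range

# Upper bounds on rows (what I would expect count(*) to be capped at).
max_rows_by_comment = max_partitions_by_comment * rows_per_partition
max_rows_by_range = max_partitions_by_range * rows_per_partition

print("max partitions (comment figure):", max_partitions_by_comment)
print("max partitions (range size):    ", max_partitions_by_range)
print("max rows (comment figure):      ", max_rows_by_comment)
print("max rows (range size):          ", max_rows_by_range)
```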
Next, I don't understand the insert part:
partitions: fixed(1) # 1 partition per batch
Does this mean that only 1 partition will be updated with 1 insert operation?
select: fixed(10)/10 # no chance of skipping a row when generating inserts
What is this select? I don't understand how it works. My table is empty at first, so how can it select and insert anything if there is nothing in the table? Is my understanding correct that it picks 100% of the data from each batch for insertion (since it says fixed(10)/10) and then inserts it?
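Here is how I am currently reading the select line, written out as a small sketch. The assumption, which may well be wrong, is that fixed(10)/10 is a ratio applied to the rows generated for a partition (15 here, from cluster: fixed(15)) rather than a query against rows already in the table. The names are my own.

```python
from fractions import Fraction

# Rows generated per partition, from "cluster: fixed(15)" on the time column.
rows_per_partition = 15

# My reading of "select: fixed(10)/10": numerator drawn from fixed(10),
# divided by 10, giving the fraction of a partition's rows to insert.
select_ratio = Fraction(10, 10)

# Under this reading, every generated row of the partition is inserted.
rows_inserted_per_batch = int(select_ratio * rows_per_partition)

print("select ratio:", float(select_ratio))
print("rows inserted per batch:", rows_inserted_per_batch)
```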