A)
In supervised learning tasks, the overall optimization objective is the loss summed over all training examples, defined as E = \sum_n loss(y_n, t_n), where n indexes the training examples, y_n is the network output for training example n, t_n is the label of training example n, and loss is the loss function. Note that y_n and t_n are in general vector-valued quantities---the vector length is determined by the number of output neurons in the network.
One possible choice for the loss function is the squared error, defined as loss(y, t) = \sum_k (y_k - t_k) ^ 2, where k indexes the output neurons of the network. In backpropagation, one has to compute the partial derivative of the overall optimization objective with respect to the network parameters---which are synaptic weights and neuron biases. This is achieved through the following formula according to the chain rule:
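As a concrete illustration, here is a minimal NumPy sketch of the squared-error loss and the summed objective E (the function names are my own, chosen for this example):

```python
import numpy as np

def squared_error(y, t):
    # loss(y, t) = sum_k (y_k - t_k)^2, summed over the output neurons k
    return np.sum((y - t) ** 2)

def total_objective(ys, ts):
    # E = sum_n loss(y_n, t_n), summed over the training examples n
    return sum(squared_error(y, t) for y, t in zip(ys, ts))

y = np.array([0.2, 0.9])   # network output for one training example
t = np.array([0.0, 1.0])   # corresponding label
print(squared_error(y, t)) # 0.2^2 + (-0.1)^2, i.e. approximately 0.05
```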
(\partial E / \partial w_{ij}) = (\partial E / \partial out_j) * (\partial out_j / \partial in_j) * (\partial in_j / \partial w_{ij}),
where w_{ij} is the weight of the connection from neuron i to neuron j, out_j is the output of neuron j, and in_j is the input to neuron j.
How to compute the neuron output out_j and its derivative with respect to the neuronal input in_j depends on which activation function is used. If you use a linear activation function to compute a neuron's output out_j, the term (\partial out_j / \partial in_j) becomes 1. If you use, for example, the logistic function as activation function, the term (\partial out_j / \partial in_j) becomes sig(in_j) * (1 - sig(in_j)), where sig is the logistic function.
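To make the activation-function derivative concrete, here is a small Python sketch of the logistic function and its derivative, checked against a finite-difference approximation (the function names are mine):

```python
import math

def sigmoid(x):
    # logistic function: sig(x) = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_deriv(x):
    # d sig / d x = sig(x) * (1 - sig(x))
    s = sigmoid(x)
    return s * (1.0 - s)

# Numerical sanity check via a central finite difference.
x, eps = 0.7, 1e-6
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
assert abs(numeric - sigmoid_deriv(x)) < 1e-9
```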
B)
In resilient backpropagation, biases are updated exactly the same way as weights---based on the sign of partial derivatives and individual adjustable step sizes.
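A minimal sketch of such a sign-based update for one parameter (a weight or a bias alike); the step-size factors and bounds below are typical values I am assuming, and the full Rprop algorithm additionally reverts the previous step when the gradient sign flips:

```python
import math

def rprop_step(grad, prev_grad, step,
               eta_plus=1.2, eta_minus=0.5, step_min=1e-6, step_max=50.0):
    # Adapt the individual step size from the signs of two successive gradients.
    if grad * prev_grad > 0:      # same sign: take bigger steps
        step = min(step * eta_plus, step_max)
    elif grad * prev_grad < 0:    # sign flipped: we overshot, shrink the step
        step = max(step * eta_minus, step_min)
    # The parameter change depends only on the sign of the gradient.
    delta = -math.copysign(step, grad) if grad != 0 else 0.0
    return delta, step

delta, step = rprop_step(grad=0.3, prev_grad=0.1, step=0.1)
# same gradient sign: the step grows to 0.12 and the parameter moves by -0.12
```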
C)
I am not quite sure if I understand correctly. The overall optimization objective is a scalar function of all network parameters, no matter how many output neurons there are. So there should be no confusion regarding how to compute partial derivatives here.
In general, in order to compute the partial derivative (\partial E / \partial w_{ij}) of the overall optimization objective E with respect to some weight w_{ij}, one has to compute the partial derivative (\partial out_k / \partial w_{ij}) of the output of each output neuron k with respect to w_{ij}:
(\partial E / \partial w_{ij}) = \sum_k (\partial E / \partial out_k) * (\partial out_k / \partial w_{ij}).
Note however that the partial derivative (\partial out_k / \partial w_{ij}) of the output neuron k with respect to w_{ij} will be zero if w_{ij} does not impact the output out_k of output neuron k.
One more thing. In case one uses the squared error as loss function, the partial derivative (\partial E / \partial out_k) of the overall optimization objective E with respect to the output out_k of some output neuron k is
(\partial E / \partial out_k) = 2 * (out_k - t_k),
where the quantity (out_k - t_k) is referred to as the error attached to output unit k, and where I assume a single training example with label t for notational convenience. Note that if w_{ij} does not have any impact on the output out_k of output neuron k, then the update of w_{ij} will not depend on the error (out_k - t_k), because (\partial out_k / \partial w_{ij}) = 0 as mentioned above.
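This relationship is easy to verify numerically; the sketch below perturbs one output neuron and compares the finite-difference slope of E with 2 * (out_k - t_k) (the variable names are mine):

```python
import numpy as np

def E(out, t):
    # squared-error objective for a single training example
    return np.sum((out - t) ** 2)

out = np.array([0.3, 0.8])  # outputs of the two output neurons
t = np.array([0.0, 1.0])    # label of the single training example

k, eps = 0, 1e-6
bumped = out.copy()
bumped[k] += eps
numeric = (E(bumped, t) - E(out, t)) / eps   # finite-difference slope
analytic = 2 * (out[k] - t[k])               # 2 * (out_k - t_k)
assert abs(numeric - analytic) < 1e-4
```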
A final remark to avoid any confusion: y_k and out_k both refer to the output of output neuron k in the network.