I have already written a function that simulates a random sequence of the four bases A, C, G, T with a length of 10^1, 10^2, 10^3, 10^4 or 10^5. Each base has a probability of 0.25. I wrote another function that calculates the relative frequency of each base in a given sequence. Now I want to show, in a bar chart, the relative frequency of each base (A, C, G, T) for each length (10^1, 10^2, 10^3, 10^4, 10^5) of a random sequence, but I am not sure how to do it. My first thought was to build a pandas DataFrame, but I am unsure how to plug my existing functions into it. Could you help me?
-
Please include a small example of a sequence, and what exactly you'd like to chart. – Roy2012 Jun 20 '20 at 11:42
-
a sequence could be: "GTGCAGTGATTTCCTCGCAGTATTCATTTG". And I want to have a chart at the end which has 4 bars (relative number of each base) for 10^1, 4 bars (relative number of each base) for 10^2, 4 bars (relative number of each base) for 10^3, 4 bars (relative number of each base) for 10^4, 4 bars (relative number of each base) for 10^5. – AnjaPl Jun 20 '20 at 11:49
-
I don't understand what kind of chart you're looking for. Perhaps you can attach a fabricated example? (I don't understand the 10, 100, 1000, etc, part) – Roy2012 Jun 20 '20 at 11:55
-
You could maybe use the result from your function that calculates the relative number of each base in a given sequence and use [matplotlib to make a barchart](https://matplotlib.org/3.2.1/gallery/lines_bars_and_markers/barchart.html) – Wavy Jun 20 '20 at 12:08
-
I have now added a quick sketch of what I want the chart to look like in the end – AnjaPl Jun 20 '20 at 12:11
-
Maybe this post is similar to your question: https://stackoverflow.com/questions/28931224/adding-value-labels-on-a-matplotlib-bar-chart – Christian Eslabon Jun 20 '20 at 13:04
1 Answer
If I understand correctly, you want to do something like this:
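import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# One column of relative base frequencies per sequence length 10^1 ... 10^5;
# rename() after value_counts() keeps the label on every pandas version.
pd.concat([
    pd.Series(np.random.choice(list('ATCG'), 10**a))
      .value_counts(normalize=True)
      .rename('10^{}'.format(a)) for a in range(1, 6)],
    axis=1, sort=True).T.plot(kind='bar')
plt.ylabel('normalised counts')
plt.xlabel('sequence length')
plt.show()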
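
If you would rather reuse your own two functions than generate the data with NumPy, something along these lines might work. This is only a sketch: `simulate_sequence` and `relative_frequencies` are hypothetical stand-ins for the functions described in the question (their real names and signatures are not shown there), and `frequency_table` and `plot_frequencies` are helper names made up for this example.

```python
import random
from collections import Counter

BASES = "ACGT"


def simulate_sequence(length, rng=random):
    """Hypothetical stand-in for the asker's generator: uniform random bases."""
    return "".join(rng.choice(BASES) for _ in range(length))


def relative_frequencies(sequence):
    """Hypothetical stand-in for the asker's counter: share of each base."""
    counts = Counter(sequence)
    return {base: counts[base] / len(sequence) for base in BASES}


def frequency_table(exponents=range(1, 6)):
    """Map a label such as '10^3' to the base frequencies of one random sequence."""
    return {
        "10^{}".format(e): relative_frequencies(simulate_sequence(10 ** e))
        for e in exponents
    }


def plot_frequencies(table):
    # Imported lazily so the helpers above also work without the plotting stack.
    import pandas as pd
    import matplotlib.pyplot as plt

    # Rows are sequence lengths, columns are bases: one group of 4 bars per length.
    df = pd.DataFrame.from_dict(table, orient="index")
    ax = df.plot(kind="bar", rot=0)
    ax.set_xlabel("sequence length")
    ax.set_ylabel("relative frequency")
    plt.show()


if __name__ == "__main__":
    plot_frequencies(frequency_table())
```

Building the table as a plain dict of dicts keeps your functions independent of pandas; the DataFrame is only created at the last step, where each outer key becomes one group of bars.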

warped