
I already wrote a function that simulates a random sequence of the four bases A, C, G, T with a length of 10^1, 10^2, 10^3, 10^4 or 10^5. Each base has a probability of 0.25. I wrote another function that calculates the relative frequency of each base in a given sequence. Now I want to show, in a bar chart, the relative frequency of each base (A, C, G, T) in a random sequence of each length (10^1, 10^2, 10^3, 10^4, 10^5), but I am not quite sure how to do it. My first thought is to build a pandas data frame, but I am a bit confused about how to plug my existing functions into it. Maybe you could help me.

AnjaPl
  • Please include a small example of a sequence, and what exactly you'd like to chart. – Roy2012 Jun 20 '20 at 11:42
  • a sequence could be: "GTGCAGTGATTTCCTCGCAGTATTCATTTG". And I want to have a chart at the end which has 4 bars (relative number of each base) for 10^1, 4 bars (relative number of each base) for 10^2, 4 bars (relative number of each base) for 10^3, 4 bars (relative number of each base) for 10^4, 4 bars (relative number of each base) for 10^5. – AnjaPl Jun 20 '20 at 11:49
  • I don't understand what kind of chart you're looking for. Perhaps you can attach a fabricated example? (I don't understand the 10, 100, 1000, etc, part) – Roy2012 Jun 20 '20 at 11:55
  • You could maybe use the result from your function that calculates the relative number of each base in a given sequence and use [matplotlib to make a barchart](https://matplotlib.org/3.2.1/gallery/lines_bars_and_markers/barchart.html) – Wavy Jun 20 '20 at 12:08
  • I have now added a quick sketch of how I want the chart to look in the end – AnjaPl Jun 20 '20 at 12:11
  • Maybe this post is similar to your question: https://stackoverflow.com/questions/28931224/adding-value-labels-on-a-matplotlib-bar-chart – Christian Eslabon Jun 20 '20 at 13:04

1 Answer


If I understand correctly, you want to do something like this:

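import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# One column of normalised base counts per sequence length 10^1 to 10^5.
# rename() restores the label, since newer pandas names the result 'proportion'.
pd.concat([
    pd.Series(
        np.random.choice(list('ATCG'), 10**a)
    ).value_counts(normalize=True).rename('10^{}'.format(a)) for a in range(1, 6)],
    axis=1, sort=True).T.plot(kind='bar')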

plt.ylabel('normalised counts')
plt.xlabel('sequence length')
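
If you would rather reuse the two functions you already have, one possible approach is to collect their results in a dict keyed by sequence length and build the data frame from that. This is only a sketch: `simulate_sequence` and `relative_frequencies` are hypothetical stand-ins for your own functions (I am assuming the first returns a string and the second returns a dict mapping each base to its fraction), and `frequency_table` and `plot_frequency_table` are names I made up for illustration.

```python
import random

import pandas as pd

BASES = "ACGT"


def simulate_sequence(length):
    """Hypothetical stand-in: uniform random sequence of A, C, G, T."""
    return "".join(random.choices(BASES, k=length))


def relative_frequencies(sequence):
    """Hypothetical stand-in: fraction of each base in the sequence."""
    return {base: sequence.count(base) / len(sequence) for base in BASES}


def frequency_table(exponents=range(1, 6)):
    """One row per sequence length 10^e, one column per base."""
    rows = {
        "10^{}".format(e): relative_frequencies(simulate_sequence(10**e))
        for e in exponents
    }
    return pd.DataFrame.from_dict(rows, orient="index")


def plot_frequency_table(df):
    """Grouped bar chart: four bars (one per base) for each sequence length."""
    ax = df.plot(kind="bar")
    ax.set_xlabel("sequence length")
    ax.set_ylabel("relative frequency")
    return ax
```

Calling `plot_frequency_table(frequency_table())` and then `matplotlib.pyplot.show()` should give one group of four bars per length. Swap in your own two functions in place of the stand-ins; the rest stays the same as long as the second one returns a base-to-fraction mapping.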


warped