- It will be easiest to combine the dictionaries into a pandas.DataFrame, and then update df with additional details organizing the data.
- If the values in the dictionaries are of unequal length, as indicated in a comment, follow the approach in "Creating dataframe from a dictionary where entries have different lengths".
- Create a DataFrame for each dict along the lines of the linked answer (here, with pd.concat along axis=1), and then use pd.concat again to combine the DataFrames.
- Tested in python 3.11.2, pandas 2.0.0, seaborn 0.12.2
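As a minimal sketch of the unequal-length case mentioned in the list above: wrapping each list in a pd.Series lets pandas align the columns on the index and pad the shorter ones with NaN. The names sample and wide are hypothetical, and this is one common approach, not necessarily the exact code in the linked answer.

```python
import pandas as pd

# hypothetical sample data with lists of different lengths
sample = {'cat': [67, 17, 0], 'polarbear': [67, 17, 27, 37, 32, 0]}

# each list becomes a Series; pandas aligns them on the index
# and fills the missing positions of the shorter column with NaN
wide = pd.DataFrame({k: pd.Series(v) for k, v in sample.items()})

print(wide)
```

The column with missing values is converted to float, which is why the shorter columns in the output further down show values such as 67.0.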
import pandas as pd
import seaborn as sns
# updated data in the dictionaries, taken from a comment
original_sequence = {'cat': [67, 17, 0], 'cheetah': [67, 17, 11], 'chlamydia': [67, 17, 27, 37, 17], 'polarbear': [67, 17, 27, 37, 32, 0]}
randomized_sequence = {'cat': [71, 61, 0], 'cheetah': [58, 56, 26], 'chlamydia': [47, 43, 44, 42, 29], 'polarbear': [52, 44, 54, 43, 42, 1]}
# list of dicts
list_of_dicts = [original_sequence, randomized_sequence]
# combine the dicts into dataframes, assign a new column to distinguish each sequence, reset the index and use it as the base pair amount
df = (pd.concat([pd.concat([pd.DataFrame(v, columns=[k]) for k, v in data.items()], axis=1)
                 .assign(Sequence=i) for i, data in enumerate(list_of_dicts)], ignore_index=False)
      .reset_index()
      .rename({'index': 'CG Amount'}, axis=1))
# Update the CG Amount column to correspond to the actual numbers
df['CG Amount'] = df['CG Amount'].add(1).mul(1000)
# seaborn works with DataFrames in a long form, so melt
df = df.melt(id_vars=['Sequence', 'CG Amount'], var_name='Organism', value_name='Repeats')
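To make the wide-to-long step concrete, here is a small sketch on a hypothetical two-row frame (the names wide and long_df are made up for illustration). The columns listed in id_vars are repeated for every row, while each remaining column is stacked into an Organism / Repeats pair.

```python
import pandas as pd

# hypothetical wide frame: two identifier columns plus one column per organism
wide = pd.DataFrame({'Sequence': [0, 0], 'CG Amount': [1000, 2000],
                     'cat': [67, 17], 'cheetah': [67, 11]})

# stack the organism columns into (Organism, Repeats) pairs
long_df = wide.melt(id_vars=['Sequence', 'CG Amount'], var_name='Organism', value_name='Repeats')

print(long_df)
```

The long frame has one row per (Sequence, CG Amount, Organism) combination, which is the shape that the hue and col arguments of the seaborn calls rely on.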
scatter
g = sns.relplot(data=df, x='CG Amount', y='Repeats', hue='Sequence', col='Organism', col_wrap=2)

bar
- If you're comparing two sequences at discrete intervals, a barplot seems the better option.
g = sns.catplot(data=df, kind='bar', x='CG Amount', y='Repeats', hue='Sequence', col='Organism', col_wrap=2)

df before .melt
    CG Amount   cat  cheetah  chlamydia  polarbear  Sequence
0        1000  67.0     67.0       67.0         67         0
1        2000  17.0     17.0       17.0         17         0
2        3000   0.0     11.0       27.0         27         0
3        4000   NaN      NaN       37.0         37         0
4        5000   NaN      NaN       17.0         32         0
5        6000   NaN      NaN        NaN          0         0
6        1000  71.0     58.0       47.0         52         1
7        2000  61.0     56.0       43.0         44         1
8        3000   0.0     26.0       44.0         54         1
9        4000   NaN      NaN       42.0         43         1
10       5000   NaN      NaN       29.0         42         1
11       6000   NaN      NaN        NaN          1         1
df.head() after .melt
   Sequence  CG Amount  Organism  Repeats
0         0       1000       cat     67.0
1         0       2000       cat     17.0
2         0       3000       cat      0.0
3         0       4000       cat      NaN
4         0       5000       cat      NaN
df.tail() after .melt
    Sequence  CG Amount   Organism  Repeats
43         1       2000  polarbear     44.0
44         1       3000  polarbear     54.0
45         1       4000  polarbear     43.0
46         1       5000  polarbear     42.0
47         1       6000  polarbear      1.0
Notes
- If the values in the dictionaries have the same length, use the following code to create df
dict_1 = {'cat': [53, 69, 0], 'cheetah': [65, 52, 28]}
dict_2 = {'cat': [40, 39, 10], 'cheetah': [35, 62, 88]}
list_of_dicts = [dict_1, dict_2]
df = (pd.concat([pd.DataFrame(d, index=range(1000, 4000, 1000)).assign(Sequence=i) for i, d in enumerate(list_of_dicts)],
                ignore_index=False)
      .reset_index()
      .rename({'index': 'CG Amount'}, axis=1))
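As a rough end-to-end check of the equal-length variant, the sketch below rebuilds the frame from the same two dictionaries and applies the same melt step used earlier. The names frames, df_equal and df_long are hypothetical, chosen only to avoid clashing with df.

```python
import pandas as pd

dict_1 = {'cat': [53, 69, 0], 'cheetah': [65, 52, 28]}
dict_2 = {'cat': [40, 39, 10], 'cheetah': [35, 62, 88]}

# one frame per dict, indexed by the CG amount and tagged with its sequence number
frames = [pd.DataFrame(d, index=range(1000, 4000, 1000)).assign(Sequence=i)
          for i, d in enumerate([dict_1, dict_2])]

# stack the frames and turn the index into the 'CG Amount' column
df_equal = pd.concat(frames).reset_index().rename({'index': 'CG Amount'}, axis=1)

# same long-form conversion as in the unequal-length case
df_long = df_equal.melt(id_vars=['Sequence', 'CG Amount'], var_name='Organism', value_name='Repeats')

print(df_long.shape)  # (12, 4)
```

Because the index already holds 1000, 2000 and 3000, the separate step that rescales 'CG Amount' is not needed here, and no NaN values appear in the result.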