I am working with an Excel file that contains gene names and the number of times each one occurs per month over a range of years. So far I have used pandas to read the file into a dataframe.
Input:
import pandas as pd
import plotly.express as px
df = pd.read_csv('genes.csv', sep = ',', header = None)
print(df)
Output:
       0       1       2       3  ...     561      562      563      564
0    NaN  1971-1  1971-2  1971-3  ...  2017-9  2017-10  2017-11  2017-12
1  BRCA1       0       0       0  ...       0        0        0        0
2  BRCA2       0       0       0  ...       0        0        0        0
3   MAPK       0       0       0  ...       0        0        0        0
I now want to plot that data and have been trying to figure out how to set the dates as the index (I'm not entirely sure that's what I need to do). I saw a few posts about using set_index, so I tried the code below, but it just gives me an error.
Input:
print(df.set_index([]).stack().reset_index(name='Date'))
fig = px.line(df, title = 'Human Gene Occurrences Per Month')
fig.show()
Output:
ValueError: Must pass non-zero number of levels/codes
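The error seems to be raised by set_index([]), since an empty list gives pandas no column to build an index from. Below is a minimal sketch of one way to get the dates into the index instead, assuming the first row of the file holds the dates and the first column holds the gene names, as in the printed output. The in-memory CSV and its counts are made up for illustration and only stand in for genes.csv; the names `wide` and `by_date` are hypothetical.

```python
import io

import pandas as pd

# Hypothetical stand-in for genes.csv: the counts are made up for illustration.
sample_csv = io.StringIO(
    ",1971-1,1971-2,1971-3\n"
    "BRCA1,0,2,1\n"
    "BRCA2,1,0,0\n"
    "MAPK,0,0,3\n"
)

# Leaving out header=None keeps the date row as column labels;
# index_col=0 uses the gene names as the row index.
wide = pd.read_csv(sample_csv, index_col=0)

# Transpose so each row is a month and each column is a gene,
# then parse labels such as '1971-1' into real timestamps.
by_date = wide.T
by_date.index = pd.to_datetime(by_date.index, format="%Y-%m")
by_date.index.name = "Date"

print(by_date)
```

With the real file, the same steps would start from pd.read_csv('genes.csv', index_col=0) in place of the in-memory sample.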
I am trying to use Plotly to create a graph for each gene, with the date on the x-axis and the count on the y-axis. Any help is greatly appreciated. Thank you.
Also, not all the counts are zero; that's just what is shown in the condensed dataframe when printed.
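For reference, here is a rough sketch of the kind of result I am after, assuming the data is first reshaped into one row per gene and month, which is the layout Plotly Express works with most directly. The sample CSV and its counts are made up and only stand in for genes.csv; the column names Gene, Date and Count and the helper make_figure are hypothetical.

```python
import io

import pandas as pd

# Hypothetical stand-in for genes.csv: the counts are made up for illustration.
sample_csv = io.StringIO(
    ",1971-1,1971-2,1971-3\n"
    "BRCA1,0,2,1\n"
    "BRCA2,1,0,0\n"
    "MAPK,0,0,3\n"
)

# Read with the date row as the header, then name the unlabeled first column.
wide = pd.read_csv(sample_csv)
wide = wide.rename(columns={wide.columns[0]: "Gene"})

# Reshape from one column per month to one row per (gene, month) pair.
long_df = wide.melt(id_vars="Gene", var_name="Date", value_name="Count")
long_df["Date"] = pd.to_datetime(long_df["Date"], format="%Y-%m")
long_df = long_df.sort_values(["Gene", "Date"]).reset_index(drop=True)


def make_figure(data):
    """Draw one line per gene, each gene in its own panel."""
    # Imported here so the reshaping above can run without Plotly installed.
    import plotly.express as px

    return px.line(
        data,
        x="Date",
        y="Count",
        facet_row="Gene",
        title="Human Gene Occurrences Per Month",
    )
```

Calling make_figure(long_df).show() would display the figure. With many genes, swapping facet_row="Gene" for color="Gene" would put all the lines on a single set of axes instead of one panel each.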