
I am working with an Excel file that contains a set of gene names and the number of times each one occurs per month over a range of years. So far I have used pandas to read the file into a dataframe.

Input:

import pandas as pd
import plotly.express as px

df = pd.read_csv('genes.csv', sep = ',', header = None)
print(df)

Output:

     0       1       2       3    ...     561      562      563      564
0    NaN  1971-1  1971-2  1971-3  ...  2017-9  2017-10  2017-11  2017-12
1  BRCA1       0       0       0  ...       0        0        0        0
2  BRCA2       0       0       0  ...       0        0        0        0
3   MAPK       0       0       0  ...       0        0        0        0

I now want to plot that data and have been trying to figure out how to set the dates as the index (I'm not entirely sure that's what I need to be doing). I saw a few different posts about using set_index, so I tried the code below, but it just gives me an error.

Input:

print(df.set_index([]).stack().reset_index(name='Date'))
fig = px.line(df, title = 'Human Gene Occurrences Per Month')
fig.show()

Output:

ValueError: Must pass non-zero number of levels/codes

I am trying to use Plotly to create a graph for each of the genes that graphs the date on the x-axis and the count on the y-axis. Any help is greatly appreciated. Thank you

Also, not all the counts equal zero; that's just what is shown in the condensed dataframe when printed.

  • Check out how to create a minimal verifiable example. The best way to get help is to construct an example dataframe in your code. – roadrunner66 Oct 15 '20 at 01:55
  • I edited my question. Is this more helpful? Not quite sure how to add an example df. Reading through the MRE post currently. – Lauren Kirsch Oct 15 '20 at 02:14
  • See my "answer" for an example on how to make a quick dataframe example from a dictionary. This is straight from the pandas documentation on the dataframe, and is only one of the ways to construct a dataframe. – roadrunner66 Oct 15 '20 at 02:24
  • @LaurenKirsch If you read [this](https://stackoverflow.com/questions/63163251/pandas-how-to-easily-share-a-sample-dataframe-using-df-to-dict) you'll learn how to share a sample of your dataframe in a few minutes. – vestland Oct 15 '20 at 05:18

2 Answers

import numpy as np 
import pandas as pd
import matplotlib.pyplot as p
#     0       1       2       3    ...     561      562      563      564
# 0    NaN  1971-1  1971-2  1971-3  ...  2017-9  2017-10  2017-11  2017-12
# 1  BRCA1       0       0       0  ...       0        0        0        0
# 2  BRCA2       0       0       0  ...       0        0        0        0
# 3   MAPK       0       0       0  ...       0        0        0        0

d = {'0': ['NaN', 'BRCA1', 'BRCA2'], '1': ['1971-1', 0, 0], '2': ['1971-2', 1, 0], '3': ['1971-3', 0, 1]}
df = pd.DataFrame(data=d)
df = df.transpose()    # time series are typically in columns
df


#turn that column into actual dates, that pandas recognizes as such

df[0] = df[0].astype('datetime64[ns]')   
df


# you probably mean the first row to be column headers

df.columns = df.iloc[0]             # set columns to first row
df.drop(df.index[0],inplace=True)   # drop that row

df


# set the first column to have the title "Date"

df.rename(columns={df.columns[0]: "Date"},inplace=True)
df


p.figure(figsize=(12,3),dpi=100)
p.plot(df.iloc[:,0],df.iloc[:,1], label= df.columns[1])
p.plot(df.iloc[:,0],df.iloc[:,2] ,label= df.columns[2])
p.legend()


Pandas has more ways to solve a problem than you can shake a stick at, and unless you work with it eight hours a day, you will forget them. I manage this by keeping snippets that work, complete with examples, in a personal wiki, so I can find them quickly when I forget something.
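
For month labels such as '1971-1', one way to get real dates without depending on how a parser treats the missing zero padding is to split each label into year and month and let pd.to_datetime assemble them. This is only a minimal sketch: the `labels` series is made-up sample data, and anchoring every label to day 1 of its month is an assumption made for illustration.

```python
import pandas as pd

# Hypothetical sample of the month labels from the header row
labels = pd.Series(['1971-1', '1971-2', '2017-12'])

# Split "YYYY-M" into integer year and month columns
parts = labels.str.split('-', expand=True).astype(int)
parts.columns = ['year', 'month']

# pd.to_datetime can assemble dates from year/month/day columns;
# day=1 is an arbitrary choice that anchors each label to the start of its month
dates = pd.to_datetime(parts.assign(day=1))
print(dates)
```

The resulting series has a datetime dtype, so it can be used as a plot axis or set as the index.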

roadrunner66

In general:

df.rename(columns=df.iloc[0], inplace = True)
df.drop(df.index[0], inplace=True)
df.set_index('<column name>', inplace=True)  # replace the placeholder with your date column's name

In your example:

# transpose dataframe first
df=df.T
df.rename(columns=df.iloc[0], inplace = True)
df.drop(df.index[0], inplace=True)
df.rename(columns={'nan':'Time'}, inplace=True)
df.set_index('Time', inplace=True)

Your dataframe:

        BRCA1 BRCA2 MAPK
Time                    
1971-1      0     0    0
1971-2      0     0    0
1971-3      0     0    0
2017-9      0     0    0
2017-10     0     0    0
2017-11     0     0    0
2017-12     0     0    0
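
As a possible alternative to the steps above, the same shape can probably be reached at read time by letting pandas use the first row as the header and the first column as the index, then transposing. This is a sketch under assumptions: `csv_text` is a made-up stand-in for genes.csv, and the names `genes` and `wide` are only illustrative. It also sidesteps the rename of the 'nan' label, which may not match if the empty top-left cell was read in as a float NaN and not as a string.

```python
import io
import pandas as pd

# Hypothetical stand-in for genes.csv: first column holds gene names,
# first row holds the month labels, top-left cell is empty
csv_text = """,1971-1,1971-2,1971-3
BRCA1,0,0,1
BRCA2,0,1,0
MAPK,0,0,0
"""

# index_col=0 makes the gene names the index; the header row becomes the columns
genes = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Transpose so that each row is a month and each column is a gene
wide = genes.T
wide.index.name = 'Time'
print(wide)
```

With the real file, `io.StringIO(csv_text)` would be replaced by the path to the CSV.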

Your plot

This is made using the easiest possible approach, with the pandas plotting backend set to plotly. The plot looks a bit odd only because the dataset you provided is so limited; I added some dummy data to make the different traces distinguishable. Go ahead and try it with your real-world data, and I'm pretty sure it will look fine.


Complete code:

import pandas as pd
pd.options.plotting.backend = "plotly"

# data
df=pd.DataFrame({'0': {0: 'nan', 1: 'BRCA1', 2: 'BRCA2', 3: 'MAPK'},
                 '1': {0: '1971-1', 1: '0', 2: '0', 3: '0'},
                 '2': {0: '1971-2', 1: '0', 2: '0', 3: '0'},
                 '3': {0: '1971-3', 1: '1', 2: '0', 3: '0'},
                 '561': {0: '2017-9', 1: '1', 2: '2', 3: '0'},
                 '562': {0: '2017-10', 1: '1', 2: '2', 3: '0'},
                 '563': {0: '2017-11', 1: '1', 2: '2', 3: '3'},
                 '564': {0: '2017-12', 1: '1', 2: '2', 3: '3'}})

df=df.T
df.rename(columns=df.iloc[0], inplace = True)
df.drop(df.index[0], inplace=True)
df.rename(columns={'nan':'Time'}, inplace=True)
df.set_index('Time', inplace=True)
df.plot(template='plotly_dark')
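
If you would rather call plotly.express directly, as in the question, it is usually easiest to work with long-form data: one row per (month, gene) pair. The sketch below shows only the reshaping step with pandas; the `wide` frame and the column names 'Gene' and 'Count' are assumptions chosen for illustration, not part of the original data.

```python
import pandas as pd

# Hypothetical wide frame: one row per month, one column per gene
wide = pd.DataFrame(
    {'BRCA1': [0, 1, 1], 'BRCA2': [0, 0, 2], 'MAPK': [0, 0, 3]},
    index=pd.Index(['1971-1', '1971-2', '1971-3'], name='Time'),
)

# Move the index into a column, then stack the gene columns into rows
long = wide.reset_index().melt(id_vars='Time', var_name='Gene', value_name='Count')
print(long)
```

A frame in this shape can be passed to `px.line(long, x='Time', y='Count', color='Gene')` to draw one trace per gene, or with `facet_row='Gene'` to get a separate panel for each gene.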
vestland