0

Quick question: I am trying to do some analysis on the tickers in a CSV file.

Example of CSV file (Note that these are only the first two lines and there are around 200 tickers in total):

,ticker,dimension,calendardate,datekey,reportperiod,lastupdated,accoci,assets,assetsavg,assetsc,assetsnc,assetturnover,bvps,capex,cashneq,cashnequsd,cor,consolinc,currentratio,de,debt,debtc,debtnc,debtusd,deferredrev,depamor,deposits,divyield,dps,ebit,ebitda,ebitdamargin,ebitdausd,ebitusd,ebt,eps,epsdil,epsusd,equity,equityavg,equityusd,ev,evebit,evebitda,fcf,fcfps,fxusd,gp,grossmargin,intangibles,intexp,invcap,invcapavg,inventory,investments,investmentsc,investmentsnc,liabilities,liabilitiesc,liabilitiesnc,marketcap,ncf,ncfbus,ncfcommon,ncfdebt,ncfdiv,ncff,ncfi,ncfinv,ncfo,ncfx,netinc,netinccmn,netinccmnusd,netincdis,netincnci,netmargin,opex,opinc,payables,payoutratio,pb,pe,pe1,ppnenet,prefdivis,price,ps,ps1,receivables,retearn,revenue,revenueusd,rnd,roa,roe,roic,ros,sbcomp,sgna,sharefactor,sharesbas,shareswa,shareswadil,sps,tangibles,taxassets,taxexp,taxliabilities,tbvps,workingcapital,sector
4327,AAPL,ARQ,2007-06-30,2007-08-08,2007-06-30,2020-08-31,56000000.0,21647000000.0,,18745000000.0,2902000000.0,,0.552,-283000000.0,7118000000.0,7118000000.0,3415000000.0,818000000.0,2.681,0.615,0.0,0.0,0.0,0.0,0.0,81000000.0,0.0,0.0,0.0,1196000000.0,1277000000.0,0.23600000000000002,1277000000.0,1196000000.0,1196000000.0,0.034,0.033,0.034,13404000000.0,,13404000000.0,109420845098.0,24.0,22.848000000000003,944000000.0,0.039,1.0,1995000000.0,0.369,275000000.0,0.0,7262000000.0,,251000000.0,6649000000.0,6649000000.0,0.0,8243000000.0,6992000000.0,1251000000.0,116538845098.0,23000000.0,-6000000.0,118000000.0,0.0,0.0,229000000.0,-1433000000.0,-1170000000.0,1227000000.0,0.0,818000000.0,818000000.0,818000000.0,0.0,0.0,0.151,954000000.0,1041000000.0,3660000000.0,0.0,8.693999999999999,37.185,36.815,1626000000.0,0.0,4.7860000000000005,5.151,5.1339999999999995,1410000000.0,8199000000.0,5410000000.0,5410000000.0,208000000.0,,,,,65000000.0,746000000.0,1.0,24349946740.0,24270568000.0,24938788000.0,0.223,21372000000.0,687000000.0,378000000.0,0.0,0.8809999999999999,11753000000.0
4328,AAPL,ARQ,2007-09-30,2007-11-15,2007-09-29,2020-08-31,63000000.0,25347000000.0,,21956000000.0,3391000000.0,,0.596,-205000000.0,9352000000.0,9352000000.0,4127000000.0,904000000.0,2.3609999999999998,0.7440000000000001,0.0,0.0,0.0,0.0,0.0,93000000.0,0.0,0.0,0.0,1230000000.0,1323000000.0,0.213,1323000000.0,1230000000.0,1230000000.0,0.037000000000000005,0.036000000000000004,0.037000000000000005,14532000000.0,,14532000000.0,134496640397.0,27.0,25.258000000000003,1491000000.0,0.061,1.0,2090000000.0,0.336,337000000.0,0.0,6359000000.0,,346000000.0,6034000000.0,6034000000.0,0.0,10815000000.0,9299000000.0,1516000000.0,143848640397.0,2234000000.0,-29000000.0,68000000.0,0.0,0.0,142000000.0,396000000.0,615000000.0,1696000000.0,0.0,904000000.0,904000000.0,904000000.0,0.0,0.0,0.145,1030000000.0,1060000000.0,4970000000.0,0.0,9.899,41.147,40.748000000000005,1832000000.0,0.0,5.867999999999999,5.992000000000001,5.917000000000001,1637000000.0,9101000000.0,6217000000.0,6217000000.0,207000000.0,,,,,68000000.0,823000000.0,1.0,24515127672.0,24384640000.0,25084724000.0,0.255,25010000000.0,782000000.0,326000000.0,0.0,1.026,12657000000.0

I want to be able to for all the rows where the 'ticker' column = 'AAPL', write something to the 'sector' column. And then for all the columns where the 'ticker' column = 'GOOGL', write something to the 'sector' column. How would I do that?

I would show my code, but it really isn't helping much.

George Adams
  • 331
  • 4
  • 15
  • 1
    if you have some column based conditions(something like A and B can give new column G) consider using [`apply`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html#pandas-dataframe-apply) or [`iterrows`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iterrows.html#pandas.DataFrame.iterrows) and then perform your conditionals on it. – k33da_the_bug Jan 17 '21 at 04:04
  • Does this answer your question? [Remap values in pandas column with a dict](https://stackoverflow.com/questions/20250771/remap-values-in-pandas-column-with-a-dict) – sushanth Jan 17 '21 at 04:38

2 Answers2

1

If you want to set some column value based on condition consider apply or iterrows

def set_vals(row):
  result = ''
  if row['ticker'] == 'AAPL':
    result = 'something1'
  elif row['ticker'] == 'GOOGL':
    result = 'something2'
  return result

df['sector'] = df.apply(set_vals,axis=1)
df

Edit:

If you don't want to use iteration logic and conditions are simple then you can also use

df.loc[df['ticker'] == 'AAPL', 'sector'] = 'something1' 
df.loc[df['ticker'] == 'GOOGL', 'sector'] = 'something2'

df
k33da_the_bug
  • 812
  • 8
  • 16
  • @Parfait yes sir, I have updated my answer accordingly. If you have any other alternative than this please let me know as it will help me too. I am also a newbie : ) – k33da_the_bug Jan 17 '21 at 05:11
1

Consider DataFrame.loc, numpy.where, or numpy.select for conditional logic assignment:

df.loc[df['ticker'] == 'AAPL', 'sector'] = 'myvalue'
df['Sector'] = np.where(df['ticker'].eq('AAPL'), 
                        'myvalue', 
                        df['sector'])
conds = [df['ticker'].eq('AAPL'), 
         df['ticker'].eq('GOOGL')
         ...]

vals = ['my value for AAPL',
        'my value for GOOGL',
        ...]

df['sector'] = np.select(conds, vals, default=df['sector'])
Parfait
  • 104,375
  • 17
  • 94
  • 125